Header menu link for other important links
X
Cluster labeling for multilingual scatter/gather using comparable corpora
G. Tholpadi, , C. Bhattacharyya, S. Shevade
Published in
2012
Volume: 7224 LNCS
   
Pages: 388 - 400
Abstract
Scatter/Gather systems are increasingly becoming useful in browsing document corpora. Usability of the present-day systems are restricted to monolingual corpora, and their methods for clustering and labeling do not easily extend to the multilingual setting, especially in the absence of dictionaries/machine translation. In this paper, we study the cluster labeling problem for multilingual corpora in the absence of machine translation, but using comparable corpora. Using a variational approach, we show that multilingual topic models can effectively handle the cluster labeling problem, which in turn allows us to design a novel Scatter/Gather system ShoBha. Experimental results on three datasets, namely the Canadian Hansards corpus, the entire overlapping Wikipedia of English, Hindi and Bengali articles, and a trilingual news corpus containing 41,000 articles, confirm the utility of the proposed system. © 2012 Springer-Verlag Berlin Heidelberg.
About the journal
JournalLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN03029743