Cluster labeling for multilingual scatter/gather using comparable corpora

G. Tholpadi; Mrinal Kanti Das; C. Bhattacharyya; S. Shevade

doi:10.1007/978-3-642-28997-2_33

Published in: 2012

Volume: 7224 LNCS

Pages: 388 - 400

Abstract

Scatter/Gather systems are becoming increasingly useful for browsing document corpora. The usability of present-day systems is restricted to monolingual corpora, and their methods for clustering and labeling do not easily extend to the multilingual setting, especially in the absence of dictionaries or machine translation. In this paper, we study the cluster labeling problem for multilingual corpora in the absence of machine translation, but using comparable corpora. Using a variational approach, we show that multilingual topic models can effectively handle the cluster labeling problem, which in turn allows us to design a novel Scatter/Gather system, ShoBha. Experimental results on three datasets, namely the Canadian Hansards corpus, the entire set of overlapping English, Hindi, and Bengali Wikipedia articles, and a trilingual news corpus containing 41,000 articles, confirm the utility of the proposed system. © 2012 Springer-Verlag Berlin Heidelberg.

Topics: Cluster labeling (62%), Machine translation (57%), Cluster analysis (51%), and Topic model (50%)

About the journal

Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN: 0302-9743

Authors (1)

Mrinal Kanti Das
- Department of Data Science
