Weight-agnostic hierarchical stick-breaking process
, C. Bhattacharyya
Published in Institute of Electrical and Electronics Engineers Inc.
Pages: 342 - 349
Learning from multiple groups of observations is often useful because statistical information can be shared across groups. Hierarchical Bayesian models provide a natural mechanism for this, and hierarchical Dirichlet processes (HDPs) have had significant impact in the field. An HDP defines a collection of probability measures, one per group, all supported on a common countably infinite set of atoms so that information can be shared. The fundamental sharing mechanism in all variants of the HDP makes the weights on these atoms positively correlated across groups, a structural limitation that cannot be removed without changing the sharing principle. This property hinders the applicability of HDP priors to problems in which an atom may be highly probable in some groups while being rare in all others. The issue becomes evident in clustering through the association of atoms and observations: some clusters may be prominent in a few groups yet weakly present in most of the others, and vice versa. In this paper, we pose the problem of weight agnosticism: constructing a collection of probability measures on a common countably infinite set of atoms with mutually independent weights across groups. A cluster can then contain observations from all groups, but its popularity in one group is independent of its popularity in any other, so the size of a cluster in one group does not interfere with the participation of observations from other groups in that cluster. Our contribution is a novel hierarchical Bayesian nonparametric prior, the Weight-Agnostic hierarchical Stick-breaking process (WAS), which models weight agnosticism and extends the stick-breaking process (SBP) framework in a novel direction. WAS, however, is non-exchangeable, which makes the inference process non-standard.
We derive tractable predictive probability functions for WAS, which yield efficient truncation-free MCMC inference competitive with that available in HDP settings. We discuss a few real-life applications of WAS in topic modeling and information retrieval. Furthermore, in experiments on five real-life datasets we show that WAS significantly outperforms HDP in various settings. © 2018 IEEE
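To make the notion of weight agnosticism concrete, the following sketch illustrates the underlying idea in its simplest form: a truncated stick-breaking (GEM) construction where all groups share one common set of atoms, but each group draws its own weights independently. This is an illustrative toy using standard-library Python only; the function name `stick_breaking_weights`, the truncation level, and the Gaussian atom locations are assumptions for the example and are not the paper's actual WAS construction or inference scheme.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking (GEM) weights:
    v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # put the leftover mass on a final stick
    return weights

rng = random.Random(0)

# One common set of atoms shared by every group (hypothetical Gaussian locations).
atoms = [rng.gauss(0.0, 1.0) for _ in range(21)]

# Independent weights per group over the SAME atoms: an atom (cluster) can be
# prominent in one group while rare in another, unlike the positively
# correlated weights an HDP induces.
group_weights = {g: stick_breaking_weights(2.0, 20, rng) for g in ("g1", "g2")}
```

Each group's weight vector sums to one and is drawn without reference to the other groups, which is the independence property that weight agnosticism asks for.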