Header menu link for other important links
X
Learning interesting attributes for automated data categorization∗
, Sebastian Michel
Published in Association for Computing Machinery
2018
Abstract
This work proposes and evaluates a novel approach to determining interesting attributes, in order to categorize entities accordingly. Once identified, such categories are of immense value to allow constraining (filtering) a user's current view to subsets of entities. We show how a classifier is trained that is able to tell whether or not a categorical attribute can act as a constraint, in the sense of human-perceived interestingness. The training data is harnessed from Wikipedia tables, treating the presence or absence of a table as an indication that the attribute used as a filter constraint is reasonable or not. For learning the classification model, we review four well-known statistical measures (features) for categorical attributes—entropy, unalikeability, peculiarity and coverage. We additionally propose three new statistical measures to capture the distribution of data, tailored to our main objective. The learned model is evaluated by relevance assessments obtained through a user study, reflecting the applicability of the approach as a whole and, further, demonstrates the superiority of the proposed diversity measures over existing measures like information entropy.
About the journal
JournalData powered by TypesetACM International Conference Proceeding Series
PublisherData powered by TypesetAssociation for Computing Machinery
Open AccessNo