Learning interesting attributes for automated data categorization

Koninika Pal; Sebastian Michel

doi:10.1145/3221269.3223035

Published in Association for Computing Machinery

2018

Abstract

This work proposes and evaluates a novel approach to determining interesting attributes by which entities can be categorized. Once identified, such categories are valuable for constraining (filtering) a user's current view to subsets of entities. We show how to train a classifier that can tell whether or not a categorical attribute can act as a constraint, in the sense of human-perceived interestingness. The training data is harnessed from Wikipedia tables, treating the presence or absence of a table as an indication of whether the attribute used as a filter constraint is reasonable. For learning the classification model, we review four well-known statistical measures (features) for categorical attributes: entropy, unalikeability, peculiarity, and coverage. We additionally propose three new statistical measures that capture the distribution of data and are tailored to our main objective. The learned model is evaluated by relevance assessments obtained through a user study, reflecting the applicability of the approach as a whole and, further, demonstrating the superiority of the proposed diversity measures over existing measures like information entropy.
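As a rough illustration of two of the reviewed measures, the sketch below computes entropy and unalikeability for a single categorical attribute. It assumes the common textbook definitions (Shannon entropy in bits, and the coefficient of unalikeability as one minus the sum of squared category proportions); the paper's exact formulations may differ, and the function names and sample data are hypothetical. Peculiarity, coverage, and the three newly proposed measures are not sketched here because the abstract does not define them.

```python
import math
from collections import Counter


def category_proportions(values):
    """Return the relative frequency of each distinct category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def shannon_entropy(values):
    """Shannon entropy in bits (assumed standard definition)."""
    return -sum(p * math.log2(p) for p in category_proportions(values).values())


def unalikeability(values):
    """Coefficient of unalikeability, 1 - sum(p_i^2) (assumed definition):
    the chance that two independently drawn values differ."""
    return 1.0 - sum(p * p for p in category_proportions(values).values())


# Hypothetical attribute column: the genre of four films.
genres = ["drama", "drama", "comedy", "thriller"]
print(shannon_entropy(genres))
print(unalikeability(genres))
```

Under these assumed definitions, both measures are zero when every entity shares the same value and grow as the values spread more evenly across categories, which is why they can serve as features describing how an attribute partitions a set of entities.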

Topics: Categorical variable (56%), Classifier (UML) (52%), and Categorization (51%)

About the journal

Journal	ACM International Conference Proceeding Series
Publisher	Association for Computing Machinery
Open Access	No

Authors (1)

Koninika Pal
- Department of Data Science
