Header menu link for other important links
X
Efficient similarity search across Top-k Lists under the Kendall's Tau distance
, Sebastian Michel
Published in Association for Computing Machinery
2016
Volume: 18-20-July-2016
   
Abstract
We consider the problem of similarity search in a set of topk lists under the generalized Kendall's Tau distance. This distance describes how related two rankings are in terms of discordantly ordered items. We consider pair and tripletsbased indices to counter the shortcomings of naive inverted indices and derive efficient query schemes by relating the proposed index structures to the concept of locality sensitive hashing (LSH). Specifically, we devise four different LSH schemes for Kendall's Tau using two generic hash families over individual elements or pairs of them. We show that each of these functions has the desired property of being locality sensitive. Further, we discuss the selection of hash functions for the proposed LSH schemes for a given query ranking, called query-driven LSH and derive bounds for the required number of hash functions to use in order to achieve a predefined recall goal. Experimental results, using two realworld datasets, show that the devised methods outperform the SimJoin method the state of the art method to query for similar sets and are far superior to a plain inverted index-based approach.
About the journal
JournalData powered by TypesetACM International Conference Proceeding Series
PublisherData powered by TypesetAssociation for Computing Machinery
Open AccessNo