TY - GEN
T1 - Randomized algorithms for comparison-based search
AU - Tschopp, Dominique
AU - Diggavi, Suhas
AU - Delgosha, Payam
AU - Mohajer, Soheil
PY - 2011
Y1 - 2011
N2 - This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle inequalities on ranks. We present a lower bound of Ω(Dlog n/D + D2) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop a randomized scheme for NN retrieval in O(D3 log2 n + Dlog2 n log log nD3 ) questions. The learning requires asking O(nD3 log2 n + Dlog2 n log log nD3 ) questions and O(n log2 n/ log(2D)) bits to store.
AB - This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle inequalities on ranks. We present a lower bound of Ω(Dlog n/D + D2) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop a randomized scheme for NN retrieval in O(D3 log2 n + Dlog2 n log log nD3 ) questions. The learning requires asking O(nD3 log2 n + Dlog2 n log log nD3 ) questions and O(n log2 n/ log(2D)) bits to store.
UR - http://www.scopus.com/inward/record.url?scp=85162393524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85162393524&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85162393524
SN - 9781618395993
T3 - Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
BT - Advances in Neural Information Processing Systems 24
PB - Neural Information Processing Systems
T2 - 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
Y2 - 12 December 2011 through 14 December 2011
ER -