Fast approximate kNN graph construction for high dimensional data via recursive lanczos bisection

Jie Chen, Haw Ren Fang, Yousef Saad

Research output: Contribution to journalArticlepeer-review

188 Scopus citations


Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes ⊖(dn2) time for n data points in the d dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in ⊖(dnt) time for high dimensional data (large d). The exponent t ε (1,2) is an increasing function of an internal parameter a which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of t. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide and conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.

Original languageEnglish (US)
Pages (from-to)1989-2012
Number of pages24
JournalJournal of Machine Learning Research
StatePublished - Nov 18 2009


  • Divide and conquer
  • High dimensional data
  • Lanczos algorithm
  • Nearest neighbors graph
  • Spectral method


Dive into the research topics of 'Fast approximate kNN graph construction for high dimensional data via recursive lanczos bisection'. Together they form a unique fingerprint.

Cite this