Approximating snowflake metrics by trees

Research output: Contribution to journalArticle

Abstract

Tree metrics are encountered throughout pure and applied mathematics. Their simple structure makes them a convenient choice of metric in many applications from machine learning and computer science. At the same time, there is an elegant theory of harmonic analysis with respect to tree metrics that parallels the classical theory. A basic question in this field, which is of both theoretical and practical interest, is how to design efficient algorithms for building trees with good metric properties. In particular, given a finite metric space, we seek a random family of dominating tree metrics approximating the underlying metric in expectation. For general metrics, this problem has been solved: on the one hand, there are finite metric spaces that cannot be approximated by trees without incurring a distortion logarithmic in the size of the space, while the tree construction of Fakcharoenphol, Rao, and Talwar (FRT, 2003) shows how to achieve such a logarithmic error for arbitrary metrics. Since a distortion that grows even logarithmically with the size of the set may be too large for practical use in many settings, one naturally asks if there is a more restricted class of metrics where one can do better. The main result of this paper is that certain random family of trees already studied in the computer science literature, including the FRT trees, can be used to approximate snowflake metrics (metrics raised to a power less than 1) with expected distortion bounded by its doubling dimension and the degree of snowflaking. We also show that without snowflaking, the metric distortion can be bounded by a term logarithmic in the distance being approximated and linear in the dimension. We also present an optimal algorithm for building a single FRT tree, whose running time is bounded independently of all problem parameters other than the number of points. We conclude by demonstrating our theoretical results on a numerical example, and applying them to the approximation of the Earth Mover's Distance between probability distributions.

Original languageEnglish (US)
Pages (from-to)405-424
Number of pages20
JournalApplied and Computational Harmonic Analysis
Volume45
Issue number2
DOIs
StatePublished - Sep 2018
Externally publishedYes

Keywords

  • Dimension
  • EMD
  • Earth Mover's Distance
  • Partition trees
  • Snowflake metric
  • Spaces of homogeneous type
  • Tree approximation
  • Tree metric

Fingerprint Dive into the research topics of 'Approximating snowflake metrics by trees'. Together they form a unique fingerprint.

  • Cite this