OmniMatch: Joinability Discovery in Data Products

  • Christos Koutras
  • Jiani Zhang
  • Xiao Qin
  • Chuan Lei
  • Vasileios Ioannidis
  • Christos Faloutsos
  • George Karypis
  • Asterios Katsifodimos

Research output: Contribution to journal › Conference article › peer-review

Abstract

We propose OmniMatch, a novel joinability discovery technique, specifically tailored for the needs of data products: cohesive curated collections of tabular datasets. OmniMatch combines multiple column-pair similarity measures leveraging self-supervised Graph Neural Networks (GNNs). OmniMatch’s GNN captures column relatedness by leveraging graph neighborhood information, significantly improving the recall of joinability discovery tasks. At the same time, OmniMatch increases its precision by augmenting its training data with negative column join examples through an automated negative example generation process. Compared to the state-of-the-art, OmniMatch exhibits up to 14% higher effectiveness in F1 score and AUC without relying on individual, user-provided thresholds for each similarity metric.
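The abstract does not name the column-pair similarity measures that OmniMatch combines. As an illustrative sketch only, assume two common value-overlap measures, Jaccard similarity and containment, computed for a single column pair. The function names and the sample columns are hypothetical and do not come from the paper; the sketch only shows what "multiple similarity signals per column pair" can look like before any model combines them.

```python
def jaccard(a, b):
    """Jaccard similarity of the distinct values in two columns."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of distinct values of column a that also appear in column b."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0


def pair_features(a, b):
    """Collect several similarity signals for one column pair (assumed feature set)."""
    return {
        "jaccard": jaccard(a, b),
        "containment_ab": containment(a, b),
        "containment_ba": containment(b, a),
    }


# Hypothetical columns from two tables in the same data product.
orders_customer_id = [101, 102, 103, 103, 104]
customers_id = [101, 102, 103, 104, 105, 106]

features = pair_features(orders_customer_id, customers_id)
print(features)
```

A threshold-based approach would need a separate cutoff for each of these scores. According to the abstract, OmniMatch avoids such per-metric, user-provided thresholds by learning from the combined signals.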

Original language: English (US)
Pages (from-to): 4588-4601
Number of pages: 14
Journal: Proceedings of the VLDB Endowment
Volume: 18
Issue number: 11
State: Published - 2025
Externally published: Yes
Event: 51st International Conference on Very Large Data Bases, VLDB 2025 - London, United Kingdom
Duration: Sep 1 2025 to Sep 5 2025

Bibliographical note

Publisher Copyright:
© 2025, VLDB Endowment. All rights reserved.
