Abstract
We propose OmniMatch, a novel joinability discovery technique tailored to the needs of data products: cohesive, curated collections of tabular datasets. OmniMatch combines multiple column-pair similarity measures using self-supervised Graph Neural Networks (GNNs). Its GNN captures column relatedness by exploiting graph neighborhood information, which significantly improves the recall of joinability discovery tasks. At the same time, OmniMatch increases precision by augmenting its training data with negative column-join examples produced by an automated generation process. Compared to the state of the art, OmniMatch achieves up to 14% higher F1 score and AUC without relying on individual, user-provided thresholds for each similarity metric.
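To make the idea of combining several column-pair similarity measures without per-metric thresholds more concrete, the following is a minimal sketch, not OmniMatch's implementation. It assumes two common set-based signals (Jaccard similarity and value containment), which the abstract does not name. The function names and the toy tables are hypothetical. Each cross-table column pair gets a small feature vector, which a learned model such as a GNN could then consume in place of hand-tuned cutoffs.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between the distinct values of two columns."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the distinct values of column a that also appear in column b."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


def pair_features(tables):
    """Build one feature vector per cross-table column pair.

    `tables` maps table name -> {column name -> list of values}.
    The choice of signals is an assumption made for illustration only.
    """
    columns = [
        (table, column, values)
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    features = {}
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 == t2:
            continue  # joinability is only of interest across tables
        features[((t1, c1), (t2, c2))] = (
            jaccard(v1, v2),
            containment(v1, v2),
            containment(v2, v1),
        )
    return features


# Hypothetical toy data product with two tables.
toy_tables = {
    "orders": {"customer_id": [1, 2, 3, 4]},
    "customers": {"id": [1, 2, 3], "name": ["a", "b", "c"]},
}
toy_features = pair_features(toy_tables)
```

In this sketch no signal is thresholded on its own; the vectors are kept whole so that a downstream model can weigh them jointly.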
| Original language | English (US) |
|---|---|
| Pages (from-to) | 4588-4601 |
| Number of pages | 14 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 18 |
| Issue number | 11 |
| State | Published - 2025 |
| Externally published | Yes |
| Event | 51st International Conference on Very Large Data Bases, VLDB 2025, London, United Kingdom (Sep 1 2025 → Sep 5 2025) |
Bibliographical note
Publisher Copyright: © 2025, VLDB Endowment. All rights reserved.