EMC2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

Chung Yiu Yau, Hoi To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong

Research output: Contribution to journal › Conference article › peer-review

Abstract

A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, for learning better encodings of the data. These negative samples often follow a softmax distribution that is dynamically updated during the training process. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC2). We follow the global contrastive learning loss introduced in Yuan et al. (2022) and propose EMC2, which utilizes an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during the optimization. We prove that EMC2 finds an O(1/√T)-stationary point of the global contrastive loss in T iterations. Compared to prior works, EMC2 is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size, while incurring low computation and memory cost. Numerical experiments validate that EMC2 is effective with small-batch training and achieves comparable or better performance than baseline algorithms. We report results for pre-training image encoders on STL-10 and ImageNet-100.
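The key observation behind the Metropolis-Hastings subroutine described above is that the acceptance ratio between two candidate negatives depends only on exp(s_k)/exp(s_j), so the partition function cancels and samples can be drawn from the softmax distribution without ever computing it. The snippet below is a minimal sketch of this idea, not the authors' implementation: the function mh_negative_sample and its parameters are hypothetical, and the adaptive, state-persisting machinery of EMC2 is omitted.

```python
# Minimal sketch (not the authors' code): Metropolis-Hastings sampling from
# p(j) ∝ exp(s_j) over negative candidates, with no partition function.
import numpy as np

def mh_negative_sample(scores, n_steps=10, rng=None):
    """Draw one index approximately from softmax(scores).

    scores : 1-D array of similarity scores s_j (e.g. anchor·candidate / temperature).
    n_steps: number of MH transitions; in an online scheme the chain state
             could be carried across training iterations as scores change.
    """
    rng = rng or np.random.default_rng()
    n = len(scores)
    j = rng.integers(n)                 # current state of the chain
    for _ in range(n_steps):
        k = rng.integers(n)             # symmetric uniform proposal
        # Accept with probability min(1, p(k)/p(j)) = min(1, exp(s_k - s_j));
        # the normalizing constant Z = sum_k exp(s_k) cancels out.
        if np.log(rng.random()) < scores[k] - scores[j]:
            j = k
    return j

# Usage: sample one hardness-aware negative for a single anchor.
scores = np.array([0.2, 1.5, -0.3, 0.9]) / 0.5   # similarities / temperature
neg_idx = mh_negative_sample(scores, n_steps=20)
```

Because each transition touches only two scores, the per-sample cost is independent of the candidate-set size, which is what makes this style of sampler attractive compared to exact softmax sampling.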

Original language: English (US)
Pages (from-to): 56966-56981
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21 2024 - Jul 27 2024

Bibliographical note

Publisher Copyright:
Copyright 2024 by the author(s)
