νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints

Thomas Pethick, Parameswaran Raman, Lenon Minorics, Mingyi Hong, Shoham Sabach, Volkan Cevher

Research output: Contribution to journal › Article › peer-review

Abstract

Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, the method comes at the expense of storing a perturbation of the model parameters, which can be restrictive in memory-bound settings. We design a variant of SAM, called νSAM, which obtains a low-rank perturbation by modifying the perturbation constraint. The update almost entirely removes the memory footprint of the perturbation without increasing the computational complexity, achieving close to a 1/3 saving in parameter-related memory when SGD is used as the base optimizer. We demonstrate that νSAM performs comparably to SAM on vision transformers, both when training models from scratch and when fine-tuning. Interestingly, νSAM appears to significantly improve performance for MLP-Mixer architectures in both settings. The results are corroborated theoretically: we show that SAM with an arbitrary norm choice (which includes νSAM) can converge even with a fixed perturbation radius.
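For intuition, here is a hedged sketch of why a nuclear-norm perturbation constraint yields a memory-cheap perturbation. It treats the constraint as applied per weight matrix to the usual linearized SAM inner problem; this per-layer, linearized reading is an assumption made for illustration, not a restatement of the paper's exact update.

\begin{align*}
  &\text{Linearized SAM inner step:} &&
    \epsilon^\star \in \operatorname*{arg\,max}_{\|\epsilon\| \le \rho}
      \langle \nabla L(W), \epsilon \rangle, \\
  &\text{Euclidean ball } \|\epsilon\|_2 \le \rho: &&
    \epsilon^\star = \rho \, \frac{\nabla L(W)}{\|\nabla L(W)\|_2}
    \quad \text{(dense; same size as } W\text{)}, \\
  &\text{Nuclear-norm ball } \|\epsilon\|_* \le \rho: &&
    \epsilon^\star = \rho \, u_1 v_1^\top,
    \quad (u_1, v_1) \text{ the top singular vectors of } \nabla L(W).
\end{align*}

Because the dual of the nuclear norm is the spectral norm, the linearized maximizer under the nuclear-norm constraint is rank one, so only the pair (u_1, v_1) needs to be stored rather than a full copy of the perturbation, which is consistent with the memory saving described in the abstract.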

Original language: English (US)
Journal: Transactions on Machine Learning Research
Volume: 2025
State: Published - 2025

Bibliographical note

Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.

