TY - JOUR
T1 - νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints
T2 - Transactions on Machine Learning Research
AU - Pethick, Thomas
AU - Raman, Parameswaran
AU - Minorics, Lenon
AU - Hong, Mingyi
AU - Sabach, Shoham
AU - Cevher, Volkan
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, the method comes at the expense of storing a perturbation of the model parameters, which can be restrictive when memory-bound. We design a variant of SAM, called νSAM, which obtains a low-rank perturbation by modifying the perturbation constraint. The update almost entirely removes the memory footprint of the perturbation without increasing the computational complexity, thus achieving close to a 1/3 memory saving regarding the parameters when using SGD as the base optimizer. We demonstrate comparable performance of νSAM with SAM on vision transformers both when training models from scratch and for fine-tuning. Interestingly, νSAM seems to significantly improve performance for MLP-Mixer architectures across both settings. The results are corroborated theoretically, where we show that SAM with an arbitrary norm choice (which includes νSAM) can converge even with a fixed perturbation radius.
UR - http://www.scopus.com/inward/record.url?scp=85218603485&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218603485&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85218603485
SN - 2835-8856
VL - 2025
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -