Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

Research output: Contribution to journal › Conference article › peer-review

Abstract

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by Malladi et al. (2023). Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques through a comprehensive, first-of-its-kind benchmarking study across five LLM families (RoBERTa, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Code to reproduce all our experiments is available at https://github.com/ZO-Bench/ZO-LLM.
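
To make the BP-free idea concrete, below is a minimal sketch of a ZO-SGD step using a two-point random-perturbation (SPSA-style) gradient estimate, the basic scheme underlying Malladi et al. (2023). This is an illustrative assumption, not the authors' implementation: the function name zo_sgd_step, the loss_fn closure, and the hyperparameter values are hypothetical.

import torch

@torch.no_grad()
def zo_sgd_step(params, loss_fn, lr=1e-6, eps=1e-3):
    # One ZO-SGD step: estimate the directional derivative along a random
    # Gaussian direction z using two forward passes, then step against it.
    # No back-propagation is used, so no activations or gradients are stored.
    zs = [torch.randn_like(p) for p in params]

    # Loss at theta + eps*z (forward pass only).
    for p, z in zip(params, zs):
        p.add_(eps * z)
    loss_plus = loss_fn()

    # Loss at theta - eps*z.
    for p, z in zip(params, zs):
        p.sub_(2.0 * eps * z)
    loss_minus = loss_fn()

    # Restore the original parameters.
    for p, z in zip(params, zs):
        p.add_(eps * z)

    # Projected-gradient estimate: g_hat = (L+ - L-) / (2*eps) * z.
    scale = (loss_plus - loss_minus) / (2.0 * eps)
    for p, z in zip(params, zs):
        p.sub_(lr * scale * z)
    return loss_plus

In practice this is called in a training loop with loss_fn wrapping a forward pass over a minibatch; Malladi et al. (2023) additionally regenerate z from a saved random seed rather than storing it, which reduces the memory footprint to roughly that of inference.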

Original language: English (US)
Pages (from-to): 59173-59190
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21, 2024 - Jul 27, 2024

Bibliographical note

Publisher Copyright:
Copyright 2024 by the author(s)
