Abstract
Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, applying ZO fine-tuning in memory-constrained settings such as mobile phones and laptops remains challenging, since these settings often involve weight quantization while ZO requires full-precision perturbations and updates. In this study, we address this limitation by combining static sparse ZO fine-tuning with quantization. Our approach transfers a small, static subset (0.1%) of "sensitive" parameters from pre-training to downstream tasks, focusing fine-tuning on this sparse set of parameters. The remaining untuned parameters are quantized, reducing memory demands. Our proposed workflow enables efficient ZO fine-tuning of a Llama2-7B model on a GPU device with less than 8 GB of memory while outperforming full-model ZO fine-tuning and in-context learning. We provide an open-source implementation at https://github.com/GarlGuo/SensZOQ.
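The sketch below illustrates the general idea described in the abstract: a MeZO-style SPSA step (two forward passes, no backpropagation) restricted to a fixed sparse index set of "sensitive" parameters, leaving all other weights frozen (and thus free to remain quantized). It is a minimal illustration under assumed names (`sparse_idx`, `loss_fn`, the hyperparameters), not the authors' SensZOQ implementation; see the repository linked above for the actual code.

```python
import torch

def sparse_zo_step(model, loss_fn, batch, sparse_idx, eps=1e-3, lr=1e-6, seed=0):
    """One zeroth-order (SPSA-style) step restricted to a fixed sparse index set.

    `sparse_idx` maps each tuned parameter name to the flat indices of its
    "sensitive" entries; every other weight is untouched and could stay
    quantized. Illustrative sketch only, not the SensZOQ reference code.
    """
    params = {n: p for n, p in model.named_parameters() if n in sparse_idx}

    def perturb(scale):
        # Re-seeding lets us regenerate the identical random direction z
        # instead of storing it, which keeps the memory overhead negligible.
        torch.manual_seed(seed)
        for n, p in params.items():
            z = torch.randn(len(sparse_idx[n]), device=p.device, dtype=p.dtype)
            p.data.view(-1)[sparse_idx[n]] += scale * eps * z

    with torch.no_grad():
        perturb(+1); loss_plus = loss_fn(model, batch)    # f(theta + eps * z)
        perturb(-2); loss_minus = loss_fn(model, batch)   # f(theta - eps * z)
        perturb(+1)                                       # restore original weights

        # Scalar finite-difference estimate of the directional derivative.
        grad_est = (loss_plus - loss_minus) / (2 * eps)

        torch.manual_seed(seed)
        for n, p in params.items():
            z = torch.randn(len(sparse_idx[n]), device=p.device, dtype=p.dtype)
            # Update only the sparse "sensitive" entries.
            p.data.view(-1)[sparse_idx[n]] -= lr * grad_est * z

    return loss_plus.item()
```

In this sketch the perturbation direction is regenerated from a seed rather than stored, so the only persistent extra state is the index set itself; the untuned, quantized weights are never dequantized for an optimizer update.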
| Original language | English (US) |
|---|---|
| Title of host publication | 13th International Conference on Learning Representations, ICLR 2025 |
| Publisher | International Conference on Learning Representations, ICLR |
| Pages | 59924-59964 |
| Number of pages | 41 |
| ISBN (Electronic) | 9798331320850 |
| State | Published - 2025 |
| Event | 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore (Apr 24 2025 → Apr 28 2025) |
Publication series
| Name | 13th International Conference on Learning Representations, ICLR 2025 |
|---|---|
Conference
| Conference | 13th International Conference on Learning Representations, ICLR 2025 |
|---|---|
| Country/Territory | Singapore |
| City | Singapore |
| Period | 4/24/25 → 4/28/25 |
Bibliographical note
Publisher Copyright: © 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.