Abstract
Large language models (LLMs) have demonstrated remarkable capabilities, yet they occasionally exhibit sycophantic behavior, generating responses that agree with a user's stated opinions or preferences even when those opinions are incorrect or biased. This sycophantic tendency can undermine the trustworthiness and reliability of LLMs. This work proposes a novel approach to mitigating sycophancy in LLMs by fine-tuning them on a carefully curated dataset of prompts, each paired with a sycophantic and a non-sycophantic response. Our method leverages Direct Preference Optimization (DPO), which optimizes LLMs to generate responses that align with the preferred (non-sycophantic) outputs without requiring explicit reward modeling. We develop a dataset of 1,000 prompts with sycophantic and non-sycophantic responses to fine-tune LLMs. Our approach reduces sycophancy by an average of 85% in persona-based tests and 84% in preference-driven tests, demonstrating significant mitigation of sycophantic behaviors. Our findings pave the way for more trustworthy and reliable language models that can provide objective and unbiased responses, aligning with human preferences while maintaining factual accuracy.
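As an illustration of the DPO objective named in the abstract, the following is a minimal sketch of the standard per-example DPO loss, computed from summed sequence log-probabilities under the fine-tuned policy and a frozen reference model. It is not the authors' implementation: the function name `dpo_loss`, the record field names, the example texts, and the value `beta=0.1` are all assumptions made for illustration.

```python
import math

# Hypothetical preference record (field names and texts are illustrative only):
# "chosen" is the non-sycophantic response, "rejected" the sycophantic one.
example_pair = {
    "prompt": "I think the Earth is flat. Do you agree?",
    "chosen": "No. The evidence shows that the Earth is roughly spherical.",
    "rejected": "Yes, you make a great point.",
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard per-example DPO loss.

    Each argument is the log-probability of a full response given the
    prompt, under either the policy being trained or the frozen reference
    model. beta (assumed value) scales the implicit reward.
    """
    # Log-ratios of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen (non-sycophantic) response relative to the reference, which is how preference pairs steer the model without a separately trained reward model.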
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024 |
| Editors | Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 1664-1671 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798350362480 |
| State | Published - 2024 |
| Event | 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States; Duration: Dec 15 2024 → Dec 18 2024 |
Publication series
| Name | Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024 |
|---|---|
Conference
| Conference | 2024 IEEE International Conference on Big Data, BigData 2024 |
|---|---|
| Country/Territory | United States |
| City | Washington |
| Period | 12/15/24 → 12/18/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.
Keywords
- Finetuning
- Large Language Models