
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit

  • Huixue Zhou
  • Hengrui Gu
  • Zaifu Zhan
  • Xi Liu
  • Kaixiong Zhou
  • Mingfu Liang
  • Yongkang Xiao
  • Srinivas Govindan
  • Piyush Chawla
  • Jiyan Yang
  • Xiangfei Meng
  • Huayu Li
  • Buyun Zhang
  • Liang Luo
  • Wen Yen Chen
  • Yiping Han
  • Bo Long
  • Rui Zhang
  • Tianlong Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The deployment of Large Language Models (LLMs) in recommender systems for Click-Through Rate (CTR) prediction requires a careful balance between computational efficiency and predictive accuracy. This paper introduces OptiRAG-Rec, a comprehensive framework that integrates Retrieval-Augmented Generation (RAG) with a novel multi-head early exit architecture to address both challenges. By leveraging Graph Convolutional Networks (GCNs) as efficient retrieval mechanisms, the framework significantly reduces data retrieval times while maintaining high model performance. Additionally, the multi-head early exit strategy dynamically terminates inference based on real-time predictive confidence assessments, enhancing responsiveness without sacrificing accuracy. Experimental results demonstrate that OptiRAG-Rec reduces computation time while preserving the precision required for reliable recommendations, establishing a new benchmark for efficient and accurate LLM deployment in recommendation.
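
The abstract does not spell out the exit rule, so the following is only a minimal sketch of confidence-based early exit for a binary CTR prediction, not the OptiRAG-Rec implementation. It assumes one lightweight prediction head per layer and an exit as soon as the head's confidence, taken here as max(p, 1 - p), reaches a threshold. The names `early_exit_predict`, `layers`, `heads`, and `threshold` are hypothetical and chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def early_exit_predict(
    features: Vector,
    layers: List[Callable[[Vector], Vector]],
    heads: List[Callable[[Vector], float]],
    threshold: float = 0.9,
) -> Tuple[float, int]:
    """Return (click probability, number of layers actually run).

    Assumption: heads[i] reads the hidden state produced by layers[i]
    and returns a single logit for the "click" class.
    """
    hidden = features
    prob = 0.5   # uninformative default if no layer is run
    depth = 0
    for layer, head in zip(layers, heads):
        hidden = layer(hidden)
        depth += 1
        prob = sigmoid(head(hidden))
        # Confidence of a binary prediction: distance from the 0.5 boundary.
        confidence = max(prob, 1.0 - prob)
        if confidence >= threshold:
            break  # confident enough: skip the remaining layers
    return prob, depth


# Toy stand-ins for transformer layers and exit heads (illustrative only).
def double(h: Vector) -> Vector:
    return [2.0 * v for v in h]


def total(h: Vector) -> float:
    return sum(h)


prob, depth = early_exit_predict([0.5], [double] * 4, [total] * 4, threshold=0.9)
print(f"exited after {depth} of 4 layers, p(click) = {prob:.3f}")
```

In this toy run the logit doubles at each layer, so the confidence crosses 0.9 at the third layer and the fourth is never evaluated. Raising `threshold` trades speed for more layers run, which is the efficiency versus accuracy dial the abstract refers to.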

Original language: English (US)
Title of host publication: Long Papers
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Pages: 26443-26458
Number of pages: 16
ISBN (Electronic): 9798891762510
DOIs
State: Published - 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: Jul 27 2025 to Aug 1 2025

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 7/27/25 to 8/1/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.
