Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities

Alireza Sadeghi, Fatemeh Sheikholeslami, Georgios B. Giannakis

Research output: Contribution to journal › Article

46 Scopus citations

Abstract

Small base stations (SBs) equipped with caching units have the potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours and serve them at the network edge during peak periods. To prefetch intelligently, each SB must learn what and when to cache, while accounting for its limited memory, the massive number of available contents, the unknown popularity profiles, and the space-time dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the underlying transition probabilities are unknown. Joint consideration of global and local popularity demands, along with cache-refreshing costs, allows for a simple yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
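To make the idea concrete, the following is a minimal, illustrative sketch of Q-learning applied to a toy caching problem: the state is a latent popularity profile evolving as a small Markov chain, the action is which files to prefetch into a capacity-limited cache, and the reward trades cache hits against a refresh cost for newly fetched files. All sizes, distributions, and the plain tabular Watkins update are assumptions for illustration; the paper's own formulation (joint local/global popularity states and the linear function-approximation variant) is richer than this sketch.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (hypothetical, not taken from the paper)
F, M = 5, 2                                          # files and cache slots at the SB
actions = list(itertools.combinations(range(F), M))  # candidate cache contents
S = 3                                                # latent popularity states

# Dynamics unknown to the learner: popularity state evolves as a Markov chain,
# and each state induces a different request distribution over the F files.
P = np.array([[.8, .1, .1], [.1, .8, .1], [.1, .1, .8]])
req_prob = rng.dirichlet(np.ones(F), size=S)

refresh_cost = 0.2                 # backhaul cost per newly prefetched file
alpha, gamma, eps = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = np.zeros((S, len(actions)))
state, prev_cache = 0, actions[0]

for t in range(20000):
    # epsilon-greedy choice of which files to prefetch for the next slot
    a = rng.integers(len(actions)) if rng.random() < eps else int(Q[state].argmax())
    cache = actions[a]

    # environment step: draw user requests, count cache hits, pay refresh cost
    reqs = rng.choice(F, size=10, p=req_prob[state])
    hits = np.isin(reqs, cache).mean()
    cost = refresh_cost * len(set(cache) - set(prev_cache))
    reward = hits - cost

    next_state = rng.choice(S, p=P[state])
    # standard tabular Q-learning update; a linear function approximation of Q
    # (as proposed in the paper) would replace this table for scalability
    Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
    state, prev_cache = next_state, cache

print("greedy cache per popularity state:",
      [actions[int(Q[s].argmax())] for s in range(S)])
```

Run as-is, the learned greedy policy prefetches the files that are most popular under each popularity state, unless the refresh cost makes swapping the cache contents unprofitable; this is the qualitative behavior the abstract describes, reproduced under the toy assumptions above.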

Original language: English (US)
Pages (from-to): 180-190
Number of pages: 11
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 12
Issue number: 1
DOIs
State: Published - Feb 2018

Keywords

  • Caching
  • Markov decision process (MDP)
  • Q-learning
  • dynamic popularity profile
  • reinforcement learning