Wireless and wireline networks, such as the Internet, cellular networks, and content delivery networks, increasingly serve end-user file requests proactively. By storing anticipated highly popular files during off-peak periods and delivering them to end-users during peak hours, these networks smooth out load fluctuations on the back-haul links. In this context, several practical networks comprise a parent caching node connected to multiple leaf nodes that serve end-user file requests. To model the two-way interaction between caching decisions at the parent and leaf nodes, this work puts forth a reinforcement learning formulation. Furthermore, to endow the approach with the scalability needed to cope with the curse of dimensionality, a deep reinforcement learning solution is also developed. The novel caching policy relies on a deep Q-network that equips the parent node with the ability to learn and adapt to the unknown policies of the leaf nodes, as well as to the spatio-temporal dynamics of file requests, yielding remarkable caching performance, as corroborated through numerical tests.
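To make the learn-and-adapt idea concrete, the following is a minimal toy sketch of Q-learning for a single caching node, using a tabular Q-function as a small-scale stand-in for the deep Q-network described above. All specifics here (file set, popularity profile, cache capacity, hyperparameters) are illustrative assumptions, not details from this work: the state is the pair (cache contents, requested file), the action on a miss is which slot to evict (or to skip admission), and the reward is a cache hit.

```python
import random
from collections import defaultdict

# Toy setting (illustrative assumptions, not from the paper):
FILES = [0, 1, 2, 3]
POP = [0.5, 0.3, 0.15, 0.05]   # assumed file request popularities
CAPACITY = 2                   # cache holds two files
ACTIONS = [0, 1, 2]            # evict slot 0, evict slot 1, or skip admission

def step(cache, request, action):
    """Apply one caching decision; return (next_cache, reward)."""
    if request in cache:
        return cache, 1.0          # hit: reward 1, cache unchanged
    if action < CAPACITY:          # miss: evict chosen slot, admit request
        slots = list(cache)
        slots[action] = request
        return tuple(sorted(slots)), 0.0
    return cache, 0.0              # miss, admission skipped

def train(episodes=2000, horizon=50, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning over states (cache contents, current request)."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        cache = tuple(sorted(rng.sample(FILES, CAPACITY)))
        req = rng.choices(FILES, weights=POP)[0]
        for _ in range(horizon):
            s = (cache, req)
            if rng.random() < eps:                       # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            next_cache, r = step(cache, req, a)
            next_req = rng.choices(FILES, weights=POP)[0]
            target = r + gamma * max(Q[(next_cache, next_req)])
            Q[s][a] += alpha * (target - Q[s][a])        # Q-learning update
            cache, req = next_cache, next_req
    return Q

def hit_rate(Q, steps=5000, seed=1):
    """Evaluate the greedy policy induced by Q."""
    rng = random.Random(seed)
    cache = tuple(sorted(rng.sample(FILES, CAPACITY)))
    hits = 0.0
    for _ in range(steps):
        req = rng.choices(FILES, weights=POP)[0]
        s = (cache, req)
        a = max(ACTIONS, key=lambda x: Q[s][x])
        cache, r = step(cache, req, a)
        hits += r
    return hits / steps

if __name__ == "__main__":
    print("greedy hit rate:", hit_rate(train()))
```

Under the assumed popularity profile, the learned greedy policy tends to keep the two most popular files cached and to skip admitting unpopular ones, approaching the best achievable hit rate of 0.8. The deep variant sketched in the paper replaces the table with a neural Q-network precisely because this tabular state space grows combinatorially with the number of files and caching nodes.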