Abstract
Given a sequence of random variables (rewards), the Haviv-Puterman differential equation relates the expected infinite-horizon λ-discounted reward and the expected total reward up to a random time that is determined by an independent negative binomial random variable with parameters 2 and λ. This paper provides an interpretation of this proven, but previously unexplained, result. Furthermore, the interpretation is formalized into a new proof, which then yields new results for the general case where the rewards are accumulated up to a time determined by an independent negative binomial random variable with parameters k and λ.
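The abstract states the relation only in words, so here is a small numerical sketch of it. It is our own illustration, not the paper's proof, and it makes several assumptions. N counts failures before the k-th success, with per-trial failure probability λ, so N takes values 0, 1, 2, .... Rewards R_0, ..., R_N are summed. N is independent of the rewards. Under these assumptions, E[R_0 + ... + R_N] = Σ_t P(N ≥ t) E[R_t].

For k = 1 we have P(N ≥ t) = λ^t, so the total equals the discounted value V(λ) = Σ_t λ^t E[R_t]. For k = 2 a direct calculation gives P(N ≥ t) = λ^t (1 + t(1 − λ)), which yields W_2(λ) = V(λ) + λ(1 − λ)V'(λ). That is a differential relation of the type the abstract describes. Whether it matches the exact published form of the Haviv-Puterman equation depends on the paper's parametrization. The function names and the reward sequence below are hypothetical.

```python
import math


def nb_tail_probs(k, lam, n_max):
    """Return [P(N >= t) for t = 0..n_max], where N counts failures before
    the k-th success and each trial fails with probability lam (assumed
    parametrization). Mass above n_max is dropped (negligible for large n_max)."""
    pmf = [math.comb(n + k - 1, k - 1) * (1 - lam) ** k * lam ** n
           for n in range(n_max + 1)]
    tails = [0.0] * (n_max + 2)
    for n in range(n_max, -1, -1):  # reverse cumulative sum of the pmf
        tails[n] = tails[n + 1] + pmf[n]
    return tails[:n_max + 1]


def discounted_value(mu, lam):
    """V(lam) = sum_t lam^t * mu_t, with mu_t = E[R_t] (truncated horizon)."""
    return sum(lam ** t * m for t, m in enumerate(mu))


def discounted_value_derivative(mu, lam):
    """V'(lam) = sum_t t * lam^(t-1) * mu_t."""
    return sum(t * lam ** (t - 1) * m for t, m in enumerate(mu) if t > 0)


def nb_total_reward(mu, k, lam):
    """W_k(lam) = E[R_0 + ... + R_N] = sum_t P(N >= t) * mu_t,
    which relies on N being independent of the rewards."""
    tails = nb_tail_probs(k, lam, len(mu) - 1)
    return sum(p * m for p, m in zip(tails, mu))


if __name__ == "__main__":
    lam = 0.9
    # Hypothetical bounded expected rewards; 2000 terms make lam^t negligible at the cutoff.
    mu = [2.0 + math.cos(0.3 * t) for t in range(2000)]
    V = discounted_value(mu, lam)
    dV = discounted_value_derivative(mu, lam)
    # k = 1: total reward up to N equals the discounted value.
    print(abs(nb_total_reward(mu, 1, lam) - V) < 1e-9)
    # k = 2: W_2 = V + lam * (1 - lam) * V'.
    print(abs(nb_total_reward(mu, 2, lam) - (V + lam * (1 - lam) * dV)) < 1e-9)
```

Both lines should print True. The check works with expected rewards only, because independence reduces the random-sum expectation to a weighted sum of the means E[R_t].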
| Original language | English (US) |
|---|---|
| Pages (from-to) | 589-599 |
| Number of pages | 11 |
| Journal | Journal of Applied Probability |
| Volume | 35 |
| Issue number | 3 |
| DOIs | |
| State | Published - Sep 1998 |
Bibliographical note
Copyright 2017 Elsevier B.V. All rights reserved.
Keywords
- Markov decision processes
- Reward processes
- Sums of random variables
Title: Negative binomial sums of random variables and discounted reward processes