Journal of Dynamics and Games (JDG)
 

A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces

Pages: 261 - 278, Volume 3, Issue 3, July 2016      doi:10.3934/jdg.2016014

 

Óscar Vega-Amaya - Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Col. Centro, 83000, Hermosillo, Sonora, Mexico
Joaquín López-Borbón - Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Col. Centro, 83000, Hermosillo, Sonora, Mexico

Abstract: This paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when the approximation operators used satisfy the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of such operators; (iii) they have a certain continuity property. Such operators define transition probabilities on the state space of the controlled system. This has two important consequences: (a) the approximating function can be seen as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts enable us to give finite-time bounds for the performance of the AVI algorithm that depend on how accurately the operators approximate the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
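
As an illustration of properties (i) and (ii) and of consequence (b), the following is a minimal Python sketch and not the authors' implementation. It assumes piecewise-linear interpolation on a grid as the approximation operator, which is one example of a positive linear operator that fixes constants. The function names, the finite demand sample, the cost function, and all numerical parameters are hypothetical choices made only for this example.

```python
import numpy as np


def interpolation_operator(grid, points):
    """Matrix L such that (L @ v)[k] is the piecewise-linear interpolant of v at points[k].

    Every row is nonnegative and sums to 1, so L is a positive linear operator
    that fixes constant functions: a transition probability from points to grid.
    """
    points = np.clip(points, grid[0], grid[-1])
    idx = np.clip(np.searchsorted(grid, points, side="right") - 1, 0, len(grid) - 2)
    w = (points - grid[idx]) / (grid[idx + 1] - grid[idx])
    L = np.zeros((len(points), len(grid)))
    rows = np.arange(len(points))
    L[rows, idx] = 1.0 - w
    L[rows, idx + 1] = w
    return L


def approximate_value_iteration(grid, actions, demands, cost, alpha, n_iter):
    """Discounted value iteration on the model perturbed by the interpolation operator."""
    n, m = len(grid), len(demands)
    capacity = grid[-1]
    kernels = []
    costs = np.empty((n, len(actions)))
    for j, a in enumerate(actions):
        stocked = np.minimum(grid + a, capacity)
        # Next stock level for every (state, demand) pair, assuming lost sales.
        nxt = np.maximum(stocked[:, None] - demands[None, :], 0.0)
        L = interpolation_operator(grid, nxt.ravel())
        # Averaging over equally likely demands gives a row-stochastic matrix
        # on the grid: the perturbed transition law for action a.
        kernels.append(L.reshape(n, m, n).mean(axis=1))
        costs[:, j] = cost(grid, a)
    v = np.zeros(n)
    for _ in range(n_iter):
        q = np.stack([costs[:, j] + alpha * kernels[j] @ v
                      for j in range(len(actions))], axis=1)
        v = q.min(axis=1)
    return v


# Hypothetical inventory example: capacity 10, four order sizes, four demand values.
grid = np.linspace(0.0, 10.0, 21)
actions = np.array([0.0, 2.0, 4.0, 6.0])
demands = np.array([1.0, 2.0, 3.0, 5.0])
alpha = 0.9


def cost(x, a):
    """Ordering cost plus holding cost plus expected shortage penalty (illustrative)."""
    stocked = np.minimum(x + a, 10.0)
    shortage = np.maximum(demands[None, :] - stocked[:, None], 0.0).mean(axis=1)
    return 1.0 * a + 0.1 * stocked + 4.0 * shortage


v = approximate_value_iteration(grid, actions, demands, cost, alpha, n_iter=200)
```

Because each averaged matrix in `kernels` is row-stochastic, the iteration is ordinary value iteration for a finite Markov decision model on the grid, which is the sense in which the approximation step can be read as a perturbation of the original transition law.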

Keywords:  Markov decision processes, discounted criterion, approximate value iteration algorithm, perturbed models.
Mathematics Subject Classification:  Primary: 93E20, 90C59; Secondary: 90C40.

Received: December 2015;      Revised: July 2016;      Available Online: August 2016.
