2014, 1(1): 105-119. doi: 10.3934/jdg.2014.1.105

Average optimal strategies for zero-sum Markov games with poorly known payoff function on one side

1. 

Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Centro, C.P. 83000, Hermosillo, Sonora, Mexico

2. 

Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, Centro, C.P. 83000, Hermosillo, Sonora, Mexico

Received January 2012; Revised June 2012; Published June 2013

We are concerned with two-person zero-sum Markov games with Borel spaces under a long-run average criterion. The payoff function is possibly unbounded and depends on a parameter that is unknown to one of the players. That player can estimate the parameter, and hence the payoff function, using statistical methods. Our main objective is to combine such an estimation procedure with a variant of the so-called vanishing discount approach to construct an average optimal pair of strategies for the game. Our results are applied to a class of zero-sum semi-Markov games.
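As a rough orientation, the average criterion and the vanishing discount idea named in the abstract can be written as below. The notation (payoff $r$, unknown parameter $\theta$, strategies $\pi^1, \pi^2$, discount factor $\alpha$, transition law $Q$, fixed reference state $z$) is assumed here for illustration and follows common usage in the literature; it is not taken from the paper, which works with a variant of this approach under its own hypotheses.

```latex
% Long-run expected average payoff (assumed standard form)
J(x,\pi^1,\pi^2) \;=\; \liminf_{n\to\infty} \frac{1}{n}\,
  E_x^{\pi^1,\pi^2}\!\left[\sum_{t=0}^{n-1} r(x_t,a_t,b_t,\theta)\right]

% Discounted payoff with discount factor \alpha \in (0,1), and its value V_\alpha^*
V_\alpha(x,\pi^1,\pi^2) \;=\;
  E_x^{\pi^1,\pi^2}\!\left[\sum_{t=0}^{\infty} \alpha^{t}\, r(x_t,a_t,b_t,\theta)\right]

% Vanishing discount idea: let \alpha \uparrow 1 and look for limits of
j^* \;=\; \lim_{\alpha\uparrow 1} (1-\alpha)\,V_\alpha^*(z),
\qquad
h(x) \;=\; \lim_{\alpha\uparrow 1}\bigl[V_\alpha^*(x)-V_\alpha^*(z)\bigr]

% Under suitable conditions, (j^*, h) solves an average payoff (Shapley) equation
j^* + h(x) \;=\; \max_{\varphi^1}\,\min_{\varphi^2}
  \left[ r(x,\varphi^1,\varphi^2,\theta)
  + \int h(y)\,Q(dy \mid x,\varphi^1,\varphi^2) \right]
```

In an adaptive scheme of the kind the abstract describes, the player who does not know $\theta$ would work with an estimate in place of $\theta$; how the estimates and the discount factors are coupled is specific to the paper and is not reproduced here.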
Citation: Fernando Luque-Vásquez, J. Adolfo Minjárez-Sosa. Average optimal strategies for zero-sum Markov games with poorly known payoff function on one side. Journal of Dynamics & Games, 2014, 1 (1) : 105-119. doi: 10.3934/jdg.2014.1.105