# American Institute of Mathematical Sciences

April  2017, 4(2): 125-148. doi: 10.3934/jdg.2017008

## A continuous-time approach to online optimization

Joon Kwon and Panayotis Mertikopoulos

1. Centre de mathématiques appliquées, École polytechnique, Université Paris-Saclay, Palaiseau, France
2. CNRS (French National Center for Scientific Research), Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, F-38000 Grenoble, France

Received: January 2017; Revised: February 2017; Published: March 2017

We consider a family of mirror descent strategies for online optimization in continuous time and show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us to derive the no-regret properties of a large class of discrete-time algorithms, including as special cases the exponential weights algorithm, online mirror descent, smooth fictitious play and vanishingly smooth fictitious play. In so doing, we obtain a unified view of many classical regret bounds, and we show that they can be decomposed into a term stemming from continuous-time considerations and a term which measures the disparity between discrete and continuous time. This generalizes the continuous-time-based analysis of the exponential weights algorithm from [29]. As a result, we obtain a general class of infinite-horizon learning strategies that guarantee an $\mathcal{O}(n^{-1/2})$ regret bound without having to resort to a doubling trick.
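To make the abstract concrete, here is a minimal NumPy sketch of the exponential weights strategy with the vanishing step $\eta_n = \eta/\sqrt{n}$ (the EW′ variant, which attains the $\mathcal{O}(n^{-1/2})$ regret bound without a doubling trick). The payoff stream, dimension, and step constant below are illustrative assumptions for the demo, not values taken from the paper.

```python
import numpy as np

def exponential_weights(payoffs, eta0=1.0):
    """EW' variant: play weights proportional to exp(eta_n * cumulative
    payoff), with the anytime step size eta_n = eta0 / sqrt(n)."""
    d = payoffs.shape[1]
    score = np.zeros(d)                 # cumulative payoff vector
    plays = []
    for n, v in enumerate(payoffs, start=1):
        eta = eta0 / np.sqrt(n)
        w = np.exp(eta * (score - score.max()))   # shift for stability
        plays.append(w / w.sum())
        score += v                      # update AFTER playing (online)
    return np.array(plays)

def average_regret(payoffs, plays):
    """Realized payoff vs. best fixed action in hindsight, per round."""
    realized = np.sum(payoffs * plays)
    best = payoffs.sum(axis=0).max()
    return (best - realized) / len(payoffs)

# Hypothetical payoff stream: 2000 rounds, 5 actions, i.i.d. Gaussian.
rng = np.random.default_rng(0)
V = rng.standard_normal((2000, 5))
X = exponential_weights(V)
print(average_regret(V, X))   # should be small, consistent with O(n^{-1/2})
```

Note that each play $x_n$ depends only on payoffs observed before round $n$, which is what makes the regret guarantee meaningful.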

Citation: Joon Kwon, Panayotis Mertikopoulos. A continuous-time approach to online optimization. Journal of Dynamics & Games, 2017, 4 (2) : 125-148. doi: 10.3934/jdg.2017008
##### References:
[1] P. Auer, N. Cesa-Bianchi and C. Gentile, Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences, 64 (2002), 48-75. doi: 10.1006/jcss.2001.1795.
[2] A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, 31 (2003), 167-175. doi: 10.1016/S0167-6377(02)00231-6.
[3] M. Benaïm and M. Faure, Consistency of vanishingly smooth fictitious play, Mathematics of Operations Research, 38 (2013), 437-450. doi: 10.1287/moor.1120.0568.
[4] M. Benaïm, J. Hofbauer and S. Sorin, Stochastic approximations and differential inclusions, part II: Applications, Mathematics of Operations Research, 31 (2006), 673-695. doi: 10.1287/moor.1060.0213.
[5] L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7 (1967), 200-217.
[6] S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, 5 (2012), 1-122. doi: 10.1561/2200000024.
[7] S. Bubeck, Introduction to online optimization, Lecture Notes, 2011.
[8] N. Cesa-Bianchi, Analysis of two gradient-based algorithms for on-line regression, in COLT '97: Proceedings of the 10th Annual Conference on Computational Learning Theory, 1997, 163-170. doi: 10.1145/267460.267492.
[9] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire and M. K. Warmuth, How to use expert advice, Journal of the ACM, 44 (1997), 427-485. doi: 10.1145/258128.258179.
[10] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006. doi: 10.1017/CBO9780511546921.
[11] D. Fudenberg and D. K. Levine, Consistency and cautious fictitious play, Journal of Economic Dynamics and Control, 19 (1995), 1065-1089. doi: 10.1016/0165-1889(94)00819-4.
[12] D. Fudenberg and D. K. Levine, The Theory of Learning in Games, vol. 2 of Economic Learning and Social Evolution, MIT Press, Cambridge, MA, 1998.
[13] D. Fudenberg and D. K. Levine, Conditional universal consistency, Games and Economic Behavior, 29 (1999), 104-130. doi: 10.1006/game.1998.0705.
[14] J. Hannan, Approximation to Bayes risk in repeated play, in Contributions to the Theory of Games, Volume III (eds. M. Dresher, A. W. Tucker and P. Wolfe), vol. 39 of Annals of Mathematics Studies, Princeton University Press, Princeton, NJ, 1957, 97-139.
[15] E. Hazan, A survey: The convex optimization approach to regret minimization, in Optimization for Machine Learning (eds. S. Sra, S. Nowozin and S. J. Wright), MIT Press, 2012, 287-304.
[16] J. Hofbauer and W. H. Sandholm, On the global convergence of stochastic fictitious play, Econometrica, 70 (2002), 2265-2294. doi: 10.1111/1468-0262.00376.
[17] S. M. Kakade, S. Shalev-Shwartz and A. Tewari, Regularization techniques for learning with matrices, The Journal of Machine Learning Research, 13 (2012), 1865-1890.
[18] J. Kivinen and M. K. Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation, 132 (1997), 1-63. doi: 10.1006/inco.1996.2612.
[19] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Information and Computation, 108 (1994), 212-261. doi: 10.1006/inco.1994.1009.
[20] S. Mannor and V. Perchet, Approachability, fast and slow, Journal of Machine Learning Research: Workshop and Conference Proceedings, 30 (2013), 1-16.
[21] A. Mas-Colell, M. D. Whinston and J. R. Green, Microeconomic Theory, Oxford University Press, New York, NY, 1995.
[22] A. S. Nemirovski, A. Juditsky, G. Lan and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (2009), 1574-1609. doi: 10.1137/070704277.
[23] A. S. Nemirovski and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley, New York, NY, 1983.
[24] Y. Nesterov, Primal-dual subgradient methods for convex problems, Mathematical Programming, 120 (2009), 221-259. doi: 10.1007/s10107-007-0149-x.
[25] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[26] R. T. Rockafellar and R. J. B. Wets, Variational Analysis, vol. 317 of A Series of Comprehensive Studies in Mathematics, Springer-Verlag, Berlin, 1998. doi: 10.1007/978-3-642-02431-3.
[27] S. Shalev-Shwartz, Online Learning: Theory, Algorithms, and Applications, PhD thesis, Hebrew University of Jerusalem, 2007.
[28] S. Shalev-Shwartz, Online learning and online convex optimization, Foundations and Trends in Machine Learning, 4 (2011), 107-194. doi: 10.1561/2200000018.
[29] S. Sorin, Exponential weight algorithm in continuous time, Mathematical Programming, 116 (2009), 513-528. doi: 10.1007/s10107-007-0111-y.
[30] V. G. Vovk, Aggregating strategies, in COLT '90: Proceedings of the Third Workshop on Computational Learning Theory, 1990, 371-383. doi: 10.1016/B978-1-55860-146-8.50032-1.
[31] V. G. Vovk, A game of prediction with expert advice, in COLT '95: Proceedings of the 8th Annual Conference on Computational Learning Theory, 1995, 51-60. doi: 10.1145/225298.225304.
[32] M. K. Warmuth and A. K. Jagota, Continuous and discrete-time nonlinear gradient descent: Relative loss bounds and convergence, in Electronic Proceedings of the 5th International Symposium on Artificial Intelligence and Mathematics, 1997.
[33] M. Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, in ICML '03: Proceedings of the 20th International Conference on Machine Learning, 2003.


Figure: Graphical illustration of the greedy (dashed) and lazy (solid) branches of the projected subgradient (PSG) method
Figure: Greedy and lazy mirror descent with $\gamma_n = 1$
Table: Summary of the algorithms discussed in Section 6. The suffix "L" indicates a "lazy" variant; the INPUT column gives the stream of payoff vectors used as input to the algorithm, and the NORM column specifies the norm of the ambient space; finally, $\xi_n$ denotes a zero-mean stochastic process with values in $\mathbb{R}^d$.
| ALGORITHM | $\mathcal{C}$ | $h(x)$ | $\eta_n$ | INPUT | NORM |
|---|---|---|---|---|---|
| EW | $\Delta_d$ | $\sum_i x_i \log x_i$ | constant | $v_n$ | $\ell^1$ |
| EW′ | $\Delta_d$ | $\sum_i x_i \log x_i$ | $\eta/\sqrt{n}$ | $v_n$ | $\ell^1$ |
| SFP | $\Delta_d$ | any | $\eta/n$ | $v_n$ | $\ell^1$ |
| VSFP | $\Delta_d$ | any | $\eta n^{-\alpha}\ (0<\alpha<1)$ | $v_n$ | $\ell^1$ |
| OGD-L | any | $\tfrac{1}{2}\lVert x\rVert_2^2$ | constant | $-\nabla f_n(x_n)$ | $\ell^2$ |
| OMD-L | any | any | constant | $-\nabla f_n(x_n)$ | any |
| PSG-L | any | $\tfrac{1}{2}\lVert x\rVert_2^2$ | $1$ | $-\gamma_n \nabla f(x_n)$ | $\ell^2$ |
| MD-L | any | any | $1$ | $-\gamma_n \nabla f(x_n)$ | any |
| RSA-L | any | any | $1$ | $-\gamma_n (\nabla f(x_n)+\xi_n)$ | any |
| SPSG-L | any | $\tfrac{1}{2}\lVert x\rVert_2^2$ | $1$ | $-\gamma_n (\nabla f(x_n)+\xi_n)$ | $\ell^2$ |
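To make the "lazy" bookkeeping concrete, the following sketch implements the Euclidean special case, in the style of PSG-L with $h(x) = \tfrac{1}{2}\lVert x\rVert_2^2$: (scaled) gradients are accumulated in an unconstrained dual variable, and the feasible set enters only through a projection when an iterate is played. The unit-ball domain, quadratic loss and step size below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto the ball of given radius."""
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else y * (radius / nrm)

def lazy_psg(grad, x0, eta=0.1, steps=500, radius=1.0):
    """Lazy projected subgradient: accumulate gradient steps in the
    unconstrained dual variable U, project only to generate the play."""
    U = np.zeros_like(x0)      # cumulative (negative) gradient input
    x = x0
    for _ in range(steps):
        U -= grad(x)           # lazy step: no projection here
        x = project_ball(eta * U, radius)   # mirror/projection step
    return x

# Toy problem (hypothetical): minimize ||x - target||^2 over the unit ball,
# with the minimizer chosen to lie inside the ball.
target = np.array([0.6, 0.8, 0.0]) * 0.5
grad = lambda x: 2.0 * (x - target)
x_star = lazy_psg(grad, np.zeros(3))
print(np.linalg.norm(x_star - target))   # close to 0
```

The "greedy" branch would instead project after every gradient step, i.e. update `x = project_ball(x - eta * grad(x))`; the two branches coincide while the constraint is inactive.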
