September 2019, 1(3): 249-269. doi: 10.3934/fods.2019011

General risk measures for robust machine learning

Émilie Chouzenoux a, Henri Gérard b,*, Jean-Christophe Pesquet a

a. CentraleSupélec, Inria Saclay, Université Paris-Saclay, Center for Visual Computing, Gif-sur-Yvette, 91190, France

b. Université Paris-Est, CERMICS (ENPC), Labex Bézout, 6-8 avenue Blaise Pascal, Champs-sur-Marne, 77420, France

* Corresponding author: Henri Gérard

Published August 2019

Fund Project: The work of the second author was supported by ENPC and Labex Bézout. The work of the third author was supported by the Institut Universitaire de France.

A wide array of machine learning problems can be formulated as the minimization of the expectation of a convex loss function over some parameter space. Since the probability distribution of the data of interest is usually unknown, it is often estimated from training sets, which may lead to poor out-of-sample performance. In this work, we bring new insights into this problem by using the framework developed in quantitative finance for risk measures. We show that the original min-max problem can be recast as a convex minimization problem under suitable assumptions. We discuss several important examples of robust formulations, in particular by defining ambiguity sets based on $ \varphi $-divergences and the Wasserstein metric. We also propose an efficient algorithm for solving the corresponding convex optimization problems involving complex convex constraints. Through simulation examples, we demonstrate that this algorithm scales well on real data sets.
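In the discrete case, the min-max problem described above takes the form $ \min_x \max_{p \in \mathcal{P}} \sum_{i=1}^{N} p_i \ell_i(x) $, where $ \mathcal{P} $ is an ambiguity set around the empirical distribution $ q $. As a minimal illustration (not the authors' algorithm), the sketch below handles the Kullback-Leibler case through its standard dual reformulation $ \max_{D_{kl}(p\|q) \leq \epsilon} \sum_i p_i \ell_i = \min_{\lambda > 0} \lambda \epsilon + \lambda \log \sum_i q_i e^{\ell_i / \lambda} $ (see, e.g., [8,25]); the synthetic data, the logistic loss, and the radius $ \epsilon $ are assumptions made for the example.

```python
# A minimal sketch (not the paper's proximal algorithm) of distributionally
# robust empirical risk minimization over a KL ambiguity ball, via the dual
#   max_{KL(p||q)<=eps} sum_i p_i l_i
#     = min_{lam>0} lam*eps + lam*log sum_i q_i exp(l_i/lam).
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
N, d, eps = 200, 5, 0.01                      # illustrative problem size/radius
A = rng.standard_normal((N, d))
y = np.sign(A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N))
q = np.full(N, 1.0 / N)                       # nominal (empirical) distribution

def logistic_losses(x):
    return np.log1p(np.exp(-y * (A @ x)))     # per-sample logistic loss

def robust_objective(z):
    x, lam = z[:-1], np.exp(z[-1])            # lam > 0 via exponential reparam
    losses = logistic_losses(x)
    # lam * log sum_i q_i exp(l_i / lam), computed stably with logsumexp
    return lam * eps + lam * logsumexp(losses / lam, b=q)

res = minimize(robust_objective, np.zeros(d + 1), method="L-BFGS-B")
x_robust = res.x[:-1]
print("robust worst-case loss:", res.fun)
```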

Citation: Émilie Chouzenoux, Henri Gérard, Jean-Christophe Pesquet. General risk measures for robust machine learning. Foundations of Data Science, 2019, 1 (3) : 249-269. doi: 10.3934/fods.2019011
References:
[1] S. M. Ali and S. D. Silvey, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society. Series B (Methodological), 28 (1966), 131-142. doi: 10.1111/j.2517-6161.1966.tb00626.x.
[2] P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, Coherent measures of risk, Mathematical Finance, 9 (1999), 203-228. doi: 10.1111/1467-9965.00068.
[3] M. Basseville, Divergence measures for statistical data processing - an annotated bibliography, Signal Processing, 93 (2013), 621-633. doi: 10.1016/j.sigpro.2012.09.003.
[4] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New York, 2011. doi: 10.1007/978-3-319-48311-5.
[5] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2 (2009), 183-202. doi: 10.1137/080716542.
[6] A. Ben-Tal and A. Nemirovski, Robust solutions of linear programming problems contaminated with uncertain data, Mathematical Programming, 88 (2000), 411-424. doi: 10.1007/PL00011380.
[7] A. Ben-Tal, L. El Ghaoui and A. Nemirovski, Robust Optimization, Princeton University Press, 2009.
[8] A. Ben-Tal, D. Den Hertog, A. De Waegenaere, B. Melenberg and G. Rennen, Robust solutions of optimization problems affected by uncertain probabilities, Management Science, 59 (2013), 341-357. doi: 10.1287/mnsc.1120.1641.
[9] A. P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition, 30 (1997), 1145-1159. doi: 10.1016/S0031-3203(96)00142-2.
[10] L. M. Briceño-Arias, G. Chierchia, E. Chouzenoux and J.-C. Pesquet, A random block-coordinate Douglas-Rachford splitting method with low computational complexity for binary logistic regression, Computational Optimization and Applications, 72 (2019), 707-726. doi: 10.1007/s10589-019-00060-6.
[11] A. Chambolle and C. Dossal, On the convergence of the iterates of "FISTA", Journal of Optimization Theory and Applications, 166 (2015), 968-982. doi: 10.1007/s10957-015-0746-4.
[12] P. L. Combettes, Strong convergence of block-iterative outer approximation methods for convex optimization, SIAM Journal on Control and Optimization, 38 (2000), 538-565. doi: 10.1137/S036301299732626X.
[13] P. L. Combettes, A block-iterative surrogate constraint splitting method for quadratic signal recovery, IEEE Transactions on Signal Processing, 51 (2003), 1771-1782. doi: 10.1109/TSP.2003.812846.
[14] P. L. Combettes and C. L. Müller, Perspective functions: Proximal calculus and applications in high-dimensional statistics, Journal of Mathematical Analysis and Applications, 457 (2018), 1283-1306. doi: 10.1016/j.jmaa.2016.12.021.
[15] P. L. Combettes and J.-C. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optim. Appl., 49, Springer, New York, 2011, 185-212. doi: 10.1007/978-1-4419-9569-8_10.
[16] P. L. Combettes, D. Dũng and B. C. Vũ, Dualization of signal recovery problems, Set-Valued and Variational Analysis, 18 (2010), 373-404. doi: 10.1007/s11228-010-0147-7.
[17] I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud. Akad. Mat. Kutató Int. Közl., 8 (1963), 85-108.
[18] J. Duchi, P. Glynn and H. Namkoong, Statistics of robust optimization: A generalized empirical likelihood approach, preprint, arXiv:1610.03425, 2016.
[19] P. M. Esfahani and D. Kuhn, Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations, Mathematical Programming, 171 (2018), 115-166. doi: 10.1007/s10107-017-1172-1.
[20] P. M. Esfahani, S. Shafieezadeh-Abadeh, G. A. Hanasusanto and D. Kuhn, Data-driven inverse optimization with imperfect information, Mathematical Programming, 167 (2018), 191-234. doi: 10.1007/s10107-017-1216-6.
[21] J. Feng, H. Xu, S. Mannor and S. Yan, Robust logistic regression and classification, in Advances in Neural Information Processing Systems, 2014, 253-261.
[22] H. Föllmer and A. Schied, Stochastic Finance: An Introduction in Discrete Time, 4th edition, Walter de Gruyter, 2016.
[23] J.-y. Gotoh, M. J. Kim and A. E. Lim, Robust empirical optimization is almost the same as mean-variance optimization, Operations Research Letters, 46 (2018), 448-452. doi: 10.1016/j.orl.2018.05.005.
[24] Y. Haugazeau, Sur les Inéquations Variationnelles et la Minimisation de Fonctionnelles Convexes, Thèse, Université de Paris, 1968.
[25] Z. Hu and L. J. Hong, Kullback-Leibler divergence constrained distributionally robust optimization, available at Optimization Online, 2013.
[26] A. Kurakin, I. Goodfellow and S. Bengio, Adversarial examples in the physical world, preprint, arXiv:1607.02533, 2016.
[27] S. Moghaddam and M. Mahlooji, Robust simulation optimization using $\varphi$-divergence, International Journal of Industrial Engineering Computations, 7 (2016), 517-534. doi: 10.5267/j.ijiec.2016.5.003.
[28] T. Morimoto, Markov processes and the H-theorem, Journal of the Physical Society of Japan, 18 (1963), 328-331. doi: 10.1143/JPSJ.18.328.
[29] H. Namkoong and J. C. Duchi, Stochastic gradient methods for distributionally robust optimization with f-divergences, in Advances in Neural Information Processing Systems, 2016, 2208-2216.
[30] N. Papernot, P. McDaniel and I. Goodfellow, Transferability in machine learning: From phenomena to black-box attacks using adversarial samples, preprint, arXiv:1605.07277, 2016.
[31] Y. Plan and R. Vershynin, Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach, IEEE Transactions on Information Theory, 59 (2013), 482-494. doi: 10.1109/TIT.2012.2207945.
[32] R. T. Rockafellar and S. Uryasev, Optimization of conditional value-at-risk, Journal of Risk, 2 (2000), 21-42. doi: 10.21314/JOR.2000.038.
[33] A. Ruszczyński and A. Shapiro, Conditional risk mappings, Mathematics of Operations Research, 31 (2006), 544-561. doi: 10.1287/moor.1060.0204.
[34] A. Ruszczyński and A. Shapiro, Optimization of convex risk functions, Mathematics of Operations Research, 31 (2006), 433-452. doi: 10.1287/moor.1050.0186.
[35] S. Shafieezadeh-Abadeh, P. M. Esfahani and D. Kuhn, Distributionally robust logistic regression, in Advances in Neural Information Processing Systems, 2015, 1576-1584.

Figure 1.  $\mathtt{ionosphere} $ dataset: Log of the difference between current loss and final loss, with respect to the iteration number for various values of $ \epsilon $
Figure 2.  $\mathtt{ionosphere} $ dataset: Log of the difference between current loss and final loss, with respect to the CPU time for various values of $ \epsilon $ over the first 100 iterations
Figure 3.  $\mathtt{ionosphere} $ dataset: AUC metric as a function of $ \epsilon $
Figure 4.  $\mathtt{ionosphere} $ dataset (altered): ROC curve for different values of $ \epsilon $
Figure 5.  $\mathtt{ionosphere} $ dataset: AUC histogram for 1000 random realizations using 10% of the data for the training set. The robust model is used with $ \epsilon = 0.001 $
Figure 6.  $\mathtt{ionosphere} $ dataset: AUC histogram for 1000 random realizations using 60% of the data for the training set. The robust model is used with $ \epsilon = 0.001 $
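Figures 1 and 2 track convergence through the logarithm of the gap between the current and the final loss. Below is a minimal sketch of how such a convergence profile can be plotted, assuming a recorded loss history; the trajectory here is a synthetic stand-in, not the authors' experimental data.

```python
# A minimal sketch of the convergence diagnostic shown in Figures 1-2:
# log of (current loss - final loss) against the iteration index.
# `loss_hist` is a synthetic stand-in for a recorded loss trajectory.
import numpy as np
import matplotlib.pyplot as plt

loss_hist = 1.0 + np.exp(-0.05 * np.arange(500))   # stand-in decreasing losses
gap = loss_hist[:-1] - loss_hist[-1]               # current minus final loss

plt.semilogy(gap)                                  # log scale on the y-axis
plt.xlabel("iteration number")
plt.ylabel(r"loss gap $F(x_k) - F(x_{\mathrm{final}})$")
plt.title("Convergence profile (synthetic illustration)")
plt.show()
```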
Table 1.  Common perspective functions and their conjugates used to define $\varphi$-divergences

| Divergence | Notation | $\varphi(t)$, $t \geq 0$ | $D_\varphi(p,q)$ | $\varphi^{*}(s)$ | $\tilde\varphi(t)$ |
|---|---|---|---|---|---|
| Kullback-Leibler | $\varphi_{kl}(t)$ | $t\log(t) - t + 1$ | $\sum_{i=1}^{N} p_i \log\left(\frac{p_i}{q_i}\right)$ | $e^{s} - 1$ | $\varphi_{b}(t)$ |
| Burg entropy | $\varphi_{b}(t)$ | $-\log(t) + t - 1$ | $\sum_{i=1}^{N} q_i \log\left(\frac{q_i}{p_i}\right)$ | $-\log(1-s)$, $s < 1$ | $\varphi_{kl}(t)$ |
| J-divergence | $\varphi_{j}(t)$ | $(t-1)\log(t)$ | $\sum_{i=1}^{N} (p_i - q_i)\log\left(\frac{p_i}{q_i}\right)$ | no closed form | $\varphi_{j}(t)$ |
| $\chi^{2}$-distance | $\varphi_{c}(t)$ | $\frac{(t-1)^{2}}{t}$ | $\sum_{i=1}^{N} \frac{(p_i - q_i)^{2}}{p_i}$ | $2 - 2\sqrt{1-s}$, $s < 1$ | $\varphi_{mc}(t)$ |
| Modified $\chi^{2}$-distance | $\varphi_{mc}(t)$ | $(t-1)^{2}$ | $\sum_{i=1}^{N} \frac{(p_i - q_i)^{2}}{q_i}$ | $-1$ if $s < -2$; $\ s + s^{2}/4$ if $s \geq -2$ | $\varphi_{c}(t)$ |
| Hellinger distance | $\varphi_{h}(t)$ | $(\sqrt{t} - 1)^{2}$ | $\sum_{i=1}^{N} (\sqrt{p_i} - \sqrt{q_i})^{2}$ | $\frac{s}{1-s}$, $s < 1$ | $\varphi_{h}(t)$ |
| $\chi$-divergence of order $\theta > 1$ | $\varphi_{ca}^{\theta}(t)$ | $|t-1|^{\theta}$ | $\sum_{i=1}^{N} q_i \left|1 - \frac{p_i}{q_i}\right|^{\theta}$ | $s + (\theta-1)\left(\frac{|s|}{\theta}\right)^{\frac{\theta}{\theta-1}}$ | $t^{1-\theta}\varphi_{ca}^{\theta}(t)$ |
| Variation distance | $\varphi_{v}(t)$ | $|t-1|$ | $\sum_{i=1}^{N} |p_i - q_i|$ | $-1$ if $s \leq -1$; $\ s$ if $-1 \leq s \leq 1$ | $\varphi_{v}(t)$ |
| Cressie and Read | $\varphi_{cr}^{\theta}(t)$ | $\frac{1-\theta+\theta t - t^{\theta}}{\theta(1-\theta)}$, $\theta \notin \{0,1\}$ | $\frac{1}{\theta(1-\theta)}\left(1 - \sum_{i=1}^{N} p_i^{\theta} q_i^{1-\theta}\right)$ | $\frac{1}{\theta}\left(1 - s(1-\theta)\right)^{\frac{\theta}{\theta-1}} - \frac{1}{\theta}$, $s < \frac{1}{1-\theta}$ | $\varphi_{cr}^{1-\theta}(t)$ |
| Average Value at Risk of level $\beta$ | $\varphi_{\textrm{avar}}^{\beta}(t)$ | $\iota_{\left[0,\frac{1}{1-\beta}\right]}(t)$, $\beta \in [0,1]$ | $\sum_{i=1}^{N} \iota_{\left[0,\frac{1}{1-\beta}\right]}\left(\frac{p_i}{q_i}\right)$ | $\sigma_{\left[0,\frac{1}{1-\beta}\right]}(s) = \frac{s}{1-\beta}$ if $s \geq 0$; $\ 0$ if $s < 0$ | $\iota_{[1-\beta,+\infty[}(t)$ |
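As a consistency check of the table, the Kullback-Leibler row can be verified by direct computation of the convex conjugate $\varphi_{kl}^{*}(s) = \sup_{t \geq 0}\{st - \varphi_{kl}(t)\}$; this worked derivation follows the standard definition and matches the entry $e^{s} - 1$.

```latex
% Conjugate of \varphi_{kl}(t) = t\log t - t + 1 on t >= 0.
\begin{align*}
\varphi_{kl}^{*}(s)
  &= \sup_{t \ge 0} \bigl\{ s t - t\log t + t - 1 \bigr\}
  && \text{(supremand concave in } t\text{)} \\
  &= s e^{s} - e^{s} s + e^{s} - 1
  && \text{(derivative } s - \log t \text{ vanishes at } t = e^{s}\text{)} \\
  &= e^{s} - 1 .
\end{align*}
```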
Table 2.  Parameters of the datasets

| Name of dataset | $\mathtt{ionosphere}$ | $\mathtt{colon-cancer}$ |
|---|---|---|
| Number of observations ($N$) | 351 | 64 |
| Number of features ($d$) | 34 | 2000 |
Table 3.  $\mathtt{colon-cancer}$ dataset: Values of the AUC for different values of $\epsilon$

| Value of $\epsilon$ | AUC with KL | AUC with Wasserstein |
|---|---|---|
| $\epsilon = 0$ (LR) | 0.832 | 0.832 |
| $\epsilon = 0.001$ | 0.757 | 0.787 |
| $\epsilon = 0.002$ | 0.750 | 0.770 |
| $\epsilon = 0.003$ | 0.779 | 0.706 |
| $\epsilon = 0.004$ | 0.698 | 0.691 |
| $\epsilon = 0.005$ | 0.868 | 0.831 |
| $\epsilon = 0.006$ | 0.890 | 0.860 |
| $\epsilon = 0.007$ | 0.728 | 0.838 |
| $\epsilon = 0.008$ | 0.809 | 0.768 |
| $\epsilon = 0.009$ | 0.875 | 0.890 |
| $\epsilon = 0.01$ | 0.801 | 0.853 |
| $\epsilon = 0.05$ | 0.786 | 0.794 |
| $\epsilon = 0.1$ | 0.801 | 0.816 |
Table 4.  $\mathtt{ionosphere}$ dataset (altered): Values of the area under the ROC curve for different values of $\epsilon$

| Value of $\epsilon$ | AUC with KL | AUC with Wasserstein |
|---|---|---|
| $\epsilon = 0$ (LR) | 0.514 | 0.514 |
| $\epsilon = 0.001$ | 0.816 | 0.840 |
| $\epsilon = 0.002$ | 0.804 | 0.835 |
| $\epsilon = 0.003$ | 0.840 | 0.814 |
| $\epsilon = 0.004$ | 0.824 | 0.830 |
| $\epsilon = 0.005$ | 0.815 | 0.829 |
| $\epsilon = 0.006$ | 0.834 | 0.829 |
| $\epsilon = 0.007$ | 0.821 | 0.815 |
| $\epsilon = 0.008$ | 0.835 | 0.815 |
| $\epsilon = 0.009$ | 0.823 | 0.822 |
| $\epsilon = 0.01$ | 0.828 | 0.835 |
| $\epsilon = 0.05$ | 0.815 | 0.826 |
| $\epsilon = 0.1$ | 0.824 | 0.823 |
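The AUC values in Tables 3 and 4 compare the robust models against plain logistic regression (LR, $\epsilon = 0$). A minimal sketch of such an evaluation with scikit-learn follows; the fitted weights and held-out test set are synthetic stand-ins for the quantities produced by the experiments.

```python
# A minimal sketch of the AUC evaluation behind Tables 3-4. Assumption: the
# linear logistic scores A @ x are monotone in the predicted probability,
# so they can be passed to roc_auc_score directly.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
A_test = rng.standard_normal((100, 5))           # stand-in held-out features
x_star = rng.standard_normal(5)                  # stand-in ground-truth weights
y_test = (A_test @ x_star + 0.5 * rng.standard_normal(100) > 0).astype(int)

def auc(weights):
    """AUC of a linear classifier with the given weight vector."""
    return roc_auc_score(y_test, A_test @ weights)

x_lr = x_star + 0.3 * rng.standard_normal(5)     # stand-in fitted LR weights
print(f"AUC (stand-in LR model): {auc(x_lr):.3f}")
```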