April 2018, 14(2): 719-729. doi: 10.3934/jimo.2017071

Least absolute deviations learning of multiple tasks

1. School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China

2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

3. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

4. School of Mathematics and Computer Sciences, Gannan Normal University, Ganzhou 341000, China

Received: December 2015. Published: September 2017.

Fund Project: The initial version of this work was done while the first author was a Ph.D. candidate at Nanjing University of Science and Technology.

In this paper, we propose a new multitask feature selection model based on least absolute deviations. The inherent nonsmoothness of the $l_1$ norm, however, makes the model challenging to optimize. To tackle this difficulty efficiently, we introduce an alternating iterative optimization algorithm and establish its global convergence under mild conditions. Experimental results, including comparisons with the state-of-the-art algorithm SLEP, demonstrate the efficiency and effectiveness of the proposed approach in solving multitask learning problems.
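The paper's problem (5) is not reproduced on this page; as a hedged sketch, a least absolute deviations loss combined with the $l_{2,1}$ regularizer used throughout the cited feature-selection literature [12,14,17,22] would take the form

$$\min_{W \in \mathbb{R}^{n \times k}} \; \|XW - Y\|_1 + \mu\,\|W\|_{2,1}, \qquad \|W\|_{2,1} = \sum_{i=1}^{n} \Big( \sum_{j=1}^{k} W_{ij}^2 \Big)^{1/2},$$

where $\|\cdot\|_1$ sums absolute entries, each column of $W$ carries the weights of one task, and $\mu > 0$ controls joint row sparsity, i.e., feature selection shared across tasks.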

Citation: Wei Xue, Wensheng Zhang, Gaohang Yu. Least absolute deviations learning of multiple tasks. Journal of Industrial & Management Optimization, 2018, 14 (2) : 719-729. doi: 10.3934/jimo.2017071
References:
[1]

A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-task feature learning, Mach. Learn., 73 (2008), 243-272.

[2]

Y. Bai and M. Tang, Object tracking via robust multitask sparse representation, IEEE Signal Process. Lett., 21 (2014), 909-913.

[3]

B. Bakker and T. Heskes, Task clustering and gating for Bayesian multitask learning, J. Mach. Learn. Res., 4 (2003), 83-99.

[4]

S. Ben-David and R. Schuller, Exploiting task relatedness for multiple task learning, in Proc. Int. Conf. Learn. Theory, 2777 (2003), 567-580. doi: 10.1007/978-3-540-45167-9_41.

[5]

K. P. Bennett, J. Hu, X. Ji, G. Kunapuli and J.-S. Pang, Model selection via bilevel optimization, in Proc. IEEE Int. Joint Conf. Neural Netw., (2006), 1922-1929.

[6]

X. Chen, W. Pan, J. T. Kwok and J. G. Carbonell, Accelerated gradient method for multi-task sparse learning problem, in Proc. IEEE Int. Conf. Data Min., (2009), 746-751. doi: 10.1109/ICDM.2009.128.

[7]

T. Evgeniou, C. A. Micchelli and M. Pontil, Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615-637.

[8]

D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Comp. Math. Appl., 2 (1976), 17-40. doi: 10.1016/0898-1221(76)90003-1.

[9]

P. Gong, J. Ye and C. Zhang, Multi-stage multi-task feature learning, in Adv. Neural Inf. Process. Syst., (2012), 1988-1996.

[10]

P. Gong, J. Ye and C. Zhang, Robust multi-task feature learning, J. Mach. Learn. Res., 14 (2013), 2979-3010.

[11]

B. He, L. Liao, D. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Math. Program., 92 (2002), 103-118. doi: 10.1007/s101070100280.

[12]

Y. Hu, Z. Wei and G. Yuan, Inexact accelerated proximal gradient algorithms for matrix $l_{2,1}$-norm minimization problem in multi-task feature learning, Stat. Optim. Inf. Comput., 2 (2014), 352-367. doi: 10.19139/106.

[13]

M. Kowalski, M. Szafranski and L. Ralaivola, Multiple indefinite kernel learning with mixed norm regularization, in Proc. Int. Conf. Mach. Learn., (2009), 545-552. doi: 10.1145/1553374.1553445.

[14]

J. Liu, S. Ji and J. Ye, Multi-task feature learning via efficient $l_{2,1}$-norm minimization, in Proc. Uncertainty Artif. Intell., (2009), 339-348.

[15]

J. Liu, S. Ji and J. Ye, SLEP: Sparse learning with efficient projections, Available from: http://yelab.net/software/SLEP.

[16]

Y. Lu, Y.-E. Ge and L.-W. Zhang, An alternating direction method for solving a class of inverse semidefinite quadratic programming problems, J. Ind. Manag. Optim., 12 (2016), 317-336. doi: 10.3934/jimo.2016.12.317.

[17]

F. Nie, H. Huang, X. Cai and C. Ding, Efficient and robust feature selection via joint $l_{2,1}$-norms minimization, in Adv. Neural Inf. Process. Syst., (2010), 1813-1821.

[18]

G. Obozinski, B. Taskar and M. I. Jordan, Joint covariate selection and joint subspace selection for multiple classification problems, Statist. Comput., 20 (2010), 231-252. doi: 10.1007/s11222-008-9111-x.

[19]

A. Quattoni, X. Carreras, M. Collins and T. Darrell, An efficient projection for $l_{1,\infty}$ regularization, in Proc. Int. Conf. Mach. Learn., (2009), 857-864.

[20]

Z. Shen, Z. Geng and J. Yang, Image reconstruction from incomplete convolution data via total variation regularization, Stat. Optim. Inf. Comput., 3 (2015), 1-14. doi: 10.19139/121.

[21]

S. Xiang, F. Nie, G. Meng, C. Pan and C. Zhang, Discriminative least squares regression for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst., 23 (2012), 1738-1754.

[22]

Y. Xiao, S.-Y. Wu and B. He, A proximal alternating direction method for $l_{2,1}$-norm least squares problem in multi-task feature learning, J. Ind. Manag. Optim., 8 (2012), 1057-1069. doi: 10.3934/jimo.2012.8.1057.

[23]

T. Xiong, J. Bi, B. Rao and V. Cherkassky, Probabilistic joint feature selection for multi-task learning, in Proc. SIAM Int. Conf. Data Min., (2007), 332-342. doi: 10.1137/1.9781611972771.30.

[24]

W. Xue and W. Zhang, Learning a coupled linearized method in online setting, IEEE Trans. Neural Netw. Learn. Syst., 28 (2017), 438-450. doi: 10.1109/TNNLS.2016.2514413.

[25]

W. Xue and W. Zhang, Online weighted multi-task feature selection, in Proc. Int. Conf. Neural Inf. Process., (2016), 195-203. doi: 10.1007/978-3-319-46672-9_23.

[26]

H. Yang, M. R. Lyu and I. King, Efficient online learning for multitask feature selection, ACM Trans. Knowl. Discov. Data, 7 (2013), 1-27.

[27]

L. Yang, T. K. Pong and X. Chen, Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM J. Imaging Sci., 10 (2017), 74-110, arXiv:1506. doi: 10.1137/15M1027528.

[28]

G. Yu, W. Xue and Y. Zhou, A nonmonotone adaptive projected gradient method for primal-dual total variation image restoration, Signal Process., 103 (2014), 242-249. doi: 10.1016/j.sigpro.2014.02.025.

Figure 1. The first row shows the convergence results of SLEP and LADL21 when $Cov=diag\{1, 0.25, 0.1, 0.05, 0.01\}$. The second row shows the convergence results of SLEP and LADL21 when $Cov=diag\{0.81, 0.64, 0.49, 0.36, 0.25, 0.16, 0.09, 0.04\}$
Figure 2. Comparison results of SLEP and LADL21 on the School data
Input: $\mathbf{r}_0$, $\boldsymbol{\lambda}_0$, $\mu$, $\beta$, and $\tau$.
1: while not converged do
2:     Compute $W_{k+1}$ according to (15) for given $\mathbf{r}_k$ and $\boldsymbol{\lambda}_k$.
3:     Compute $\mathbf{r}_{k+1}$ according to (17) for given $W_{k+1}$ and $\boldsymbol{\lambda}_k$.
4:     Update the multipliers $\boldsymbol{\lambda}_{k+1}$ according to (18).
5: end while
Algorithm 1: LADL21, an efficient iterative algorithm for solving the optimization problem in (5).
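The update formulas (15), (17) and (18) are not shown on this page, so the following Python sketch fills them in with the standard alternating direction (ADMM) updates for the splitting $R = XW - Y$ under the assumed model above: a proximal-linearized $W$-step with step size $\tau$, a closed-form soft-thresholding residual step, and a multiplier ascent step with penalty $\beta$. All names here (ladl21_sketch, row_shrink, a design matrix X shared across tasks) are illustrative assumptions, not the authors' code.

import numpy as np

def soft_threshold(V, t):
    # Entrywise soft-thresholding: the proximal operator of t*||.||_1.
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def row_shrink(W, t):
    # Row-wise shrinkage: the proximal operator of t*||.||_{2,1};
    # rows with small l2 norm are zeroed out (joint feature selection).
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * W

def ladl21_sketch(X, Y, mu=0.1, beta=1.0, max_iter=500, tol=1e-6):
    # Alternating direction sketch for
    #     min_W ||X W - Y||_1 + mu * ||W||_{2,1}
    # via the splitting R = X W - Y, with multiplier matrix Lam.
    m, n = X.shape
    k = Y.shape[1]
    tau = 1.0 / (beta * np.linalg.norm(X, 2) ** 2)  # safe step for the linearized W-step
    W, R, Lam = np.zeros((n, k)), np.zeros((m, k)), np.zeros((m, k))
    for _ in range(max_iter):
        # W-step (role of (15), assumed form): proximal-linearized update
        # on the smooth part of the augmented Lagrangian.
        G = beta * X.T @ (X @ W - Y - R - Lam / beta)
        W_new = row_shrink(W - tau * G, tau * mu)
        # Residual step (role of (17), assumed form): closed-form soft-thresholding.
        R = soft_threshold(X @ W_new - Y - Lam / beta, 1.0 / beta)
        # Multiplier step (role of (18), assumed form): dual ascent.
        Lam = Lam - beta * (X @ W_new - Y - R)
        if np.linalg.norm(W_new - W) <= tol * max(1.0, np.linalg.norm(W)):
            W = W_new
            break
        W = W_new
    return W

Both subproblems admit closed-form proximal solutions, which is what makes an alternating scheme attractive for the two nonsmooth terms ($l_1$ loss and $l_{2,1}$ regularizer) at once.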
Table 1. Comparison of the RelErr

(m, n, k)          SLEP     LADL21
(10000, 20, 100)   0.0139   0.0025
(20000, 20, 200)   0.0136   0.0017
(30000, 20, 300)   0.0140   0.0015
(40000, 20, 400)   0.0138   0.0016
(10000, 30, 100)   0.0141   0.0016
(20000, 30, 200)   0.0199   0.0016
(30000, 30, 300)   0.0143   0.0017
(40000, 30, 400)   0.0202   0.0018
(10000, 40, 100)   0.0187   0.0018
(20000, 40, 200)   0.0146   0.0018
(30000, 40, 300)   0.0154   0.0020
(40000, 40, 400)   0.0188   0.0021
(10000, 50, 100)   0.0170   0.0020
(20000, 50, 200)   0.0179   0.0022
(30000, 50, 300)   0.0177   0.0022
(40000, 50, 400)   0.0176   0.0024
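RelErr is not defined on this page; for recovery experiments of this kind it is typically the relative error against a ground-truth weight matrix $W^*$, e.g. $\mathrm{RelErr} = \|\hat{W} - W^*\|_F / \|W^*\|_F$ (an assumption, not the paper's stated definition). Under that reading, the table shows LADL21 attaining roughly an order of magnitude smaller error than SLEP across all problem sizes $(m, n, k)$.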