2016, 1(1): 111-127. doi: 10.3934/bdia.2016.1.111

Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective

Institute for Information and System Sciences and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi, China

Received May 2015; Revised August 2015; Published September 2015

Since they were recently proposed, curriculum learning (CL) and self-paced learning (SPL) have attracted increasing attention owing to their many successful applications. Although this learning regime is heuristically motivated by human cognitive principles, there is still no sound theory explaining the intrinsic mechanism behind its effectiveness, especially in successful applications to big/noisy data. To address this issue, this paper presents theoretical results that shed light on this learning scheme. Specifically, we first formulate a new learning problem that aims to learn a proper classifier from samples generated from a training distribution that deviates from the target distribution. We then show that the CL/SPL regime provides a feasible strategy for solving this learning problem. In particular, by first introducing high-confidence/easy samples and gradually involving low-confidence/complex ones in learning, the CL/SPL process latently minimizes an upper bound of the expected risk under the target distribution, using only data from the deviated training distribution. We further construct a new SPL algorithm based on random sampling, which better complies with our theory, and substantiate its effectiveness through experiments on synthetic and real data.
Citation: Tieliang Gong, Qian Zhao, Deyu Meng, Zongben Xu. Why curriculum learning & self-paced learning work in big/noisy data: A theoretical perspective. Big Data & Information Analytics, 2016, 1 (1) : 111-127. doi: 10.3934/bdia.2016.1.111
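
The easy-to-hard schedule described in the abstract can be illustrated with a minimal sketch. It assumes the widely used hard-threshold form of SPL, in which a sample is admitted to training only while its current loss is below an "age" parameter that grows each round. This is not the random-sampling algorithm proposed in the paper. The helper names (make_noisy_blobs, fit_weighted_logreg, self_paced_fit), the logistic loss, the toy data, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def make_noisy_blobs(n=200, flip=0.15, seed=0):
    """Two Gaussian blobs with a fraction of labels flipped (hypothetical toy data)."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]
    noisy = rng.random(n) < flip
    return X, np.where(noisy, -y, y)


def per_sample_loss(X, y, w):
    """Logistic loss log(1 + exp(-y * <w, x>)) for each sample."""
    return np.logaddexp(0.0, -y * (X @ w))


def fit_weighted_logreg(X, y, v, steps=200, lr=0.5):
    """Gradient descent on the v-weighted logistic loss; labels y are in {-1, +1}."""
    w = np.zeros(X.shape[1])
    n_active = max(v.sum(), 1.0)
    for _ in range(steps):
        margins = np.clip(y * (X @ w), -30.0, 30.0)
        # Derivative of log(1 + exp(-m)) with respect to m is -1 / (1 + exp(m)).
        coeff = v * y / (1.0 + np.exp(margins))
        w += lr * (X * coeff[:, None]).sum(axis=0) / n_active
    return w


def self_paced_fit(X, y, lam=0.3, growth=1.5, rounds=8):
    """Alternate between selecting easy samples and refitting the model.

    lam is the age parameter: samples with loss below lam are treated as
    easy and get weight 1, all others get weight 0. Growing lam each
    round gradually admits harder samples.
    """
    v = np.ones(len(y))                    # warm start on every sample
    w = fit_weighted_logreg(X, y, v)
    history = []
    for _ in range(rounds):
        losses = per_sample_loss(X, y, w)
        v = (losses < lam).astype(float)   # hard 0/1 sample weights
        if v.sum() == 0:                   # nothing selected: admit the easiest sample
            v[np.argmin(losses)] = 1.0
        w = fit_weighted_logreg(X, y, v)
        history.append(int(v.sum()))
        lam *= growth                      # raise the threshold for the next round
    return w, v, history


X, y = make_noisy_blobs()
w, v, history = self_paced_fit(X, y)
print("samples admitted per round:", history)
```

In this sketch the flipped-label points tend to have large loss under the current model, so they are admitted late or not at all, which is the intuition behind training on high-confidence samples first when the training distribution deviates from the target one.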