
Approximate greatest descent in neural network optimization
1.  Faculty of Engineering and Science, Curtin University Malaysia, Malaysia 
2.  Department of Aerospace and Software Engineering, Gyeongsang National University, South Korea 
Numerical optimization is required in artificial neural networks to update the weights iteratively so that the network can learn. In this paper, we propose using the Approximate Greatest Descent (AGD) algorithm to optimize neural network weights in a long-term backpropagation manner. Modifying and developing AGD into the stochastic diagonal AGD (SDAGD) algorithm could improve the learning ability and structural simplicity of deep learning neural networks. The algorithm is derived from the operation of a multistage decision control system, which consists of two phases: (1) when the local search region does not contain the minimum point, the iteration is defined at the boundary of the local search region; (2) when the local search region contains the minimum point, the Newton method is approximated for faster convergence. The integration of SDAGD into a multilayer perceptron (MLP) network is investigated with the goal of improving the learning ability and structural simplicity. Simulation results show that a two-layer MLP with SDAGD achieved a misclassification rate of 9.4% on a smaller Modified National Institute of Standards and Technology (MNIST) dataset. MNIST is a database of handwritten digit images suitable for algorithm prototyping in artificial neural networks.
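The abstract does not state the update rule, so the following is only a minimal sketch of how the two phases could be realized. It assumes a damped diagonal-Newton step with damping mu = ||g|| / R, where R is the radius of the local search region. The function name `sdagd_step`, the parameters `radius` and `eps`, the use of the absolute value of the diagonal Hessian, and the toy quadratic objective are all illustrative assumptions rather than details taken from the paper. Under this form, a large gradient (minimum outside the region) gives a step of length at most R, while a small gradient (minimum inside the region) lets mu vanish so the step approaches a diagonal Newton step.

```python
import math

def sdagd_step(w, grad, hess_diag, radius=1.0, eps=1e-12):
    """One damped diagonal-Newton update (assumed form, not quoted from the paper).

    w, grad, hess_diag: lists of floats of equal length.
    radius: assumed radius R of the local search region.
    """
    grad_norm = math.sqrt(sum(g * g for g in grad))
    mu = grad_norm / radius  # damping: large far from the minimum, near zero close to it
    # abs() and eps keep every denominator positive (an illustrative safeguard).
    return [wi - gi / (mu + abs(hi) + eps)
            for wi, gi, hi in zip(w, grad, hess_diag)]

# Toy objective f(w) = 0.5 * sum(a_i * w_i**2), whose minimum is at w = 0.
a = [1.0, 10.0]
w0 = [5.0, -3.0]

w = list(w0)
for _ in range(50):
    grad = [ai * wi for ai, wi in zip(a, w)]  # exact gradient of the toy objective
    w = sdagd_step(w, grad, hess_diag=a)      # exact diagonal Hessian is a

final_loss = 0.5 * sum(ai * wi * wi for ai, wi in zip(a, w))
```

In a stochastic setting the exact gradient and diagonal Hessian used here would be replaced by per-sample or mini-batch estimates obtained through backpropagation; that substitution is not shown.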
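| Training algorithm | Training MCR (%) | Testing MCR (%) | MSE  |
|--------------------|------------------|-----------------|------|
| SGD                | 8.22             | 12.14           | 0.40 |
| SDLM               | 8.86             | 10.19           | 0.32 |
| SDAGD              | 6.46             | 9.40            | 0.21 |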