May 2018, 1(2): 149-180. doi: 10.3934/mfc.2018008

How convolutional neural networks see the world --- A survey of convolutional neural network visualization methods

1. George Mason University, 4400 University Dr, Fairfax, VA 22030, USA

2. Clarkson University, 8 Clarkson Ave, Potsdam, NY 13699, USA

* Corresponding author: Xiang Chen

Received: October 2017. Revised: December 2017. Published: May 2018.

Fund Project: The authors are supported by NSF Grant CNS-1717775

Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision tasks, such as object detection, image recognition, and image retrieval. These achievements stem from the CNNs' outstanding capability to learn input features through deep layers of neurons and an iterative training process. However, the learned features are hard to identify and interpret from a human vision perspective, leaving the CNNs' internal working mechanism poorly understood. To improve CNN interpretability, CNN visualization is widely used as a qualitative analysis method that translates the internal features into visually perceptible patterns. Many CNN visualization works have been proposed in the literature to interpret CNNs from the perspectives of network structure, operation, and semantic concept.

In this paper, we provide a comprehensive survey of several representative CNN visualization methods, including Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. These methods are presented in terms of their motivations, algorithms, and experimental results. Based on these visualization methods, we also discuss their practical applications, demonstrating the significance of CNN interpretability in areas such as network design, optimization, and security enhancement.
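To make the first of these methods concrete, the sketch below illustrates the core idea of Activation Maximization: gradient ascent on a synthesized input to maximize one neuron's activation, with an L2 regularizer keeping the input bounded. It is a toy, assumption-laden sketch: a random linear layer with a ReLU stands in for a trained CNN layer, and all names and values are illustrative, not from the surveyed papers' implementations.

```python
import numpy as np

# Toy "layer": 8 neurons over a 16-dimensional input. In practice the
# weights would come from a trained CNN; these are random placeholders.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))

def activation(x, unit):
    """ReLU activation of one neuron for a (flattened) input x."""
    return max(0.0, W[unit] @ x)

def activation_maximization(unit, steps=200, lr=0.1, l2=0.01):
    """Gradient ascent on the input to maximize one neuron's activation.

    The L2 penalty (weight decay on the input) is the simplest of the
    regularizers used in the literature to keep the synthesized input
    from growing without bound.
    """
    w = W[unit]
    x = 0.01 * w / np.linalg.norm(w)  # seed aligned with the neuron so the ReLU is active
    for _ in range(steps):
        # d(activation)/dx is w while the neuron is active, else zero.
        grad = w if w @ x > 0 else np.zeros_like(x)
        x += lr * (grad - l2 * x)     # ascent step with L2 shrinkage
    return x

x_star = activation_maximization(unit=3)
```

For a real CNN the same loop runs with backpropagated gradients through the network, and the resulting `x_star` is displayed as an image showing what the chosen neuron responds to.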

Citation: Zhuwei Qin, Fuxun Yu, Chenchen Liu, Xiang Chen. How convolutional neural networks see the world --- A survey of convolutional neural network visualization methods. Mathematical Foundations of Computing, 2018, 1 (2) : 149-180. doi: 10.3934/mfc.2018008
References:
[1]

P. Agrawal, R. Girshick and J. Malik, Analyzing the performance of multilayer neural networks for object recognition, in Proceedings of the European Conference on Computer Vision, 2014, 329-344. doi: 10.1007/978-3-319-10584-0_22.

[2]

M. Arjovsky, S. Chintala and L. Bottou, Wasserstein GAN, arXiv preprint, arXiv: 1701.07875.

[3]

D. Bau, B. Zhou, A. Khosla, A. Oliva and A. Torralba, Network dissection: Quantifying interpretability of deep visual representations, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 3319-3327. doi: 10.1109/CVPR.2017.354.

[4]

D. C. Ciresan, U. Meier, J. Masci, L. Maria Gambardella and J. Schmidhuber, Flexible, high performance convolutional neural networks for image classification, in Proceedings of the International Joint Conference on Artificial Intelligence, vol. 22, 2011, 1237.

[5]

R. Collobert, K. Kavukcuoglu and C. Farabet, Torch7: A matlab-like environment for machine learning, in Workshop on BigLearn, NIPS, 2011.

[6]

G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, Visual categorization with bags of keypoints, in Workshop on statistical learning in computer vision, ECCV, vol. 1, 2004, 1-2.

[7]

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, 886-893. doi: 10.1109/CVPR.2005.177.

[8]

E. d'Angelo, A. Alahi and P. Vandergheynst, Beyond bits: Reconstructing images from local binary descriptors, in Proceedings of the IEEE Conference on Pattern Recognition, 2012, 935-938.

[9]

E. L. Denton, S. Chintala, R. Fergus et al., Deep generative image models using a Laplacian pyramid of adversarial networks, in Proceedings of the Advances in Neural Information Processing Systems, 2015, 1486-1494.

[10]

A. Dosovitskiy and T. Brox, Generating images with perceptual similarity metrics based on deep networks, in Proceedings of the Advances in Neural Information Processing Systems, 2016, 658-666.

[11]

A. Dosovitskiy and T. Brox, Inverting visual representations with convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 4829-4837. doi: 10.1109/CVPR.2016.522.

[12]

A. Dosovitskiy, J. Tobias Springenberg and T. Brox, Learning to generate chairs with convolutional neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 1538-1546. doi: 10.1109/CVPR.2015.7298761.

[13]

J. Duchi, E. Hazan and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12 (2011), 2121-2159.

[14]

D. Erhan, Y. Bengio, A. Courville and P. Vincent, Visualizing higher-layer features of a deep network, Technical report, University of Montreal, (2009), 3.

[15]

P. F. Felzenszwalb, R. B. Girshick, D. McAllester and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32 (2010), 1627-1645. doi: 10.1109/TPAMI.2009.167.

[16]

R. Fong and A. Vedaldi, Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks, arXiv preprint, arXiv: 1801.03454.

[17]

L. A. Gatys, A. S. Ecker and M. Bethge, A neural algorithm of artistic style, Journal of Vision, 16 (2016), p326, arXiv: 1508.06576. doi: 10.1167/16.12.326.

[18]

L. A. Gatys, A. S. Ecker and M. Bethge, Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks, arXiv preprint, arXiv: 1505.07376, 12.

[19]

R. B. Girshick, P. F. Felzenszwalb and D. McAllester, Discriminatively trained deformable part models, release 5, http://people.cs.uchicago.edu/~rbg/latent-release5/.

[20]

R. Girshick, J. Donahue, T. Darrell and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, 580-587. doi: 10.1109/CVPR.2014.81.

[21]

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Proceedings of the International Conference on Artificial Intelligence and Statistics, 2010, 249-256.

[22]

Y. Gong, L. Wang, R. Guo and S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, in Proceedings of the European Conference on Computer Vision, 2014, 392-407. doi: 10.1007/978-3-319-10584-0_26.

[23]

A. Gonzalez-Garcia, D. Modolo and V. Ferrari, Do semantic parts emerge in convolutional neural networks?, International Journal of Computer Vision, 126 (2018), 476-494. doi: 10.1007/s11263-017-1048-0.

[24]

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets, in Proceedings of the Advances in Neural Information Processing Systems, 2014, 2672-2680.

[25]

A. Gordo, J. Almazán, J. Revaud and D. Larlus, Deep image retrieval: Learning global representations for image search, in Proceedings of the European Conference on Computer Vision, Springer, 2016, 241-257. doi: 10.1007/978-3-319-46466-4_15.

[26]

S. Han, H. Mao and W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv preprint, arXiv: 1510.00149.

[27]

K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2016, 770-778. doi: 10.1109/CVPR.2016.90.

[28]

G. E. Hinton, S. Osindero and Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18 (2006), 1527-1554. doi: 10.1162/neco.2006.18.7.1527.

[29]

D. H. Hubel and T. N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of Physiology, 195 (1968), 215-243. doi: 10.1113/jphysiol.1968.sp008455.

[30]

D. H. Hubel and T. N. Wiesel, Receptive fields of single neurones in the cat's striate cortex, The Journal of Physiology, 148 (1959), 574-591. doi: 10.1113/jphysiol.1959.sp006308.

[31]

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proceedings of the International Conference on Machine Learning, 2015, 448-456.

[32]

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proceedings of the International Conference on Machine Learning, 2015, 448-456.

[33]

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, Caffe: Convolutional architecture for fast feature embedding, in Proceedings of the International Conference on Multimedia, 2014, 675-678. doi: 10.1145/2647868.2654889.

[34]

K. Grill-Spector and R. Malach, The human visual cortex, Annual Review of Neuroscience, 27 (2004), 649-677.

[35]

K. N. Kay, T. Naselaris, R. J. Prenger and J. L. Gallant, Identifying natural images from human brain activity, Nature, 452 (2008), 352. doi: 10.1038/nature06713.

[36]

A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proceedings of the Advances in Neural Information Processing Systems, 2012, 1097-1105. doi: 10.1145/3065386.

[37]

N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez and L. Wiskott, Deep hierarchies in the primate visual cortex: What can we learn for computer vision?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013), 1847-1871. doi: 10.1109/TPAMI.2012.272.

[38]

A. Kurakin, I. Goodfellow and S. Bengio, Adversarial examples in the physical world, arXiv preprint, arXiv: 1607.02533.

[39]

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), 2278-2324. doi: 10.1109/5.726791.

[40]

Y. LeCun, C. Cortes and C. J. Burges, The MNIST database of handwritten digits, 1998.

[41]

C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2017. doi: 10.1109/CVPR.2017.19.

[42]

H. Lee, C. Ekanadham and A. Y. Ng, Sparse deep belief net model for visual area v2, in Proceedings of the Advances in Neural Information Processing Systems, 2008, 873-880.

[43]

H. Lee, R. Grosse, R. Ranganath and A. Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in Proceedings of the International Conference on Machine Learning, 2009, 609-616. doi: 10.1145/1553374.1553453.

[44]

T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. Zitnick, Microsoft COCO: Common objects in context, CoRR, abs/1405.0312, 2014.

[45]

D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60 (2004), 91-110. doi: 10.1023/B:VISI.0000029664.99615.94.

[46]

A. Mahendran and A. Vedaldi, Understanding deep image representations by inverting them, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2015, 5188-5196. doi: 10.1109/CVPR.2015.7299155.

[47]

A. Mahendran and A. Vedaldi, Visualizing deep convolutional neural networks using natural pre-images, International Journal of Computer Vision, 120 (2016), 233-255. doi: 10.1007/s11263-016-0911-8.

[48]

M. Manassi, B. Sayim and M. H. Herzog, When crowding of crowding leads to uncrowding, Journal of Vision, 13 (2013), 10.

[49]

A. Mordvintsev, C. Olah and M. Tyka, Inceptionism: Going deeper into neural networks, Google Research Blog, retrieved June 20, 2015.

[50]

A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox and J. Clune, Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, in Proceedings of the Advances in Neural Information Processing Systems, 2016, 3387-3395.

[51]

A. Nguyen, J. Yosinski and J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 427-436. doi: 10.1109/CVPR.2015.7298640.

[52]

A. Nguyen, J. Yosinski and J. Clune, Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks, arXiv preprint, arXiv: 1602.03616.

[53]

S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22 (2010), 1345-1359. doi: 10.1109/TKDE.2009.191.

[54]

M. I. Posner and S. E. Petersen, The attention system of the human brain, Annual Review of Neuroscience, 13 (1990), 25-42.

[55]

C. Poultney, S. Chopra, Y. L. Cun et al., Efficient learning of sparse representations with an energy-based model, in Proceedings of the Advances in Neural Information Processing Systems, 2007, 1137-1144.

[56]

N. Qian, On the momentum term in gradient descent learning algorithms, Neural Networks, 12 (1999), 145-151. doi: 10.1016/S0893-6080(98)00116-6.

[57]

R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch and I. Fried, Invariant visual representation by single neurons in the human brain, Nature, 435 (2005), 1102-1107. doi: 10.1038/nature03687.

[58]

S. Ren, K. He, R. Girshick and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39 (2017), 1137-1149. doi: 10.1109/TPAMI.2016.2577031.

[59]

L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259-268. doi: 10.1016/0167-2789(92)90242-F.

[60]

H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura and R. M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Transactions on Medical Imaging, 35 (2016), 1285-1298. doi: 10.1109/TMI.2016.2528162.

[61]

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam and M. Lanctot, Mastering the game of Go with deep neural networks and tree search, Nature, 529 (2016), 484-489. doi: 10.1038/nature16961.

[62]

K. Simonyan, A. Vedaldi and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, arXiv preprint, arXiv: 1312.6034.

[63]

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint, arXiv: 1409.1556.

[64]

J. Sivic and A. Zisserman, Video google: A text retrieval approach to object matching in videos, in Proceeding of Ninth IEEE International Conference on Computer Vision, 2003, 1470. doi: 10.1109/ICCV.2003.1238663.

[65]

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 15 (2014), 1929-1958.

[66]

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2015, 1-9. doi: 10.1109/CVPR.2015.7298594.

[67]

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus, Intriguing properties of neural networks, arXiv preprint, arXiv: 1312.6199.

[68]

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio and P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, 11 (2010), 3371-3408.

[69]

L. Wang, Y. Zhang and J. Feng, On the Euclidean distance of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (2005), 1334-1339.

[70]

D. Wei, B. Zhou, A. Torralba and W. Freeman, Understanding intra-class knowledge inside CNN, arXiv preprint, arXiv: 1507.02379.

[71]

J. Yosinski, J. Clune, A. Nguyen, T. Fuchs and H. Lipson, Understanding neural networks through deep visualization, arXiv preprint, arXiv: 1506.06579.

[72]

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in Proceedings of the European Conference on Computer Vision, 2014, 818-833. doi: 10.1007/978-3-319-10590-1_53.

[73]

M. D. Zeiler, D. Krishnan, G. W. Taylor and R. Fergus, Deconvolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, 2528-2535. doi: 10.1109/CVPR.2010.5539957.

[74]

M. D. Zeiler, G. W. Taylor and R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in Proceedings of the IEEE International Conference on Computer Vision, 2011, 2018-2025. doi: 10.1109/ICCV.2011.6126474.

[75]

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva and A. Torralba, Object detectors emerge in deep scene CNNs, arXiv preprint, arXiv: 1412.6856.


Figure 1.  CaffeNet architecture
Figure 2.  Convolutional and max-pooling process
Figure 3.  Human vision and CNNs visualization
Figure 4.  First layer of CaffeNet visualized by Activation Maximization
Figure 5.  Hidden layers of CaffeNet visualization by Activation Maximization. Adapted from "Understanding Neural Networks Through Deep Visualization," by J. Yosinski, 2015
Figure 6.  Output layer of CaffeNet visualized by Activation Maximization
Figure 7.  The structure of the Deconvolutional Network
Figure 8.  CaffeNet visualized by DeconvNet
Figure 9.  First and second layer visualization of AlexNet and ZFNet. Adapted from "Visualizing and Understanding Convolutional Networks," by M.D. Zeiler, 2014
Figure 10.  Feature evolution during training ZFNet. Adapted from "Visualizing and Understanding Convolutional Networks," by M.D. Zeiler, 2014
Figure 11.  The data flow of the two Network Inversion algorithms
Figure 12.  AlexNet reconstruction by Network Inversion with regularizer and UpconvNet. Adapted from "Inverting Visual Representations with Convolutional Networks," by A. Dosovitskiy, 2016
Figure 13.  AlexNet reconstruction by perturbing the feature maps. Adapted from "Inverting Visual Representations with Convolutional Networks," by A. Dosovitskiy, 2016
Figure 14.  The Broden images that activate certain neurons in AlexNet
Figure 15.  Illustration of Network Dissection for measuring the semantic alignment of neurons in a given CNN. Adapted from "Network Dissection: Quantifying Interpretability of Deep Visual Representations," by D. Bau, 2017
Figure 16.  AlexNet visualization by Network Dissection
Figure 17.  Semantic concepts emerging in each layer and under different training conditions
Figure 18.  Network Dissection with single neuron and neuron combinations. Adapted from "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks," by R. Fong, 2018
Figure 19.  Adversarial noises that manipulate the CNN classification
Figure 20.  Adversarial example visualization
Figure 21.  Style transfer example
Table 1.  Visualization methods

| Method | Interpretation Perspective | Focused Layer | Applied Network | Representative Study |
| --- | --- | --- | --- | --- |
| Activation Maximization | Individual neuron with visualized pattern | CLs, FLs | Auto-Encoder, DBN, AlexNet | [26] |
| Deconvolutional Neural Networks | Neuron activation in input image | CLs | AlexNet | [55] |
| Network Inversion | One layer | CLs, FLs | HOG, SIFT, LBD, Bag of words, CaffeNet | [29][64] |
| Network Dissection | Individual neuron with semantic concept | CLs | AlexNet, VGG, GoogLeNet, ResNet | [32][70] |