November 2018, 1(4): 331-348. doi: 10.3934/mfc.2018016

Privacy preserving feature selection and Multiclass Classification for horizontally distributed data

1. Department of Computer Science, 33 Gilmer Street SE, Atlanta, GA, USA
2. University of North Georgia, Dahlonega, GA, USA
3. Data-driven Intelligence Research Laboratory, College of Computing and Software Engineering, Kennesaw State University, 1100 South Marietta Pkwy, Marietta, GA, USA

* Corresponding author: Meng Han

Received: August 2018. Revised: October 2018. Published: December 2018.

Over the last two decades, many scientific fields have experienced rapid growth in data volume and data complexity, which presents data miners with numerous opportunities as well as challenges. With the advent of the big data era, applying data mining techniques to data assembled from multiple parties (or sources) has become a leading trend. However, such data mining tasks may divulge individuals' private information, which has raised growing concerns about privacy preservation. In this work, a privacy preserving feature selection method (PPFS-IFW) and a privacy preserving multiclass classification method (PPM2C) are proposed. Experiments were conducted on six benchmark datasets to validate the performance of both approaches. The results demonstrate PPFS-IFW's capability to improve classification accuracy by selecting informative features; it not only preserves private information but also outperforms several state-of-the-art feature selection approaches. The experimental results also show that the proposed PPM2C method is workable and stable and, in particular, reduces the risk of over-fitting compared with the standard Support Vector Machine. Meanwhile, by employing the Secure Sum Protocol to encrypt data at the bottom layer, users' privacy is preserved.
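The abstract names the Secure Sum Protocol as the privacy-preserving primitive at the bottom layer. The sketch below illustrates the classical secure sum idea only, not the authors' implementation: the initiating party masks its local value with a random offset, each subsequent party adds its own value to the masked running total, and the initiator removes the mask at the end, so only the global sum is revealed. The function name secure_sum, the modulus M, and the example values are illustrative assumptions.

```python
import random

# Minimal sketch of the classical Secure Sum Protocol (illustrative only;
# the function name, the modulus M, and the example values are assumptions).
M = 2 ** 32  # arithmetic is done modulo M so partial sums look random


def secure_sum(local_values):
    """Compute the sum of per-party values so that no single party
    learns another party's individual contribution."""
    # Party 1 masks its value with a random offset R known only to itself.
    R = random.randrange(M)
    running = (R + local_values[0]) % M

    # Each remaining party adds its own value to the masked running total;
    # it only ever sees a uniformly random-looking partial sum.
    for v in local_values[1:]:
        running = (running + v) % M

    # Party 1 removes its mask to recover the true global sum.
    return (running - R) % M


if __name__ == "__main__":
    # Example: three parties holding local counts 10, 25 and 7.
    print(secure_sum([10, 25, 7]))  # -> 42
```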

Citation: Yunmei Lu, Mingyuan Yan, Meng Han, Qingliang Yang, Yanqing Zhang. Privacy preserving feature selection and Multiclass Classification for horizontally distributed data. Mathematical Foundations of Computing, 2018, 1 (4) : 331-348. doi: 10.3934/mfc.2018016


Figure 1.  Workflow of PPM2C
Figure 2.  Classification accuracy improved by PPFS-IFW under CV1 scenario
Figure 3.  Classification accuracy improved by PPFS-IFW under CV2 scenario
Figure 4.  Classification accuracy comparison before and after feature selection (PPFS-IFW)
Figure 5.  Comparison of classification accuracy for PPM2C when using PAN-SVM and LIBSVM
Figure 6.  Classification accuracy of PrivacySVM under CV1 and CV2
Figure 7.  Classification accuracy of LIBSVM under CV1 and CV2
Figure 8.  Classification accuracy of PrivacySVM under CV1
Figure 9.  Classification accuracy of PrivacySVM under CV2
Table 1.  Details of datasets used in the evaluation of PPFS-IFW

Dataset               num. samples   num. features   C       $\gamma$
Diabetes (DIA)        768            8               512.0   0.0078125
Ionosphere            351            34              8.0     0.5
Colon                 62             2000            32.0    0.0078125
Leukemia              72             7129            128.0   0.0001221
Lymphoma (DLBCL)      47             4026            2.0     0.0078125
Breast Cancer (WBC)   569            30              128.0   8.0
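The C and $\gamma$ columns in Table 1 are the penalty and RBF-kernel-width hyperparameters of the underlying SVM. As a hedged illustration of how such per-dataset settings would typically be applied, the sketch below uses scikit-learn's libsvm-backed SVC as a stand-in; this is an assumption, not the paper's PPFS-IFW/PPM2C pipeline, and loading the actual datasets is left to the caller.

```python
# Illustrative only: train an RBF-kernel SVM with the (C, gamma) pairs
# reported in Table 1.  scikit-learn's SVC (a libsvm wrapper) is assumed
# here as a stand-in for the paper's own pipeline.
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# (C, gamma) per dataset, copied from Table 1.
HYPERPARAMS = {
    "Diabetes (DIA)":      (512.0, 0.0078125),
    "Ionosphere":          (8.0,   0.5),
    "Colon":               (32.0,  0.0078125),
    "Leukemia":            (128.0, 0.0001221),
    "Lymphoma (DLBCL)":    (2.0,   0.0078125),
    "Breast Cancer (WBC)": (128.0, 8.0),
}


def evaluate(name, X, y, folds=5):
    """Cross-validated accuracy of an RBF-kernel SVM for one dataset."""
    C, gamma = HYPERPARAMS[name]
    clf = SVC(C=C, gamma=gamma, kernel="rbf")
    return cross_val_score(clf, X, y, cv=folds, scoring="accuracy").mean()
```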
Table 2.  Accuracy improvement under CV1 and CV2

Dataset      CV2 improvement   CV1 improvement   CV1 num. of features   CV2 num. of features
DIA          3.39%             2.10%             4                      4
Ionosphere   0.35%             3.42%             2                      8
Colon        3.08%             8.00%             34                     157
WBC          2.47%             1.12%             10                     4
DLBCL        5.57%             10.95%            394                    444
Leukemia     8.57%             3.45%             537                    631
Sum          23.43%            29.04%            981                    1248
Table 3.  Accuracy comparison with other methods (%)

Dataset   Fisher SVM   FSV     RFE SVM   KP SVM   Ours (CV2)   Ours (CV1)
DIA       76.42        76.58   76.56     76.74    79.87        78.86
WBC       94.70        95.23   95.25     97.55    99.11        97.81
Colon     87.46        92.03   92.52     96.57    85.00        90.00
Table 4.  Details of datasets used in the evaluation of PPM2C

Dataset            num. of samples   num. of features   num. of classes
$Leukemia_{3c}$    72                7129               3
$Leukemia_{4a}$    72                7129               4
DNA                2000              180                3
Vowel              528               10                 11
Lung               32                56                 3
Letter             15000             16                 26
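In the horizontally distributed setting of the title, each party holds a subset of the samples (rows) of datasets such as those in Table 4, with the full feature set. The sketch below shows one way to simulate such a horizontal partition; the number of parties, the random shuffling, and the toy stand-in data are assumptions rather than the paper's experimental protocol.

```python
# Illustrative sketch of horizontally partitioning a dataset: every party
# receives complete rows (all features), but only a share of the samples.
# The party count and the shuffled split are assumptions.
import numpy as np


def horizontal_partition(X, y, n_parties=3, seed=0):
    """Split (X, y) by rows into n_parties disjoint local datasets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))             # shuffle sample indices
    shards = np.array_split(idx, n_parties)   # one index shard per party
    return [(X[s], y[s]) for s in shards]


if __name__ == "__main__":
    # Toy stand-in matching the Vowel dataset's shape from Table 4: 528 x 10, 11 classes.
    X = np.random.rand(528, 10)
    y = np.random.randint(0, 11, size=528)
    for i, (Xi, yi) in enumerate(horizontal_partition(X, y)):
        print(f"party {i}: {Xi.shape[0]} samples, {Xi.shape[1]} features")
```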
