doi: 10.3934/bdia.2017017

A novel approach using incremental under sampling for data stream mining

Anupama N 1 and Sudarson Jena 2

1. Research Scholar, GITAM University, Hyderabad, Telangana, India
2. Sambalpur University Institute of Information Technology, Sambalpur, Orissa, India

Corresponding author: Anupama N, Research Scholar, GITAM University, Hyderabad; anupama.niranjan@gmail.com.

Published: February 2018

Data stream mining has become very popular in recent years, as advanced electronic devices generate continuous data streams. The performance of standard learning algorithms is compromised by the imbalanced class distributions present in real-world data streams. In this paper, we propose an algorithm, Incremental Under Sampling for Data Streams (IUSDS), which uses a unique under-sampling technique to approximately balance the data sets and thereby minimize the effect of imbalance on the stream mining process. The experimental analysis suggests that the proposed algorithm improves knowledge discovery over benchmark algorithms such as C4.5 and Hoeffding tree in terms of standard performance measures, namely accuracy, AUC, precision, recall, F-measure, TP rate, FP rate and TN rate.
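
To make the chunk-by-chunk setting concrete, the sketch below shows one plausible way to under-sample a chunk before updating an incremental learner. It is an illustration only, assuming simple random under-sampling and binary classes; the undersample_chunk helper and its parameters are ours, not the exact IUSDS procedure described in the paper.

```python
import random
from collections import Counter

def undersample_chunk(chunk, label_of, target_ir=1.0, seed=0):
    """Hypothetical per-chunk under-sampler (binary classes): randomly
    drop majority-class instances until majority/minority approaches
    target_ir. IUSDS uses its own selection scheme; random sampling is
    only a placeholder here."""
    by_class = {}
    for inst in chunk:
        by_class.setdefault(label_of(inst), []).append(inst)
    # Order the two classes by size: majority first, minority second.
    (maj_lbl, maj_xs), (min_lbl, min_xs) = sorted(
        by_class.items(), key=lambda kv: -len(kv[1]))
    keep = min(len(maj_xs), int(target_ir * len(min_xs)))
    return random.Random(seed).sample(maj_xs, keep) + min_xs

# Example: a chunk shaped like the Breast-cancer data in Table 1 (201 vs 85).
chunk = [("features", 1)] * 201 + [("features", 0)] * 85
balanced = undersample_chunk(chunk, label_of=lambda inst: inst[1])
print(Counter(inst[1] for inst in balanced))  # Counter({1: 85, 0: 85})
```

Each almost-balanced chunk would then be fed to the incremental learner (for example, a Hoeffding tree) before the next chunk arrives.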

Citation: Anupama N, Sudarson Jena. A novel approach using incremental under sampling for data stream mining. Big Data & Information Analytics, doi: 10.3934/bdia.2017017

Figure 1.  Trends in TN Rate for C4.5 and Hoeffding Tree versus IUSDS on the data stream
Figure 2.  Trends in FP Rate for C4.5 and Hoeffding Tree versus IUSDS on the data stream
Table 1.  Details of the Datasets
S.No Dataset Symbol Instances Majority Minority IR
1. Breast-cancer B1 286 201 85 2.36
2. Breast-w B2 699 458 241 1.90
3. Colic C1 368 232 136 1.71
4. Credit-g C2 1000 700 300 2.33
5. Diabetes D1 768 500 268 1.87
6. Heart-c H1 303 165 138 1.19
7. Heart-h H2 294 188 106 1.77
8. Heart-stat H3 270 150 120 1.25
9. Hepatitis H4 155 123 32 3.85
10. Ionosphere I1 351 225 126 1.79
11. Kr-vs-kp K1 3196 1669 1527 1.09
12. Labor L1 57 37 20 1.85
13. Mushroom M1 8124 4208 3916 1.08
14. Sick S1 3772 3541 231 15.32
15. Sonar S2 208 111 97 1.15
Table 2.  Data Stream Description
Dataset Instances Majority Minority IR
Chunk 1:{B1} 286 201 85 2.36
Chunk 2:{B1, B2} 985 659 326 2.02
Chunk 3:{B1, B2, C1} 1353 891 462 1.92
Chunk 4:{B1, B2, C1, C2} 2353 1591 1062 1.49
Chunk 5:{B1, B2, C1, C2, D1} 3121 2091 1325 1.57
Chunk 6:{B1, B2, C1, C2, D1, H1} 3424 2256 1463 1.52
Chunk 7:{B1, B2, C1, C2, D1, H1, H2} 3718 2444 1569 1.55
Chunk 8:{B1, B2, C1, C2, D1, H1, H2, H3} 3988 2594 1689 1.53
Chunk 9:{B1, B2, C1, C2, D1, H1, H2, H3, H4} 4143 2717 1721 1.57
Chunk 10:{B1, B2, C1, C2, D1, H1, H2, H3, H4, I1} 4494 2942 1847 1.59
Chunk 11:{B1, B2, C1, C2, D1, H1, H2, H3, H4, I1, K1} 7690 4611 3374 1.36
Chunk 12:{B1, B2, C1, C2, D1, H1, H2, H3, H4, I1, K1, L1} 7747 4648 3394 1.36
Chunk 13:{B1, B2, C1, C2, D1, H1, H2, H3, H4, I1, K1, L1, M1} 15871 8856 7310 1.21
Chunk 14:{B1, B2, C1, C2, D1, H1, H2, H3, H4, I1, K1, L1, M1, S1} 19643 12397 7541 1.64
Chunk 15:{B1, B2, C1, C2, D1, H1, H2, H3, H4, I1, K1, L1, M1, S1, S2} 19851 12508 7638 1.63
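As the chunk identifiers above indicate, chunk k is simply the concatenation of the first k datasets. The following is a minimal sketch of that construction; make_chunks is a hypothetical helper for illustration, not code from the paper.

```python
def make_chunks(datasets):
    """Build cumulative chunks: chunk k concatenates datasets 1..k,
    mirroring the stream construction in Table 2."""
    chunks, prefix = [], []
    for ds in datasets:
        prefix = prefix + ds          # extend the cumulative prefix
        chunks.append(list(prefix))   # snapshot becomes chunk k
    return chunks

# Toy usage with per-dataset instance counts from Table 1:
b1, b2, c1 = ["B1"] * 286, ["B2"] * 699, ["C1"] * 368
print([len(c) for c in make_chunks([b1, b2, c1])])  # [286, 985, 1353]
```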
Table 3.  Average TN Rate for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.260$\bullet$ 0.395$\bullet$ 0.325
Chunk 2 (maj=659; min=326) 0.596$\bullet$ 0.685$\bullet$ 0.652
Chunk 3 (maj=891; min=462) 0.636$\bullet$ 0.693$\bullet$ 0.689
Chunk 4 (maj=1591; min=762) 0.577$\bullet$ 0.642$\bullet$ 0.624
Chunk 5 (maj=2091; min=1030) 0.582$\bullet$ 0.634$\bullet$ 0.638
Chunk 6 (maj=2214; min=1062) 0.635$\bullet$ 0.674$\bullet$ 0.685
Chunk 7 (maj=2439; min=1188) 0.679$\bullet$ 0.707$\bullet$ 0.723
Chunk 8 (maj=2476; min=1208) 0.702$\bullet$ 0.738$\bullet$ 0.740
Chunk 9 (maj=6017; min=1438) 0.721$\bullet$ 0.667$\bullet$ 0.759
Chunk 10 (maj=6128; min=1536) 0.724$\bullet$ 0.657$\bullet$ 0.757
Chunk 11 (maj=6395; min=1704) 0.745$\bullet$ 0.684$\bullet$ 0.778
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 4.  Average Accuracy for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 74.28$\circ$ 72.18$\circ$ 71.73
Chunk 2 (maj=659; min=326) 84.64$\bullet$ 84.09$\bullet$ 84.94
Chunk 3 (maj=891; min=462) 84.81$\circ$ 81.90$\bullet$ 84.03
Chunk 4 (maj=1591; min=762) 81.42$\circ$ 80.19$\circ$ 79.10
Chunk 5 (maj=2091; min=1030) 80.04$\circ$ 79.30$\circ$ 79.26
Chunk 6 (maj=2214; min=1062) 79.90$\circ$ 79.86$\circ$ 79.59
Chunk 7 (maj=2439; min=1188) 81.31$\circ$ 81.06$\bullet$ 81.23
Chunk 8 (maj=2476; min=1208) 80.97$\bullet$ 82.15$\circ$ 81.01
Chunk 9 (maj=6017; min=1438) 82.94 83.44$\circ$ 82.94
Chunk 10 (maj=6128; min=1536) 82.01$\bullet$ 81.92$\bullet$ 82.17
Chunk 11 (maj=6395; min=1704) 83.33$\bullet$ 83.11$\bullet$ 83.52
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 5.  Average FP Rate for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.740$\bullet$ 0.605$\bullet$ 0.675
Chunk 2 (maj=659; min=326) 0.404$\bullet$ 0.315$\bullet$ 0.348
Chunk 3 (maj=891; min=462) 0.364$\bullet$ 0.307$\bullet$ 0.311
Chunk 4 (maj=1591; min=762) 0.423$\bullet$ 0.358$\bullet$ 0.376
Chunk 5 (maj=2091; min=1030) 0.418$\bullet$ 0.366$\bullet$ 0.362
Chunk 6 (maj=2214; min=1062) 0.365$\bullet$ 0.326$\bullet$ 0.315
Chunk 7 (maj=2439; min=1188) 0.321$\bullet$ 0.293$\bullet$ 0.277
Chunk 8 (maj=2476; min=1208) 0.298$\bullet$ 0.262$\bullet$ 0.260
Chunk 9 (maj=6017; min=1438) 0.279$\bullet$ 0.333$\bullet$ 0.241
Chunk 10 (maj=6128; min=1536) 0.276$\bullet$ 0.343$\bullet$ 0.243
Chunk 11 (maj=6395; min=1704) 0.255$\bullet$ 0.316$\bullet$ 0.222
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 6.  Average AUC for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.606$\bullet$ 0.683$\circ$ 0.637
Chunk 2 (maj=659; min=326) 0.782$\bullet$ 0.836$\circ$ 0.812
Chunk 3 (maj=891; min=462) 0.802$\bullet$ 0.832$\bullet$ 0.833
Chunk 4 (maj=1591; min=762) 0.764$\bullet$ 0.820$\circ$ 0.777
Chunk 5 (maj=2091; min=1030) 0.761$\bullet$ 0.818$\circ$ 0.787
Chunk 6 (maj=2214; min=1062) 0.746$\bullet$ 0.819$\circ$ 0.775
Chunk 7 (maj=2439; min=1188) 0.766$\bullet$ 0.836$\circ$ 0.795
Chunk 8 (maj=2476; min=1208) 0.761$\bullet$ 0.845$\circ$ 0.791
Chunk 9 (maj=6017; min=1438) 0.782$\bullet$ 0.813$\circ$ 0.810
Chunk 10 (maj=6128; min=1536) 0.779$\bullet$ 0.812$\circ$ 0.806
Chunk 11 (maj=6395; min=1704) 0.798$\bullet$ 0.826$\circ$ 0.821
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 7.  Average Precision for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.753$\circ$ 0.774$\bullet$ 0.736
Chunk 2 (maj=659; min=326) 0.859$\bullet$ 0.881$\bullet$ 0.861
Chunk 3 (maj=891; min=462) 0.856$\bullet$ 0.866$\bullet$ 0.836
Chunk 4 (maj=1591; min=762) 0.834$\bullet$ 0.849$\bullet$ 0.800
Chunk 5 (maj=2091; min=1030) 0.827$\bullet$ 0.839$\bullet$ 0.802
Chunk 6 (maj=2214; min=1062) 0.774$\bullet$ 0.795 0.785
Chunk 7 (maj=2439; min=1188) 0.791$\bullet$ 0.802$\circ$ 0.804
Chunk 8 (maj=2476; min=1208) 0.779$\bullet$ 0.811$\circ$ 0.798
Chunk 9 (maj=6017; min=1438) 0.803$\bullet$ 0.826$\bullet$ 0.819
Chunk 10 (maj=6128; min=1536) 0.795$\bullet$ 0.807$\bullet$ 0.813
Chunk 11 (maj=6395; min=1704) 0.811$\bullet$ 0.822$\bullet$ 0.828
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 8.  Average Recall for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.947$\circ$ 0.860$\bullet$ 0.909
Chunk 2 (maj=659; min=326) 0.953$\circ$ 0.906$\bullet$ 0.946
Chunk 3 (maj=891; min=462) 0.946$\circ$ 0.875$\bullet$ 0.925
Chunk 4 (maj=1591; min=762) 0.921 0.872$\bullet$ 0.888
Chunk 5 (maj=2091; min=1030) 0.901$\bullet$ 0.866$\bullet$ 0.884
Chunk 6 (maj=2214; min=1062) 0.813$\bullet$ 0.828$\bullet$ 0.820
Chunk 7 (maj=2439; min=1188) 0.814 0.830$\bullet$ 0.824
Chunk 8 (maj=2476; min=1208) 0.793 0.826$\bullet$ 0.808
Chunk 9 (maj=6017; min=1438) 0.815$\bullet$ 0.844$\bullet$ 0.828
Chunk 10 (maj=6128; min=1536) 0.806$\bullet$ 0.841$\bullet$ 0.822
Chunk 11 (maj=6395; min=1704) 0.821$\bullet$ 0.851$\bullet$ 0.833
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 9.  Average F-measure for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.838$\circ$ 0.812$\bullet$ 0.812
Chunk 2 (maj=659; min=326) 0.900$\bullet$ 0.890$\bullet$ 0.898
Chunk 3 (maj=891; min=462) 0.896 0.867$\bullet$ 0.874
Chunk 4 (maj=1591; min=762) 0.873$\bullet$ 0.857$\bullet$ 0.838
Chunk 5 (maj=2091; min=1030) 0.860$\bullet$ 0.849$\bullet$ 0.838
Chunk 6 (maj=2214; min=1062) 0.785$\bullet$ 0.803$\bullet$ 0.791
Chunk 7 (maj=2439; min=1188) 0.794$\bullet$ 0.808$\bullet$ 0.804
Chunk 8 (maj=2476; min=1208) 0.774$\bullet$ 0.810$\circ$ 0.790
Chunk 9 (maj=6017; min=1438) 0.798$\bullet$ 0.827$\bullet$ 0.813
Chunk 10 (maj=6128; min=1536) 0.790$\bullet$ 0.815$\bullet$ 0.807
Chunk 11 (maj=6395; min=1704) 0.807$\bullet$ 0.828$\bullet$ 0.821
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 10.  Average FN Rate for IUSDS versus C4.5 and Hoeffding Tree during the 11 time stamps after each status change for chunk-by-chunk learning
Chunk no C4.5 HoeffdingTree IUSDS
Chunk 1 (maj=201; min=85) 0.053$\circ$ 0.140$\circ$ 0.091
Chunk 2 (maj=659; min=326) 0.047$\circ$ 0.094$\circ$ 0.054
Chunk 3 (maj=891; min=462) 0.054$\circ$ 0.125$\bullet$ 0.075
Chunk 4 (maj=1591; min=762) 0.079 0.128$\bullet$ 0.112
Chunk 5 (maj=2091; min=1030) 0.099$\circ$ 0.134$\bullet$ 0.116
Chunk 6 (maj=2214; min=1062) 0.187$\circ$ 0.172$\bullet$ 0.180
Chunk 7 (maj=2439; min=1188) 0.186 0.170$\bullet$ 0.176
Chunk 8 (maj=2476; min=1208) 0.207 0.174$\bullet$ 0.192
Chunk 9 (maj=6017; min=1438) 0.185$\circ$ 0.156$\bullet$ 0.172
Chunk 10 (maj=6128; min=1536) 0.194$\circ$ 0.159$\bullet$ 0.178
Chunk 11 (maj=6395; min=1704) 0.179$\circ$ 0.149$\circ$ 0.167
$\bullet$ A filled dot indicates a win for IUSDS; $\circ$ an empty dot indicates a loss for IUSDS
Table 11.  Summary of experimental results for IUSDS
Results Systems Wins Ties Losses
TN Rate IUSDS vs. C4.5 11 0 0
IUSDS vs. HoeffdingTree 11 0 0
Accuracy IUSDS vs. C4.5 4 1 6
IUSDS vs. HoeffdingTree 5 0 6
FP Rate IUSDS vs. C4.5 11 0 0
IUSDS vs. HoeffdingTree 11 0 0
AUC IUSDS vs. C4.5 11 0 0
IUSDS vs. HoeffdingTree 1 0 10
Precision IUSDS vs. C4.5 10 0 1
IUSDS vs. HoeffdingTree 8 1 2
Recall IUSDS vs. C4.5 5 3 3
IUSDS vs. HoeffdingTree 11 0 0
F-measure IUSDS vs. C4.5 9 1 1
IUSDS vs. HoeffdingTree 10 0 1
FN Rate IUSDS vs. C4.5 5 3 3
IUSDS vs. HoeffdingTree 11 0 0
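
The win/tie/loss counts above are tallied from the per-chunk measures in Tables 3 through 10, all of which follow the standard confusion-matrix definitions (AUC, by contrast, is computed from ranked classifier scores rather than a single confusion matrix). For reference, a minimal sketch of those definitions:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures used in Tables 3-10."""
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    tp_rate   = tp / (tp + fn)   # recall / sensitivity
    tn_rate   = tn / (tn + fp)   # specificity
    fp_rate   = fp / (fp + tn)
    fn_rate   = fn / (fn + tp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return {"accuracy": accuracy, "tp_rate": tp_rate, "tn_rate": tn_rate,
            "fp_rate": fp_rate, "fn_rate": fn_rate,
            "precision": precision, "f_measure": f_measure}
```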