Year: 2023 Volume: 10 Issue: 3 Page Range: 544-562 Text Language: English DOI: 10.21449/ijate.1167705 Indexing Date: 29-09-2023

A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms

Abstract:
This study conducts a comparative analysis of the Bagging and Boosting algorithms among ensemble methods, comparing the classification performance of the TreeNet and Random Forest methods, which use these algorithms, on data extracted from the ABİDE application in education. The main reason for choosing these two methods is that both are ensemble methods that combine decision trees via Bagging and Boosting algorithms and produce a single outcome by aggregating the outputs of the individual trees. The data set consists of the mathematics scores from the ABİDE (Academic Skills Monitoring and Evaluation) 2016 implementation and various demographic variables regarding the students. The study group involves 5,000 randomly selected students; after the deletion of missing data and imputation procedures, this number decreased to 4,568. The analyses showed that the TreeNet method performed more successfully in terms of classification accuracy, sensitivity, F1-score, and AUC value across sample sizes, and the Random Forest method in terms of specificity and precision. The TreeNet method can also be said to be more successful on all numerical prediction error measures, producing lower values than the Random Forest method for each sample size. When the two analysis methods are compared on the ABİDE data under all conditions considered, including sample size, cross-validation, and performance criteria, TreeNet can be said to exhibit higher classification performance than Random Forest. Rather than relying on a single classifier or prediction method, combining multiple methods through Boosting and Bagging algorithms is considered important for the results obtained in education.
Keywords: Educational Data Mining, Ensemble Learning, Bagging, Boosting

Document Type: Article Article Type: Research Article Access Type: Open Access
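
Since the record above summarizes a comparison of a Bagging-based tree ensemble with a Boosting-based one, a minimal sketch of such a comparison is given below. It uses scikit-learn's RandomForestClassifier for Bagging and GradientBoostingClassifier as a stand-in for the commercial TreeNet implementation of stochastic gradient boosting; the synthetic data, feature counts, and hyperparameters are illustrative assumptions, not the study's actual ABİDE data or settings.

```python
# A minimal sketch, assuming a binary classification target and synthetic
# data in place of the ABİDE 2016 mathematics scores (n = 4,568 after
# missing-data handling). Not the study's actual pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Synthetic stand-in for the student data; 20 hypothetical predictors.
X, y = make_classification(n_samples=4568, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Random Forest (Bagging)": RandomForestClassifier(
        n_estimators=500, random_state=42),
    "Gradient Boosting (Boosting, TreeNet-like)": GradientBoostingClassifier(
        n_estimators=500, learning_rate=0.05, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print(f"{name}:")
    print(f"  accuracy    = {accuracy_score(y_test, y_pred):.3f}")
    print(f"  sensitivity = {recall_score(y_test, y_pred):.3f}")  # TP / (TP + FN)
    print(f"  specificity = {tn / (tn + fp):.3f}")                # TN / (TN + FP)
    print(f"  F1-score    = {f1_score(y_test, y_pred):.3f}")
    print(f"  AUC         = {roc_auc_score(y_test, y_prob):.3f}")
```

Reporting accuracy, sensitivity, specificity, F1-score, and AUC side by side mirrors the performance criteria named in the abstract; in the study itself the two ensembles are additionally compared across sample sizes and under cross-validation.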
APA Şevgin, H. (2023). A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms. International Journal of Assessment Tools in Education, 10(3), 544-562. https://doi.org/10.21449/ijate.1167705
Chicago Şevgin, Hikmet. "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms." International Journal of Assessment Tools in Education 10, no. 3 (2023): 544-562. https://doi.org/10.21449/ijate.1167705
MLA Şevgin, Hikmet. "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms." International Journal of Assessment Tools in Education, vol. 10, no. 3, 2023, pp. 544-562. https://doi.org/10.21449/ijate.1167705
AMA Şevgin H. A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms. International Journal of Assessment Tools in Education. 2023;10(3):544-562. doi:10.21449/ijate.1167705
Vancouver Şevgin H. A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms. International Journal of Assessment Tools in Education. 2023;10(3):544-562. doi:10.21449/ijate.1167705
IEEE H. Şevgin, "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms," International Journal of Assessment Tools in Education, vol. 10, no. 3, pp. 544-562, 2023, doi: 10.21449/ijate.1167705.
ISNAD Şevgin, Hikmet. "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms". International Journal of Assessment Tools in Education 10/3 (2023), 544-562. https://doi.org/10.21449/ijate.1167705