Year: 2023 Volume: 10 Issue: 3 Page Range: 544-562 Text Language: English DOI: 10.21449/ijate.1167705 Indexing Date: 29-09-2023

A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms

Abstract:
This study conducts a comparative analysis of the Bagging and Boosting algorithms among ensemble methods, comparing the classification performance of the TreeNet and Random Forest methods, which use these algorithms, on data extracted from the ABİDE application in education. The main reason for choosing these two methods is that both are ensemble methods that combine decision trees via Bagging and Boosting algorithms and produce a single outcome by aggregating the outputs of the individual trees. The data set consists of the mathematics scores from the ABİDE (Academic Skills Monitoring and Evaluation) 2016 implementation and various demographic variables regarding the students. The study group involves 5,000 randomly selected students; after the deletion of missing data and imputation procedures, this number decreased to 4,568. The analyses showed that the TreeNet method performed more successfully in terms of classification accuracy, sensitivity, F1-score, and AUC value across sample sizes, and the Random Forest method in terms of specificity and precision. The TreeNet method can also be said to be more successful on all numerical prediction error measures, producing lower values than the Random Forest method for each sample size. When the two analysis methods are compared on the ABİDE data under all conditions considered, including sample size, cross-validation, and performance criteria, TreeNet can be said to exhibit higher classification performance than Random Forest. Rather than relying on a single classifier or prediction method, combining multiple methods through Boosting and Bagging algorithms is considered important for the results obtained in education.
Keywords: Educational Data Mining, Ensemble Learning, Bagging, Boosting

Document Type: Article Article Type: Research Article Access Type: Open Access
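
Since the record above summarizes a comparison of a Bagging-based tree ensemble with a Boosting-based one, a minimal sketch of such a comparison is given below. It uses scikit-learn's RandomForestClassifier for Bagging and GradientBoostingClassifier as a stand-in for the commercial TreeNet implementation of stochastic gradient boosting; the synthetic data, feature counts, and hyperparameters are illustrative assumptions, not the study's actual ABİDE data or settings.

```python
# A minimal sketch, assuming a binary classification target and synthetic
# data in place of the ABİDE 2016 mathematics scores (n = 4,568 after
# missing-data handling). Not the study's actual pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             roc_auc_score, confusion_matrix)

# Synthetic stand-in for the student data; 20 hypothetical predictors.
X, y = make_classification(n_samples=4568, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Random Forest (Bagging)": RandomForestClassifier(
        n_estimators=500, random_state=42),
    "Gradient Boosting (Boosting, TreeNet-like)": GradientBoostingClassifier(
        n_estimators=500, learning_rate=0.05, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print(f"{name}:")
    print(f"  accuracy    = {accuracy_score(y_test, y_pred):.3f}")
    print(f"  sensitivity = {recall_score(y_test, y_pred):.3f}")  # TP / (TP + FN)
    print(f"  specificity = {tn / (tn + fp):.3f}")                # TN / (TN + FP)
    print(f"  F1-score    = {f1_score(y_test, y_pred):.3f}")
    print(f"  AUC         = {roc_auc_score(y_test, y_prob):.3f}")
```

Reporting accuracy, sensitivity, specificity, F1-score, and AUC side by side mirrors the performance criteria named in the abstract; in the study itself the two ensembles are additionally compared across sample sizes and under cross-validation.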
APA Şevgin, H. (2023). A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms. International Journal of Assessment Tools in Education, 10(3), 544-562. https://doi.org/10.21449/ijate.1167705
Chicago Şevgin, Hikmet. "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms." International Journal of Assessment Tools in Education 10, no. 3 (2023): 544-562. https://doi.org/10.21449/ijate.1167705
MLA Şevgin, Hikmet. "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms." International Journal of Assessment Tools in Education, vol. 10, no. 3, 2023, pp. 544-562. https://doi.org/10.21449/ijate.1167705
AMA Şevgin H. A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms. International Journal of Assessment Tools in Education. 2023;10(3):544-562. doi:10.21449/ijate.1167705
Vancouver Şevgin H. A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms. International Journal of Assessment Tools in Education. 2023;10(3):544-562. doi:10.21449/ijate.1167705
IEEE H. Şevgin, "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms," International Journal of Assessment Tools in Education, vol. 10, no. 3, pp. 544-562, 2023, doi: 10.21449/ijate.1167705.
ISNAD Şevgin, Hikmet. "A comparative study of ensemble methods in the field of education: Bagging and Boosting algorithms". International Journal of Assessment Tools in Education 10/3 (2023), 544-562. https://doi.org/10.21449/ijate.1167705