Year: 2022 Volume: 11 Issue: 2 Page Range: 924 - 933 Text Language: English DOI: 10.5455/medscience.2021.10.339 Index Date: 04-07-2022

An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data

Abstract:
This study aims to classify the death status of patients with non-small-cell lung cancer (NSCLC) using patient records comprising 24 variables, taken from an open-source dataset published on a cancer data site. Four base classifiers were used: SMO (Sequential Minimal Optimization), K-NN (K-Nearest Neighbor), random forest, and XGBoost (Extreme Gradient Boosting), together with the voting, bagging, boosting, and stacking ensemble learning methods. The models were compared in terms of accuracy, specificity, sensitivity, precision, and the ROC curve. The random forest, SMO, K-NN, and XGBoost classifiers were each evaluated on their own, within the bagging ensemble learning method, and within the boosting ensemble learning method. In addition, the performance of seven combined models was reported across these metrics: Model 1 (random forest + SMO), Model 2 (XGBoost + K-NN), Model 3 (random forest + K-NN), Model 4 (XGBoost + SMO), Model 5 (SMO + K-NN + random forest), Model 6 (SMO + K-NN + XGBoost), and Model 7 (SMO + K-NN + random forest + XGBoost). The highest classification performance was obtained by the boosting ensemble learning method with XGBoost, which achieved an accuracy of 0.982, a sensitivity of 0.971, a precision of 0.989, a specificity of 0.989, and a ROC curve value of 0.998. Ensemble learning methods are recommended for classification problems in patient populations with a high prevalence of cancer in order to achieve successful results.
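As an illustration of the evaluation metrics and the voting idea named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The labels (1 = died, 0 = survived), the toy prediction lists, and the function names `confusion_counts`, `classification_metrics`, and `majority_vote` are all assumptions made for this example; none of the numbers come from the study.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and precision as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }


def majority_vote(predictions_per_model):
    """Hard voting: per sample, return the label predicted by most models.

    On a tie, the label that appears first among the models' votes wins.
    """
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


# Hypothetical toy data: 1 = died, 0 = survived (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
model_a = [1, 1, 0, 0, 0, 1, 1, 0]
model_b = [1, 0, 1, 0, 0, 0, 1, 0]
model_c = [1, 1, 1, 0, 1, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
single_metrics = classification_metrics(y_true, model_a)
ensemble_metrics = classification_metrics(y_true, ensemble)
```

In this toy case each model makes different mistakes, so the majority vote corrects them. That is the intuition behind combining classifiers, but it says nothing about how the combined models in the study were actually built or tuned.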

Document Type: Article Article Type: Review Access Type: Open Access
APA KIVRAK M, ÇOLAK C (2022). An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data. Medicine Science, 11(2), 924 - 933. 10.5455/medscience.2021.10.339
Chicago KIVRAK MEHMET, ÇOLAK Cemil An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data. Medicine Science 11, no. 2 (2022): 924 - 933. 10.5455/medscience.2021.10.339
MLA KIVRAK MEHMET, ÇOLAK Cemil An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data. Medicine Science, vol. 11, no. 2, 2022, pp. 924 - 933. 10.5455/medscience.2021.10.339
AMA KIVRAK M, ÇOLAK C An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data. Medicine Science. 2022; 11(2): 924 - 933. 10.5455/medscience.2021.10.339
Vancouver KIVRAK M, ÇOLAK C An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data. Medicine Science. 2022; 11(2): 924 - 933. 10.5455/medscience.2021.10.339
IEEE KIVRAK M, ÇOLAK C "An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data." Medicine Science, 11, pp. 924 - 933, 2022. 10.5455/medscience.2021.10.339
ISNAD KIVRAK, MEHMET - ÇOLAK, Cemil. "An investigation of ensemble learning methods in classification problems and an application on non-small-cell lung cancer data". Medicine Science 11/2 (2022), 924-933. https://doi.org/10.5455/medscience.2021.10.339