Year: 2022 Volume: 10 Issue: 2 Page Range: 110 - 117 Text Language: English DOI: 10.17694/bajece.973129 Index Date: 12-09-2022

Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS

Abstract:
A new hybrid machine learning method for the prediction of type 2 diabetes is introduced and explained in detail, and its outcomes are compared with those of similar studies. Early prediction of diabetes is crucial for taking the necessary measures (e.g., changing eating habits and controlling patient weight), deferring the onset of diabetes, reducing the death rate to some extent, and easing medical care professionals' decision-making in preventing and managing diabetes mellitus. The purpose of this study is to create a new hybrid feature selection approach that combines a Correlation Matrix with Heatmap and Sequential Forward Selection (SFS) to reveal the most effective features in the detection of diabetes. A diabetes data set with 520 instances and seven features was studied by applying the proposed hybrid feature selection approach. The selected optimal features were evaluated with Support Vector Machine (SVM), Random Forest (RF), and Artificial Neural Network (ANN) classifiers. Across the five evaluation metrics, ANN showed the best performance: Accuracy (99.1%), F-measure (99.1%), Precision (99.3%), Recall (99.1%), and AUC (99.2%). The proposed hybrid feature selection model performed better with ANN than with the other machine learning algorithms.
Keywords: Artificial Neural Network; Correlation Matrix; Sequential Forward Selection; Diabetes Mellitus; Hybrid Feature Selection
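
The two-stage idea named in the abstract (a correlation-based filter followed by Sequential Forward Selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0.3 threshold, and the toy data are hypothetical; correlation is assumed to be measured between each feature and the class label; a nearest-centroid classifier stands in for the SVM, RF, and ANN classifiers; and heatmap plotting and the metrics other than accuracy are omitted.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_filter(X, y, threshold):
    """Stage 1 (assumed form): keep the indices of features whose absolute
    correlation with the label is at least `threshold`."""
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns) if abs(pearson(col, y)) >= threshold]


def centroid_accuracy(X, y, feats, folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier restricted
    to the features in `feats` (a stand-in for SVM / RF / ANN)."""
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(X)) if i % folds != k]
        test = [i for i in range(len(X)) if i % folds == k]
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]
        for i in test:
            dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, mu))
                    for c, mu in centroids.items()}
            correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, candidates, k):
    """Stage 2: greedily add the candidate feature that most improves the
    cross-validated accuracy until `k` features are selected."""
    selected = []
    while len(selected) < min(k, len(candidates)):
        remaining = [j for j in candidates if j not in selected]
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, selected + [j]))
        selected.append(best)
    return selected


def make_toy_data(n=200, seed=0):
    """Hypothetical data: features 0-2 track the label, features 3-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [label + rng.gauss(0, 0.5) for _ in range(3)]
        noise = [rng.gauss(0, 1) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
kept = correlation_filter(X, y, threshold=0.3)             # stage 1
chosen = sequential_forward_selection(X, y, kept, k=2)     # stage 2
print("after correlation filter:", kept)
print("after SFS:", chosen, "accuracy:", centroid_accuracy(X, y, chosen))
```

Running the filter first shrinks the candidate set, so the more expensive wrapper stage (which retrains the classifier for every candidate feature) has fewer combinations to evaluate.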

Document Type: Article Article Type: Research Article Access Type: Open Access
APA Buyrukoglu S, Akbaş A (2022). Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS. Balkan Journal of Electrical and Computer Engineering, 10(2), 110 - 117. 10.17694/bajece.973129
Chicago Buyrukoglu Selim,Akbaş Ayhan Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS. Balkan Journal of Electrical and Computer Engineering 10, no.2 (2022): 110 - 117. 10.17694/bajece.973129
MLA Buyrukoglu Selim,Akbaş Ayhan Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS. Balkan Journal of Electrical and Computer Engineering, vol.10, no.2, 2022, pp.110 - 117. 10.17694/bajece.973129
AMA Buyrukoglu S,Akbaş A Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS. Balkan Journal of Electrical and Computer Engineering. 2022; 10(2): 110 - 117. 10.17694/bajece.973129
Vancouver Buyrukoglu S,Akbaş A Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS. Balkan Journal of Electrical and Computer Engineering. 2022; 10(2): 110 - 117. 10.17694/bajece.973129
IEEE Buyrukoglu S,Akbaş A "Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS." Balkan Journal of Electrical and Computer Engineering, 10, pp.110 - 117, 2022. 10.17694/bajece.973129
ISNAD Buyrukoglu, Selim - Akbaş, Ayhan. "Machine Learning based Early Prediction of Type 2 Diabetes: A New Hybrid Feature Selection Approach using Correlation Matrix with Heatmap and SFS". Balkan Journal of Electrical and Computer Engineering 10/2 (2022), 110-117. https://doi.org/10.17694/bajece.973129