Year: 2021 | Volume: 29 | Issue: 7 | Pages: 3165-3179 | Language: English | DOI: 10.3906/elk-2102-110 | Index date: 28-06-2022

Gene expression data classification using genetic algorithm-based feature selection

Abstract:
In this study, hybrid methods are proposed for feature selection and classification of gene expression datasets. In the proposed genetic algorithm/support vector machine (GA-SVM) and genetic algorithm/k-nearest neighbor (GA-KNN) hybrid methods, the genetic algorithm is improved using Pearson's correlation coefficient, Relief-F, or mutual information, and its crossover and selection operations are specialized. Eight different gene expression datasets are used for classification. The classification performance of the proposed methods is compared with that of the traditional GA-KNN and GA-SVM wrapper methods and with other studies in the literature. The results show that the proposed methods achieve higher accuracy than the other methods on all datasets.
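The abstract does not spell out how the filter scores enter the genetic algorithm or how crossover and selection are specialized, so the Python sketch below only illustrates the general shape of a filter-guided GA wrapper with a k-NN fitness; it is not the authors' method. Every design choice in it is an assumption: Pearson's |r| with the class label as the filter score (Relief-F or mutual information could be swapped in), filter scores biasing which genes start switched on, leave-one-out 1-NN accuracy minus a small size penalty (`alpha`) as fitness, tournament selection, uniform crossover, bit-flip mutation, and elitism. The function names `filter_scores`, `loo_knn_accuracy`, and `ga_select` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_scores(X, y):
    """Assumed filter score per gene: |Pearson r| between the gene's column and the label."""
    return [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]


def loo_knn_accuracy(X, y, mask, k=1):
    """Leave-one-out k-NN accuracy using only the genes whose mask bit is 1."""
    genes = [j for j, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        # Squared Euclidean distance to every other sample, paired with its label.
        neighbours = sorted(
            (sum((row[j] - other[j]) ** 2 for j in genes), y[t])
            for t, other in enumerate(X)
            if t != i
        )
        votes = [label for _, label in neighbours[:k]]
        correct += max(set(votes), key=votes.count) == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.02, alpha=0.01, seed=0):
    """Hypothetical filter-guided GA wrapper: returns (best gene mask, its fitness)."""
    rng = random.Random(seed)
    n = len(X[0])
    scores = filter_scores(X, y)
    top = max(scores) or 1.0
    cache = {}

    def fitness(mask):
        # Assumed wrapper fitness: k-NN accuracy minus a small penalty per selected gene.
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_knn_accuracy(X, y, mask) - alpha * sum(mask) / n
        return cache[key]

    def random_mask():
        # Assumed filter guidance: a gene starts switched on with probability
        # between 0.1 and 0.9, growing with its normalised filter score.
        return [1 if rng.random() < 0.1 + 0.8 * s / top else 0 for s in scores]

    def tournament(pop):
        # Tournament selection of size 3.
        return max(rng.sample(pop, 3), key=fitness)

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(nxt) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = [bit_a if rng.random() < 0.5 else bit_b for bit_a, bit_b in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy data invented for illustration: genes 0 and 1 follow the label, genes 2-9 are noise.
    data_rng = random.Random(1)
    y = [i % 2 for i in range(30)]
    X = [[3 * label + data_rng.gauss(0, 0.5) if j < 2 else data_rng.gauss(0, 1)
          for j in range(10)] for label in y]
    mask, fit = ga_select(X, y, generations=20)
    print("selected genes:", [j for j, b in enumerate(mask) if b], "fitness:", round(fit, 3))
```

Under the same assumptions, a GA-SVM variant would replace `loo_knn_accuracy` with a cross-validated SVM score, and a Relief-F or mutual-information variant would replace `filter_scores`; neither substitution is taken from the paper.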

Document type: Article | Article type: Research article | Access: Open access
APA: Sönmez ÖS, Dagtekin M, Ensari T (2021). Gene expression data classification using genetic algorithm-based feature selection. Turkish Journal of Electrical Engineering and Computer Sciences, 29(7), 3165-3179. https://doi.org/10.3906/elk-2102-110