Year: 2022 Volume: 30 Issue: 3 Page Range: 908 - 926 Text Language: English DOI: 10.3906/elk-2106-55 Index Date: 04-07-2022

Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data

Abstract:
The massive use of social media causes rapid information dissemination that amplifies harmful messages such as fake news. Fake news is misleading information presented as factual news, generally used to manipulate public opinion. In particular, fake news related to COVID-19 is described as an ‘infodemic’ by the World Health Organization: misleading information that causes confusion and may harm health. There is a high volume of misinformation about COVID-19 that causes panic and high stress. Developing a COVID-19 fake news identification model is therefore clearly important, and particularly so for the Turkish language. In this article, we propose an advanced deep language transformer model to assess the truthfulness of Turkish COVID-19 news from social media. To this end, we first compiled Turkish COVID-19 news from various sources as a benchmark dataset. Then we applied five conventional machine learning algorithms (i.e., Naive Bayes, Random Forest, K-Nearest Neighbor, Support Vector Machine, and Logistic Regression) on top of several language preprocessing tasks. As a next step, we used novel deep learning algorithms, namely Long Short-Term Memory, Bi-directional Long Short-Term Memory, Convolutional Neural Networks, Gated Recurrent Unit, and Bi-directional Gated Recurrent Unit. For further evaluation, we made use of deep learning based language transformers, i.e., Bidirectional Encoder Representations from Transformers (BERT) and its variations, to improve the efficiency of the proposed approach. From the obtained results, we observed that neural transformers, in particular the Turkish-dedicated transformer BERTurk, are able to identify COVID-19 fake news with 98.5% accuracy.
Document Type: Article Article Type: Research Article Access Type: Open Access
  • [1] Paskin D. Real or Fake News: Who Knows?. The Journal of Social Media in Society 2018; 7 (2):252–273.
  • [2] Lazer DMJ, Baum MA, Benkler Y, Berinsky AJ, Greenhill KM et al. The science of fake news. Science 2018; 359 (6380):1094–1096.
  • [3] Hua J, Shaw R. Corona Virus (COVID-19) Infodemic and Emerging Issues through a Data Lens: The Case of China. International Journal of Environmental Research and Public Health 2020; 17(2309):1–11.
  • [4] Sear RF, Velasquez N, Leahy R, Restrepo NJ, Oud SE et al. Quantifying COVID-19 Content in the Online Health Opinion War Using Machine Learning. IEEE Access 2020; 8:91886–91893.
  • [5] Ciampaglia GL. Fighting fake news: a role for computational social science in the fight against digital misinformation. Journal of Computational Social Science 2018; 1 (1):147–153.
  • [6] Lampos V, Majumder MS, Yom-Tov E, Edelstein M, Moura S et al. Tracking COVID-19 using online search. NPJ Digital Medicine 2021; 4 (1):1–11.
  • [7] Beer DB, Matthee M. Approaches to Identify Fake News: A Systematic Literature Review. Lecture Notes in Networks and Systems. Springer International Publishing, 2021.
  • [8] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 2018.
  • [9] Peters ME, Neumann M, Iyyer M, Gardner M, Clark C et al. Deep contextualized word representation. arXiv:1802.05365 2018.
  • [10] Schweter S. BERTurk - BERT models for Turkish 2020. doi:10.5281/zenodo.3770924
  • [11] Vijjali R, Potluri P, Kumar S, Teki S. Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking. arXiv:2011.13253v1 2020.
  • [12] Ferrara E. Disinformation and social bot operations in the run up to the 2017 French presidential election. arXiv:1707.00086 2017; 22 (8).
  • [13] Felber T. Constraint 2021: Machine Learning Models for COVID-19 Fake News Detection Shared Task. arXiv:2101.03717 2021; 1–10.
  • [14] Samuel J, Ali GGMN, Rahman MM, Esawi E, Samuel Y. COVID-19 public sentiment insights and machine learning for tweets classification. Information 2020; 11 (6):1–22.
  • [15] Mazzeo V, Rapisarda A, Giuffrida G. Detection of fake news on COVID-19 on Web Search Engines. arXiv:2103.11804v1 2021.
  • [16] Khanday AMUD, Khan QR, Rabani ST. Identifying propaganda from online social networks during COVID-19 using machine learning techniques. International Journal of Information Technology 2021; 13 (1):115–122.
  • [17] Shams AB, Apu EH, Rahman A, Raihan MMS, Siddika N et al. Web search engine misinformation notifier extension (Seminext): A machine learning based approach during covid-19 pandemic. Healthcare 2021; 9 (2).
  • [18] Rustam F, Khalid M, Aslam W, Rupapara V, Mehmood A et al. A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis. PLoS ONE 2021; 16 (2):1–23.
  • [19] Granik M, Mesyura V. Fake news detection using naive Bayes classifier. In: 2017 IEEE 1st Ukraine Conference on Electrical and Computer Engineering, 2017, pp:900–903.
  • [20] Choudrie J, Banerjee, Kotecha K, Walambe R, Karende H et al. Machine learning techniques and older adults processing of online information and misinformation: A covid 19 study. Computers in Human Behavior 2021; 119:1–11.
  • [21] Shifath SMS, Khan MF, Islam MS. A transformer based approach for fighting COVID-19 fake news. arXiv:2101.12027v1 2021; 1–9.
  • [22] Wani A, Joshi I, Khandve S, Wagh V, Joshi R. Evaluating Deep Learning Approaches for Covid-19 Fake News Detection. arXiv:2101.04012 2021; 153–163.
  • [23] Shu K, Wang S, Liu H. Exploiting Tri-Relationship for Fake News Detection. arXiv:1712.07709v1 2017.
  • [24] Raha T, Indurthi V, Upadhyaya A, Kataria J, Bommakanti P et al. Identifying COVID-19 Fake News in Social Media. arXiv:2101.11954v2 2021.
  • [25] Chintalapudi N, Battineni G, Amenta F. Sentimental analysis of COVID-19 tweets using deep learning models. Infectious Disease Reports 2021; 13 (2):329–339.
  • [26] Mookdarsanit P, Mookdarsanit L. The COVID-19 fake news detection in Thai social texts. Bulletin of Electrical Engineering and Informatics 2021; 10 (2):988–998.
  • [27] Mertoğlu U, Genç B. Automated Fake News Detection in the Age of Digital Libraries. Information Technology and Libraries 2020; 39 (4).
  • [28] Chauhan NK, Singh K. A Review on Conventional Machine Learning vs Deep Learning. In: 2018 International Conference on Computing, Power and Communication Technologies, 2018.
  • [29] Mertoğlu U, Sever H, Genç B. Savunmada Yenilikçi bir Dijital Dönüşüm Alanı: Sahte Haber Tespit Modeli. In: 9. Savunma Teknolojileri Kongresi, 2018 (in Turkish with an abstract in English).
  • [30] Mertoğlu U, Genç B, Sever H. Text-Based Fake News Detection via Machine Learning. In: The International Conference on Artificial Intelligence and Applied Mathematics in Engineering, 2020, pp:113–124.
  • [31] Taşkın SG. Detecting fake news in Turkish with deep learning algorithms. PhD, Süleyman Demirel University, Isparta, Turkey, 2020.
  • [32] Özbay FA. Fake News Detection in Online Social Networks Using Swarm Intelligence Based Methods. PhD, Fırat University, Elazığ, Turkey, 2020.
  • [33] Mertoğlu U. Fake News Detection Model For Turkish Language. PhD, Hacettepe University, Ankara, 2020.
  • [34] Vasilakos C, Kavroudakis D, Georganta A. Machine learning classification ensemble of multitemporal Sentinel-2 images: The case of a mixed Mediterranean ecosystem. Remote Sensing 2020; 12 (12).
  • [35] Patwa P, Sharma S, Pykl S, Guptha V, Kumari G et al. Fighting an Infodemic: COVID-19 Fake News Dataset. arXiv:2011.03327 2020.
  • [36] Ginn R, Pimpalkhute P, Nikfarjam A, Patki MSA, O’Conner K et al. Mining Twitter for Adverse Drug Reaction Mentions: A Corpus and Classification Benchmark. In: BIOTXTM’14, 2014.
  • [37] Şahin DÖ, Kılıç E. Two new feature selection metrics for text classification. Automatika 2019; 60 (2):162–171.
  • [38] Borandağ E, Özçift A, Kaygusuz Y. Development of majority vote ensemble feature selection algorithm augmented with rank allocation to enhance Turkish text categorization. Turkish Journal of Electrical Engineering & Computer Sciences 2021; 29 (2):514–530.
  • [39] Lino FSB, Oliveira MHA, Souza GMF, Rocha LS, Oliveira EL et al. Benchmarking Machine Learning Models to Assist in the Prognosis of Tuberculosis. Informatics 2021; 8 (2).
  • [40] Shrestha A, Mahmood A. Review of deep learning algorithms and architectures. IEEE Access 2019; 7:53040–53065.
  • [41] Jang B, Kim M, Harerimana G, Kang SU, Kim JW. Bi-LSTM model to increase accuracy in text classification: Combining word2vec CNN and attention mechanism. Applied Sciences 2020; 10 (17).
  • [42] Amin MZ, Nadeem N. Convolutional Neural Network: Text Classification Model for Open Domain Question Answering System. arXiv:1809.02479, 2018.
  • [43] Zhou L, Bian X. Improved text sentiment classification method based on BiGRU-Attention. Journal of Physics: Conference Series 2019; 1345 (3).
  • [44] Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In: EMC2: 5th Edition Co-located with NeurIPS’19, 2019.
  • [45] Liu Y, Ott M, Goyal N, Du J, Joshi M et al. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
  • [46] Santos DBL, Dutra FGC, Parreiras FS, Brandao WC. Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese. In: 23rd International Conference on Enterprise Information Systems (ICEIS 2021), 1, 2021, pp:473–483.
  • [47] Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 2020; 21 (1):1–13.
  • [48] Wardhani NWS, Rochayani MY, Iriany A, Sulistyono AD, Lestantyo P. Cross-validation Metrics for Evaluating Classification Performance on Imbalanced Data. In: 2019 International Conference on Computer, Control, Informatics and its Applications, 2019, pp:14–18.
APA Bozuyla M, Özçift A (2022). Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data. Turkish Journal of Electrical Engineering and Computer Sciences, 30(3), 908–926. 10.3906/elk-2106-55
Chicago Bozuyla Mehmet, Özçift Akın. Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data. Turkish Journal of Electrical Engineering and Computer Sciences 30, no. 3 (2022): 908–926. 10.3906/elk-2106-55
MLA Bozuyla Mehmet, Özçift Akın. Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data. Turkish Journal of Electrical Engineering and Computer Sciences, vol. 30, no. 3, 2022, pp. 908–926. 10.3906/elk-2106-55
AMA Bozuyla M, Özçift A. Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data. Turkish Journal of Electrical Engineering and Computer Sciences. 2022; 30(3): 908–926. 10.3906/elk-2106-55
Vancouver Bozuyla M, Özçift A. Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data. Turkish Journal of Electrical Engineering and Computer Sciences. 2022; 30(3): 908–926. 10.3906/elk-2106-55
IEEE Bozuyla M, Özçift A. "Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data." Turkish Journal of Electrical Engineering and Computer Sciences, 30, pp. 908–926, 2022. 10.3906/elk-2106-55
ISNAD Bozuyla, Mehmet - Özçift, Akın. "Developing a fake news identification model with advanced deep language transformers for Turkish COVID-19 misinformation data". Turkish Journal of Electrical Engineering and Computer Sciences 30/3 (2022), 908-926. https://doi.org/10.3906/elk-2106-55