Year: 2020 Volume: 28 Issue: 3 Page Range: 1405 - 1421 Text Language: English DOI: 10.3906/elk-1907-46 Index Date: 27-05-2020

The impact of text preprocessing on the prediction of review ratings

Abstract:
With the growth of e-commerce platforms and online applications, businesses increasingly want a rating and review system through which they can easily gauge customers' feelings about their products and services. Statistics make clear that online ratings and reviews attract new customers and increase sales by providing confidence, ratification, opinions, comparisons, merchant credibility, etc. Although considerable research has been devoted to sentiment analysis for review classification, far less attention has been paid to text preprocessing, which is a crucial step in opinion mining, especially if suitable preprocessing strategies can be found to increase classification accuracy. In this paper, we concentrate on the impact of simple text preprocessing decisions on predicting fine-grained review rating stars, whereas the majority of previous work has focused on the binary distinction of positive vs. negative. The aim of this research is therefore to analyze preprocessing techniques and their influence, and to explain the interesting observations and results on the performance of a five-class review rating classifier.

Subjects: Engineering, Electrical and Electronic; Computer Science, Software Engineering; Computer Science, Cybernetics; Computer Science, Information Systems; Computer Science, Hardware and Architecture; Computer Science, Theory and Methods; Computer Science, Artificial Intelligence
Document Type: Article Article Type: Research Article Access Type: Open Access
APA IŞIK M, DAĞ H (2020). The impact of text preprocessing on the prediction of review ratings. Turkish Journal of Electrical Engineering and Computer Sciences, 28(3), 1405 - 1421. 10.3906/elk-1907-46
Chicago IŞIK Muhittin,DAĞ Hasan The impact of text preprocessing on the prediction of review ratings. Turkish Journal of Electrical Engineering and Computer Sciences 28, no.3 (2020): 1405 - 1421. 10.3906/elk-1907-46
MLA IŞIK Muhittin,DAĞ Hasan The impact of text preprocessing on the prediction of review ratings. Turkish Journal of Electrical Engineering and Computer Sciences, vol.28, no.3, 2020, pp.1405 - 1421. 10.3906/elk-1907-46
AMA IŞIK M,DAĞ H The impact of text preprocessing on the prediction of review ratings. Turkish Journal of Electrical Engineering and Computer Sciences. 2020; 28(3): 1405 - 1421. 10.3906/elk-1907-46
Vancouver IŞIK M,DAĞ H The impact of text preprocessing on the prediction of review ratings. Turkish Journal of Electrical Engineering and Computer Sciences. 2020; 28(3): 1405 - 1421. 10.3906/elk-1907-46
IEEE IŞIK M,DAĞ H "The impact of text preprocessing on the prediction of review ratings." Turkish Journal of Electrical Engineering and Computer Sciences, 28, pp.1405 - 1421, 2020. 10.3906/elk-1907-46
ISNAD IŞIK, Muhittin - DAĞ, Hasan. "The impact of text preprocessing on the prediction of review ratings". Turkish Journal of Electrical Engineering and Computer Sciences 28/3 (2020), 1405-1421. https://doi.org/10.3906/elk-1907-46