Year: 2018 Volume: 24 Issue: 2 Page Range: 283 - 291 Text Language: Turkish DOI: 10.5505/pajes.2017.50480 Index Date: 29-08-2019

Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi

Abstract (translated from the Turkish original):
Term weighting is an important step that directly affects the results of text classification. However, in sentiment analysis, which is treated as a text classification problem, the behavior of the weighting method can change depending on the preprocessing techniques used. In this study, methods recently proposed for different fields such as information retrieval, text classification, and document filtering were applied to Twitter sentiment analysis, and their effect on the results was examined. Two different models, bag of words (BoW) and character-level N-grams, were used to extract features. The experiments were carried out on data sets of Turkish and English Twitter messages. Sentiment classification of the Twitter messages was performed with a topic model based on Latent Dirichlet Allocation (LDA). The Support Vector Machine (SVM) algorithm was used in the classification stage. Based on the experimental results, the most effective term weighting method for Twitter sentiment analysis studies is suggested.

The impact of term weighting method on Twitter sentiment analysis

Abstract:
Term weighting is an important step that directly affects the results of classical text classification. However, in sentiment analysis, which is treated as a text classification task, the behavior of a term weighting method may vary with the preprocessing techniques applied. In this study, term weighting methods recently proposed for research areas such as information retrieval, text classification, and document filtering are applied to Twitter sentiment analysis, and their effect on the results is investigated. In the feature extraction phase, two models are used: Bag of Words (BoW) and character-level N-grams. The experiments are conducted on data sets consisting of Turkish and English Twitter feeds. Sentiment classification of the Twitter feeds is performed using a topic model generated with the Latent Dirichlet Allocation (LDA) method. The Support Vector Machine (SVM) algorithm is employed in the classification stage. Based on the experimental results, the most effective term weighting method for Twitter sentiment analysis studies is suggested.
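To make the two feature models and the idea of term weighting concrete, the following is a minimal sketch in Python. It uses the classical tf-idf weight, tf * log(N / df), purely as an illustration; it is not claimed to be any of the specific weighting schemes compared in the study, and it does not cover the LDA or SVM stages. The function names (`word_tokens`, `char_ngrams`, `tf_idf`) and the sample tweets are hypothetical and do not come from the article.

```python
import math
from collections import Counter


def word_tokens(text):
    """Bag-of-words features: lowercased, whitespace-separated tokens."""
    return text.lower().split()


def char_ngrams(text, n=3):
    """Character-level n-grams over the lowercased text (spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tf_idf(docs, extract):
    """Return one {term: weight} dict per document, weight = tf * log(N / df).

    Illustrative classical tf-idf only (an assumption for this sketch),
    not a scheme taken from the article.
    """
    bags = [Counter(extract(d)) for d in docs]
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once, so this counts
    # the number of documents that contain the term.
    df = Counter(term for bag in bags for term in bag)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in bag.items()}
        for bag in bags
    ]


# Hypothetical sample messages, invented for the example.
tweets = ["great phone love it", "terrible phone hate it", "love this great day"]
bow_weights = tf_idf(tweets, word_tokens)      # word-level features
ngram_weights = tf_idf(tweets, char_ngrams)    # character 3-gram features
```

Under this weighting, a term that occurs in every document receives weight zero, while a term confined to a single message (such as "terrible" above) receives the largest weight. The per-document dictionaries could then be turned into vectors for a classifier.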

Document Type: Article Article Type: Review Access Type: Open Access
APA ÇOBAN Ö, TÜMÜKLÜ OZYER G (2018). Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 24(2), 283 - 291. 10.5505/pajes.2017.50480
Chicago ÇOBAN ÖNDER, TÜMÜKLÜ OZYER GÜLSAH Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24, no.2 (2018): 283 - 291. 10.5505/pajes.2017.50480
MLA ÇOBAN ÖNDER, TÜMÜKLÜ OZYER GÜLSAH Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol.24, no.2, 2018, pp.283 - 291. 10.5505/pajes.2017.50480
AMA ÇOBAN Ö, TÜMÜKLÜ OZYER G Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018; 24(2): 283 - 291. 10.5505/pajes.2017.50480
Vancouver ÇOBAN Ö, TÜMÜKLÜ OZYER G Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018; 24(2): 283 - 291. 10.5505/pajes.2017.50480
IEEE ÇOBAN Ö, TÜMÜKLÜ OZYER G "Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi." Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 24, pp.283 - 291, 2018. 10.5505/pajes.2017.50480
ISNAD ÇOBAN, ÖNDER - TÜMÜKLÜ OZYER, GÜLSAH. "Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi". Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24/2 (2018), 283-291. https://doi.org/10.5505/pajes.2017.50480