Year: 2017 Volume: 21 Issue: 1 Page Range: 190 - 200 Text Language: English DOI: 10.19113/sdufbed.20964 Index Date: 21-11-2018

Effects of Feature Extraction and Classification Methods on Cyberbully Detection

Abstract:
Cyberbullying is defined as an aggressive, intentional act carried out against a defenseless person through the Internet or other electronic content. Researchers have found that many bullying cases have ended tragically in suicide; hence, automatic detection of cyberbullying has become important. In this study, we show how the choice of feature extraction, feature selection, and classification methods affects the performance of automatic cyberbullying detection. The experiments use the FormSpring.me dataset and investigate the effects of preprocessing methods; of several classifiers such as C4.5, Naïve Bayes, kNN, and SVM; and of the information gain and chi-square feature selection methods. The experimental results indicate that the best classification results are obtained with alphabetic tokenization, no stemming, and no stopword removal. Feature selection also improves cyberbullying detection performance. Among the classifiers compared, C4.5 performs best on the dataset used.
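As a rough illustration of two of the steps named in the abstract, the sketch below applies alphabetic tokenization (no stemming, no stopword removal) and ranks terms by the chi-square statistic. It is not the authors' implementation: the toy posts, the labels, and the function names `alphabetic_tokens`, `chi_square`, and `top_terms` are all hypothetical, and the tokenizer assumes ASCII letters only.

```python
import re


def alphabetic_tokens(text):
    """Lowercase the text and keep only runs of ASCII letters.

    Digits and punctuation are dropped; nothing is stemmed and no
    stopwords are removed.
    """
    return re.findall(r"[a-z]+", text.lower())


def chi_square(term, docs, labels, positive):
    """Chi-square statistic of the 2x2 table (term present?) x (class == positive?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        is_positive = label == positive
        if has_term and is_positive:
            a += 1  # term present, positive class
        elif has_term:
            b += 1  # term present, other class
        elif is_positive:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def top_terms(docs, labels, positive, k):
    """Return the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scored = [(chi_square(t, docs, labels, positive), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy posts; NOT taken from the FormSpring.me dataset.
POSTS = [
    ("you are such a loser", "bully"),
    ("nobody likes you loser!!!", "bully"),
    ("what is your favorite movie?", "normal"),
    ("are you coming 2 the game", "normal"),
]
DOCS = [set(alphabetic_tokens(text)) for text, _ in POSTS]
LABELS = [label for _, label in POSTS]

if __name__ == "__main__":
    print(top_terms(DOCS, LABELS, "bully", 3))
```

The selected terms would then serve as the feature set passed to a classifier; information gain could be substituted for `chi_square` as the scoring function without changing the rest of the sketch.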

Sanal Zorbalık Tespitinde Nitelik Çıkarımı ve Sınıflama Yöntemlerinin Etkileri (Effects of Feature Extraction and Classification Methods on Cyberbullying Detection)

Abstract (translated from Turkish):
Insults directed at defenseless people through the Internet or other electronic content are referred to as cyberbullying. Studies on cyberbullying show that these insults can have effects leading as far as suicide, especially among adolescents. For this reason, the automatic detection of cyberbullying is very important. This study shows the effects of feature extraction, feature selection, and classification methods on automatic cyberbullying detection. The experiments were carried out on the FormSpring.me dataset, and the effects of preprocessing methods; of different classifiers such as C4.5, Naive Bayes, kNN, and SVM; and of the information gain and chi-square feature selection methods were investigated. The experimental results showed that the best classification performance is obtained when only alphabetic characters are kept, stopwords are not removed, and stemming is not applied. Feature selection improved classification performance. When the classifiers are compared, C4.5 was the best method for the dataset used.

Document Type: Article Article Type: Research Article Access Type: Open Access
APA SARAÇ EŞSİZ E, Özel S (2017). Effects of Feature Extraction and Classification Methods on Cyberbully Detection. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 21(1), 190 - 200. 10.19113/sdufbed.20964
Chicago SARAÇ EŞSİZ ESRA, Özel Selma Ayşe Effects of Feature Extraction and Classification Methods on Cyberbully Detection. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi 21, no. 1 (2017): 190 - 200. 10.19113/sdufbed.20964
MLA SARAÇ EŞSİZ ESRA, Özel Selma Ayşe Effects of Feature Extraction and Classification Methods on Cyberbully Detection. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 21, no. 1, 2017, pp. 190 - 200. 10.19113/sdufbed.20964
AMA SARAÇ EŞSİZ E, Özel S Effects of Feature Extraction and Classification Methods on Cyberbully Detection. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi. 2017; 21(1): 190 - 200. 10.19113/sdufbed.20964
Vancouver SARAÇ EŞSİZ E, Özel S Effects of Feature Extraction and Classification Methods on Cyberbully Detection. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi. 2017; 21(1): 190 - 200. 10.19113/sdufbed.20964
IEEE SARAÇ EŞSİZ E, Özel S "Effects of Feature Extraction and Classification Methods on Cyberbully Detection." Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 21, pp. 190 - 200, 2017. 10.19113/sdufbed.20964
ISNAD SARAÇ EŞSİZ, ESRA - Özel, Selma Ayşe. "Effects of Feature Extraction and Classification Methods on Cyberbully Detection". Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi 21/1 (2017), 190-200. https://doi.org/10.19113/sdufbed.20964