Year: 2013, Volume: 21, Issue: 5, Page Range: 1411-1425, Text Language: English, Index Date: 29-07-2022

Efficient feature integration with Wikipedia-based semantic feature extraction for Turkish text summarization

Abstract:
This study presents a novel hybrid Turkish text summarization system that combines structural and semantic features. The system uses 5 structural features, 1 of which is newly proposed, and 3 semantic features whose values are extracted from Turkish Wikipedia links. The features are combined using weights calculated by 2 novel approaches. The first approach uses the analytic hierarchy process (AHP), which relies on a series of expert judgments based on pairwise comparisons of the features. The second approach uses the artificial bee colony (ABC) algorithm to determine the feature weights automatically. To confirm the significance of the proposed hybrid system, its performance is evaluated on a new Turkish corpus that contains 110 documents and 3 human-generated extractive summary corpora. The experimental results show that combining all of the features yields better performance than using any single feature individually.
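The abstract does not say how the AHP weights are derived from the expert judgments or how the weighted features are turned into a summary. The Python sketch below assumes Saaty's standard approach (the priority vector is the principal eigenvector of the reciprocal pairwise-comparison matrix, approximated here by power iteration) and a linear combination of feature values per sentence, followed by top-k sentence selection. The functions `ahp_weights` and `extract_summary`, the judgment matrix, and the sentence feature values are all hypothetical, and only three features are used instead of the paper's eight for brevity.

```python
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]], iterations: int = 100) -> List[float]:
    """Approximate the AHP priority vector: the principal eigenvector of a
    reciprocal pairwise-comparison matrix, found by power iteration and
    normalized so that the weights sum to 1."""
    n = len(pairwise)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the comparison matrix by the current weight estimate.
        product = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        # Renormalize so the estimate stays a probability-like weight vector.
        weights = [value / total for value in product]
    return weights


def extract_summary(features: Sequence[Sequence[float]], weights: Sequence[float], k: int) -> List[int]:
    """Score each sentence as the weighted sum of its (already normalized)
    feature values and return the indices of the k highest-scoring
    sentences in their original document order."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical expert judgments for three features (Saaty's 1-9 scale):
# feature 0 is twice as important as feature 1 and four times as important as feature 2.
judgments = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
weights = ahp_weights(judgments)  # approximately [4/7, 2/7, 1/7]

# Hypothetical feature values for four sentences (rows) and three features (columns).
sentence_features = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.9],
    [0.5, 0.5, 0.5],
    [0.1, 0.1, 0.1],
]
print(extract_summary(sentence_features, weights, k=2))  # [0, 2]
```

Because the example matrix is perfectly consistent (2 × 2 = 4), the weights come out exactly proportional to 4:2:1. Real expert judgments are rarely fully consistent, which is why AHP practice also checks a consistency ratio before the weights are used.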
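For the second approach, the following is a minimal Python sketch of the basic artificial bee colony algorithm (employed, onlooker, and scout phases) searching for a feature-weight vector in [0, 1]^d. The abstract does not state the fitness function the authors optimized; `summary_loss` is a hypothetical stand-in that measures the fraction of a human-generated extract missed by the top-k sentences under candidate weights. All names, parameter values (`n_sources`, `limit`, `cycles`), and data are assumptions for illustration.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple


def abc_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    lower: float = 0.0,
    upper: float = 1.0,
    n_sources: int = 20,
    limit: int = 50,
    cycles: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over [lower, upper]^dim with the basic
    artificial bee colony (ABC) algorithm; returns (best_vector, best_cost)."""
    rng = random.Random(seed)

    def random_source() -> List[float]:
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def fitness(cost: float) -> float:
        # Usual ABC transform from cost to fitness (larger is better).
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources
    start = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = list(sources[start]), costs[start]

    def search_near(i: int) -> None:
        # Move one coordinate of source i relative to a random partner k.
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        candidate = list(sources[i])
        candidate[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        candidate[j] = min(max(candidate[j], lower), upper)
        cost = objective(candidate)
        # Greedy selection: keep whichever of the two positions is better.
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        # Employed bees: one local search around every food source.
        for i in range(n_sources):
            search_near(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        fits = [fitness(c) for c in costs]
        for _ in range(n_sources):
            search_near(rng.choices(range(n_sources), weights=fits)[0])
        # Memorize the best food source found so far.
        i = min(range(n_sources), key=costs.__getitem__)
        if costs[i] < best_cost:
            best, best_cost = list(sources[i]), costs[i]
        # Scout bees: abandon sources that failed to improve more than `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
    return best, best_cost


def summary_loss(weights: Sequence[float], features: Sequence[Sequence[float]],
                 reference: Set[int], k: int) -> float:
    """Hypothetical objective: fraction of reference (human-selected) sentences
    missing from the top-k sentences under the given feature weights."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    chosen = set(sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k])
    return 1.0 - len(chosen & reference) / len(reference)


sentence_features = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.9], [0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]
human_extract = {1, 2}  # hypothetical reference summary: sentences 1 and 2
learned, loss = abc_minimize(
    lambda w: summary_loss(w, sentence_features, human_extract, k=2), dim=3)
print(learned, loss)
```

A rank-based objective like this one is piecewise constant in the weights, so a derivative-free, population-based search such as ABC is a natural fit; a finer-grained agreement score against the human summaries would give the greedy selection step more opportunities to register improvements.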

Subjects: Engineering, Electrical and Electronic
Document Type: Article, Article Type: Research Article, Access Type: Open Access
Authors: Aysun Güran, Nilgün Bayazıt Güler, Mustafa Zahid Gürbüz
APA: Güran, A., Bayazıt Güler, N., & Gürbüz, M. Z. (2013). Efficient feature integration with Wikipedia-based semantic feature extraction for Turkish text summarization. Turkish Journal of Electrical Engineering and Computer Sciences, 21(5), 1411-1425.