Year: 2021 Volume: 10 Issue: 4 Page Range: 1349 - 1365 Text Language: Turkish DOI: 10.17798/bitlisfen.949052 Index Date: 29-07-2022

CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality

Abstract:
In this paper, we introduce CatSumm (Cengiz, Ali, Taner Summarization), a novel method for extractive multi-document text summarization. The proposed method builds a summary in three main steps: representation of the input texts, the main stages of the CatSumm model, and sentence scoring. In the representation step, a text processing tool is introduced and used to preserve the semantic cohesion between word groups. Spectral Sentence Clustering (SSC), one of the main stages of the CatSumm model, builds the summary from the proportional values of the subgraphs obtained after spectral graph partitioning. The other main stage is the extraction of super edges, under the assumption that sentences whose values fall below a threshold computed from the standard deviation (SD) cannot be included in the summary. The sentence scoring step applies different node centrality measures to identify the significant nodes and hence the significant sentences. Finally, the CatSumm method was evaluated with ROUGE metrics on the Document Understanding Conference (DUC-2004 and DUC-2002) datasets. The presented model achieved summary success scores of 44.073%, 53.657%, and 56.513% for summaries of 100, 200, and 400 words, respectively.
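The stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a precomputed symmetric sentence-similarity matrix, uses a sign split of the Fiedler vector as the spectral partition, takes the standard deviation of the positive edge weights as the pruning threshold, and scores sentences by weighted degree centrality. The function names and the proportional quota rule are hypothetical choices made for illustration only.

```python
import numpy as np


def fiedler_partition(weights):
    """Split a connected weighted graph in two using the Fiedler vector.

    The graph Laplacian is L = D - W; the eigenvector of its second-smallest
    eigenvalue (the Fiedler vector) is split by sign.
    """
    degree = np.diag(weights.sum(axis=1))
    laplacian = degree - weights
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler >= 0).astype(int)


def prune_weak_edges(weights):
    """Zero out edges below a standard-deviation-based threshold.

    ASSUMPTION: the threshold is the standard deviation of the positive
    edge weights; the abstract does not give the exact formula.
    """
    upper = weights[np.triu_indices_from(weights, k=1)]
    positive = upper[upper > 0]
    threshold = positive.std() if positive.size else 0.0
    pruned = np.where(weights >= threshold, weights, 0.0)
    return pruned, threshold


def degree_centrality(weights):
    """Weighted degree of each node (one of many possible centralities)."""
    return weights.sum(axis=1)


def select_sentences(labels, scores, k):
    """Pick about k sentences, giving each cluster a share proportional to its size.

    ASSUMPTION: this quota rule is a stand-in for the 'proportional values
    of the subgraphs' mentioned in the abstract.
    """
    n = len(labels)
    chosen = []
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        quota = max(1, round(k * len(members) / n))
        ranked = sorted(members, key=lambda i: -scores[i])
        chosen.extend(ranked[:quota])
    # Trim to k in case rounding produced too many picks.
    chosen = sorted(chosen, key=lambda i: -scores[i])[:k]
    return sorted(int(i) for i in chosen)


if __name__ == "__main__":
    # Toy graph: two tightly connected groups of sentences and one weak bridge.
    W = np.zeros((6, 6))
    for i, j, w in [(0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),
                    (3, 4, 0.8), (3, 5, 0.8), (4, 5, 0.8),
                    (2, 3, 0.1)]:
        W[i, j] = W[j, i] = w

    labels = fiedler_partition(W)
    pruned, threshold = prune_weak_edges(W)
    scores = degree_centrality(pruned)
    print("clusters:", labels)
    print("threshold:", threshold)
    print("summary sentence indices:", select_sentences(labels, scores, k=2))
```

The partition is computed on the full graph before pruning because a disconnected graph has a repeated zero eigenvalue, which leaves the Fiedler vector ill-defined.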
Keywords: Spectral partitioning; Summarization; Edge reduction; Document summarization; Extractive summarization; Graph-based summarization

Turkish title: CatSumm: Spektral Çizge Bölmeleme ve Düğüm Merkeziliklerine Dayalı Çıkarıcı Metin Özetleme


Document Type: Article Article Type: Research Article Access Type: Open Access
APA Uçkan T, Hark C, Karcı A (2021). CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 10(4), 1349-1365. 10.17798/bitlisfen.949052
Chicago Uçkan Taner, Hark Cengiz, Karcı Ali. CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 10, no. 4 (2021): 1349-1365. 10.17798/bitlisfen.949052
MLA Uçkan Taner, Hark Cengiz, Karcı Ali. CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 10, no. 4, 2021, pp. 1349-1365. 10.17798/bitlisfen.949052
AMA Uçkan T, Hark C, Karcı A. CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi. 2021; 10(4): 1349-1365. 10.17798/bitlisfen.949052
Vancouver Uçkan T, Hark C, Karcı A. CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi. 2021; 10(4): 1349-1365. 10.17798/bitlisfen.949052
IEEE Uçkan T, Hark C, Karcı A. "CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality." Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 10, pp. 1349-1365, 2021. 10.17798/bitlisfen.949052
ISNAD Uçkan, Taner et al. "CatSumm: Extractive Text Summarization based on Spectral Graph Partitioning and Node Centrality". Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 10/4 (2021), 1349-1365. https://doi.org/10.17798/bitlisfen.949052