Year: 2022 Volume: 30 Issue: 5 Page Range: 1821 - 1838 Text Language: English DOI: 10.55730/1300-0632.3907 Index Date: 08-12-2022

Automatic keyword assignment system for medical research articles using nearest-neighbor searches

Abstract:
Assigning accurate keywords to research articles is an increasingly important concern. Keywords play an important role in matching readers with research articles, so they should be selected carefully to describe the article well and help it reach a wider audience. Poorly chosen keywords make an article harder to find and therefore reduce its readership. Hence, we designed and developed an automatic keyword assignment system (AKAS) for research articles based on k-nearest neighbor (k-NN) and threshold-nearest neighbor (t-NN) searches combined with an information retrieval system (IRS). It is a corpus-based method that uses the IRS over the Medline dataset in PubMed. First, AKAS accepts the abstract of a research article, or any other text, as a query to the IRS. Next, the IRS returns a ranked list of articles for the given query. Then, a set of documents is selected from this list using one of two methods: k-NN, which takes the first k documents, or t-NN, which takes the documents whose similarity is greater than a threshold value t. To evaluate the proposed system, we conducted a set of experiments on a selected subset of 458,594 PubMed articles, comparing the keywords suggested by AKAS with the original keywords assigned by the authors. The results show that the suggested keywords match the author-assigned keywords with an F-score of more than 55%. We present both methods and the experimental results in detail.
Keywords: Automatic keyword assignment; information retrieval; k-nearest neighbors; t-nearest neighbors; PubMed

Document Type: Article Article Type: Research Article Access Type: Open Access
APA DİLMAÇ F, Alpkoçak A (2022). Automatic keyword assignment system for medical research articles using nearest-neighbor searches. Turkish Journal of Electrical Engineering and Computer Sciences, 30(5), 1821 - 1838. 10.55730/1300-0632.3907
Chicago DİLMAÇ Fatih,Alpkoçak Adil Automatic keyword assignment system for medical research articles using nearest-neighbor searches. Turkish Journal of Electrical Engineering and Computer Sciences 30, no.5 (2022): 1821 - 1838. 10.55730/1300-0632.3907
MLA DİLMAÇ Fatih, Alpkoçak Adil Automatic keyword assignment system for medical research articles using nearest-neighbor searches. Turkish Journal of Electrical Engineering and Computer Sciences, vol.30, no.5, 2022, pp.1821 - 1838. 10.55730/1300-0632.3907
AMA DİLMAÇ F,Alpkoçak A Automatic keyword assignment system for medical research articles using nearest-neighbor searches. Turkish Journal of Electrical Engineering and Computer Sciences. 2022; 30(5): 1821 - 1838. 10.55730/1300-0632.3907
Vancouver DİLMAÇ F,Alpkoçak A Automatic keyword assignment system for medical research articles using nearest-neighbor searches. Turkish Journal of Electrical Engineering and Computer Sciences. 2022; 30(5): 1821 - 1838. 10.55730/1300-0632.3907
IEEE DİLMAÇ F, Alpkoçak A "Automatic keyword assignment system for medical research articles using nearest-neighbor searches." Turkish Journal of Electrical Engineering and Computer Sciences, 30, pp.1821 - 1838, 2022. 10.55730/1300-0632.3907
ISNAD DİLMAÇ, Fatih - Alpkoçak, Adil. "Automatic keyword assignment system for medical research articles using nearest-neighbor searches". Turkish Journal of Electrical Engineering and Computer Sciences 30/5 (2022), 1821-1838. https://doi.org/10.55730/1300-0632.3907