Year: 2023 Volume: 15 Issue: 2 Page Range: 334 - 354 Text Language: English DOI: 10.47000/tjmcs.1240729 Index Date: 08-01-2024

Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts

Abstract:
Relation graphs are useful tools for the structural and relational analysis of highly complex multi-component systems. Probabilistic relation graph models represent the relations between system components as probabilistic links. Graphs of this type have been widely used to represent Markov models and bigram probabilities graphically. This study explores the implications of relational similarities within probabilistic graph models of textual entries. It discusses several example uses of two fundamental similarity measures in the probabilistic analysis of short texts. To this end, the construction of probabilistic graph models from the bigram probability matrices of textual entries is illustrated, and vector spaces of input word-vectors and output word-vectors are formed. In these vector spaces, the cosine similarity and mean squared error measures are used to evaluate the probabilistic relational similarity between lexeme pairs in short texts. Using the probabilistic relation graphs of the short texts, relational interchangeability analyses of lexeme pairs are conducted, and confidence index parameters are defined to express the reliability of these analyses. Potential applications of these graphs in language processing and linguistics are discussed on the basis of the analysis results for example texts. The performance of the applied similarity measures is evaluated against the similarity index of the word2vec language model. In one of the illustrative examples, the comparison shows that a pair of synonyms with a word2vec similarity of 0.18157 scored a cosine similarity of 1.0 under the proposed method.
Keywords: Bigram, probability relations, probabilistic graph similarity, text similarity, relational interchangeability
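The pipeline summarized in the abstract (bigram probability matrix, input and output word-vectors, cosine similarity and mean squared error) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the output word-vector of a word is its row of the bigram matrix (probabilities of the words that follow it) and that its input word-vector is its column (probabilities with which other words lead to it). The function names `bigram_matrix`, `word_vectors`, `cosine_similarity` and `mean_squared_error`, as well as the toy sentence, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def bigram_matrix(tokens):
    """Return (vocab, P), where P[i][j] estimates P(next = vocab[j] | current = vocab[i])."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    P = [[0.0] * len(vocab) for _ in vocab]
    for (current, following), count in Counter(zip(tokens, tokens[1:])).items():
        P[index[current]][index[following]] = float(count)
    for row in P:
        total = sum(row)
        if total > 0:  # a word seen only as the last token has no successors
            row[:] = [value / total for value in row]
    return vocab, P


def word_vectors(vocab, P, word):
    """Assumed definition: input vector = column of P, output vector = row of P."""
    i = vocab.index(word)
    input_vec = [row[i] for row in P]
    output_vec = list(P[i])
    return input_vec, output_vec


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def mean_squared_error(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


# Toy short text in which "big" and "large" occur in identical contexts.
tokens = "the big dog runs the large dog runs".split()
vocab, P = bigram_matrix(tokens)
big_in, big_out = word_vectors(vocab, P, "big")
large_in, large_out = word_vectors(vocab, P, "large")

# Output word-vectors: prints 1.0 0.0
print(cosine_similarity(big_out, large_out), mean_squared_error(big_out, large_out))
# Input word-vectors: prints 1.0 0.0
print(cosine_similarity(big_in, large_in), mean_squared_error(big_in, large_in))
```

In this toy text the two words are preceded and followed by exactly the same words, so both measures report them as fully interchangeable. This is the kind of situation in which a context-based score of 1.0 can arise for a synonym pair, although the article's own example texts and definitions may differ from the assumptions made here.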

Document Type: Article Article Type: Research Article Access Type: Open Access
APA Alnahas D, ATEŞ A, Aydin A, Alagoz B (2023). Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts. Turkish Journal of Mathematics and Computer Science, 15(2), 334 - 354. 10.47000/tjmcs.1240729
Chicago Alnahas Dima, ATEŞ Abdullah, Aydin Ahmet Arif, Alagoz Baris Baykant. Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts. Turkish Journal of Mathematics and Computer Science 15, no. 2 (2023): 334 - 354. 10.47000/tjmcs.1240729
MLA Alnahas Dima, ATEŞ Abdullah, Aydin Ahmet Arif, Alagoz Baris Baykant. Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts. Turkish Journal of Mathematics and Computer Science, vol. 15, no. 2, 2023, pp. 334 - 354. 10.47000/tjmcs.1240729
AMA Alnahas D, ATEŞ A, Aydin A, Alagoz B. Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts. Turkish Journal of Mathematics and Computer Science. 2023; 15(2): 334 - 354. 10.47000/tjmcs.1240729
Vancouver Alnahas D, ATEŞ A, Aydin A, Alagoz B. Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts. Turkish Journal of Mathematics and Computer Science. 2023; 15(2): 334 - 354. 10.47000/tjmcs.1240729
IEEE Alnahas D, ATEŞ A, Aydin A, Alagoz B, "Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts," Turkish Journal of Mathematics and Computer Science, vol. 15, pp. 334 - 354, 2023. 10.47000/tjmcs.1240729
ISNAD Alnahas, Dima et al. "Revisiting Probabilistic Relation Analysis: Using Probabilistic Relation Graphs for Relational Similarity Analysis of Words in Short Texts". Turkish Journal of Mathematics and Computer Science 15/2 (2023), 334-354. https://doi.org/10.47000/tjmcs.1240729