Year: 2018 Volume: 26 Issue: 3 Page Range: 1662 - 1672 Text Language: English DOI: 10.3906/elk-1706-81 Index Date: 25-10-2018

Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing

Abstract:
Released only a year ago as the outputs of a research project (“Parsing Web 2.0 Sentences”, supported in part by a TÜBİTAK 1001 grant (No. 112E276) and a part of the ICT COST Action PARSEME (IC1207)), IMST and IWT are currently the most comprehensive Turkish dependency treebanks in the literature. This article introduces the final states of our treebanks, as well as a newly integrated hierarchical categorization of the multiheaded dependencies and their organization in an exclusive deep dependency layer in the treebanks. It also presents the adaptation of recent studies on standardizing multiword expression and named entity annotation schemes for the Turkish language, the integration of benchmark annotations into the dependency layers of our treebanks, and the mapping of the treebanks to the latest Universal Dependencies (v2.0) standard, ensuring further compliance with rising universal annotation trends. In addition to significantly boosting the universal recognition of Turkish treebanks, our recent efforts have shown an improvement in their syntactic parsing performance (up to 77.8%/82.8% LAS and 84.0%/87.9% UAS for IMST/IWT, respectively). The final states of the treebanks are expected to be better suited to different natural language processing tasks, such as named entity recognition, multiword expression detection, transfer-based machine translation, semantic parsing, and semantic role labeling.

Subjects: Engineering, Electrical and Electronic; Computer Science, Software Engineering; Computer Science, Cybernetics; Computer Science, Information Systems; Computer Science, Hardware and Architecture; Computer Science, Theory and Methods; Computer Science, Artificial Intelligence
Document Type: Article Article Type: Research Article Access Type: Open Access
APA SULUBACAK U, ERYİĞİT G (2018). Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering and Computer Sciences, 26(3), 1662 - 1672. 10.3906/elk-1706-81
Chicago SULUBACAK Umut, ERYİĞİT Gülsen Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering and Computer Sciences 26, no.3 (2018): 1662 - 1672. 10.3906/elk-1706-81
MLA SULUBACAK Umut, ERYİĞİT Gülsen Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering and Computer Sciences, vol.26, no.3, 2018, pp.1662 - 1672. 10.3906/elk-1706-81
AMA SULUBACAK U, ERYİĞİT G Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering and Computer Sciences. 2018; 26(3): 1662 - 1672. 10.3906/elk-1706-81
Vancouver SULUBACAK U, ERYİĞİT G Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering and Computer Sciences. 2018; 26(3): 1662 - 1672. 10.3906/elk-1706-81
IEEE SULUBACAK U, ERYİĞİT G "Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing." Turkish Journal of Electrical Engineering and Computer Sciences, 26, pp.1662 - 1672, 2018. 10.3906/elk-1706-81
ISNAD SULUBACAK, Umut - ERYİĞİT, Gülsen. "Implementing universal dependency, morphology, and multiword expression annotation standards for Turkish language processing". Turkish Journal of Electrical Engineering and Computer Sciences 26/3 (2018), 1662-1672. https://doi.org/10.3906/elk-1706-81