Year: 2024 Volume: 23 Issue: 45 Page Range: 209-244 Text Language: Turkish DOI: 10.55071/ticaretfbd.1354040 Index Date: 01-07-2024

DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI

Öz:
Son yıllarda Doğal Dil İşleme (DDİ) alanındaki gelişmelerin hız kazanması, araştırmacıların ve programcıların bu alana olan ilgisini büyük ölçüde artırmıştır. Bilgisayar programlarını doğal dil komutlarıyla yazma konsepti, birçok araştırmacının odak noktası haline gelmiştir. Literatür incelendiğinde, doğal dil ile programlama üzerine yapılan araştırmaların uzun bir geçmişe sahip olduğu açıkça görülmektedir. Bu uzun soluklu araştırmalar, çeşitli çözüm önerilerini beraberinde getirmiş ve kural tabanlı yöntemlerden, olasılık tabanlı yöntemlere, makine öğrenmesi yöntemlerinden derin öğrenme yöntemlerine kadar bir dizi çözüm yaklaşımının ortaya çıkmasına neden olmuştur. Literatürdeki çalışmalar tarihsel olarak kategorize edildiğinde geçmiş tarihli çalışmalarda kural tabanlı ya da istatistik tabanlı modeller üzerine yoğunlaştığı görülürken günümüze yaklaşıldıkça makine öğrenmesi ve derin öğrenme temelli çalışmaların arttığı görülmektedir. Kural tabanlı yöntemler, olasılık tabanlı yöntemler, makine öğrenmesi yöntemleri, derin öğrenme yöntemleri gibi çeşitli yaklaşımların geliştirildiği literatürde, bu çeşitlilik yeni araştırmacıların bu alana giriş yaparken karşılaşabileceği karmaşıklığı artırabilmektedir. Bu çalışma, doğal dil girdileriyle programlama dili kodu oluşturma çalışmalarına yönelik literatürde geliştirilen 32 yöntemin detaylı bir incelenmesini sunmaktadır. Çalışmanın amacı, literatürde tespit edilen çeşitli yöntemlerin zaman içerisindeki değişimlerinin gözden geçirilmesi, çalışmaların geniş bir perspektiften incelenerek genel bir çerçeve içinde toplanması ve bu alanda çalışacak olan araştırmacılara rehberlik edebilmesidir.
Anahtar Kelime: Derin öğrenme, doğal dil işleme, kod üretimi, makine öğrenmesi.

NATURAL LANGUAGE TEXT TO PROGRAMMING LANGUAGE CODE GENERATION STUDIES: A REVIEW

Abstract:
The recent acceleration of advances in Natural Language Processing (NLP) has significantly increased the interest of researchers and programmers in this field. Writing computer programs with natural language commands has become a focal point for many researchers. A review of the literature shows that research on natural language programming has a long history. This sustained research has produced a range of solution approaches, from rule-based and probability-based methods to machine learning and deep learning methods. When the studies are ordered chronologically, earlier work concentrates on rule-based or statistical models, while machine learning and deep learning-based studies become more common toward the present. This diversity of approaches can make the field harder for new researchers to enter. This paper presents a detailed review of 32 methods developed in the literature for generating programming language code from natural language input. The aim of the study is to review how the methods identified in the literature have changed over time, to gather the studies into a general framework by examining them from a broad perspective, and to provide guidance to researchers who intend to work in this area.
Keywords: Code generation, deep learning, machine learning, natural language processing.

Document Type: Article Article Type: Review Access Type: Open Access
APA Hatipoğlu A, Bilgin T (2024). DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI. İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, 23(45), 209-244. 10.55071/ticaretfbd.1354040
Chicago Hatipoğlu Ayşegül, Bilgin Turgay Tugay. DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI. İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi 23, no. 45 (2024): 209-244. 10.55071/ticaretfbd.1354040
MLA Hatipoğlu Ayşegül, Bilgin Turgay Tugay. DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI. İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, vol. 23, no. 45, 2024, pp. 209-244. 10.55071/ticaretfbd.1354040
AMA Hatipoğlu A, Bilgin T. DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI. İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi. 2024; 23(45): 209-244. 10.55071/ticaretfbd.1354040
Vancouver Hatipoğlu A, Bilgin T. DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI. İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi. 2024; 23(45): 209-244. 10.55071/ticaretfbd.1354040
IEEE Hatipoğlu A, Bilgin T, "DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI," İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, 23, pp. 209-244, 2024. 10.55071/ticaretfbd.1354040
ISNAD Hatipoğlu, Ayşegül - Bilgin, Turgay Tugay. "DOĞAL DİL METİNLERİNDEN PROGRAMLAMA DİLİ KODU OLUŞTURMA ÇALIŞMALARI: BİR DERLEME ÇALIŞMASI". İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi 23/45 (2024), 209-244. https://doi.org/10.55071/ticaretfbd.1354040