Year: 2019 Volume: 7 Issue: 2 Pages: 503-522 Language of Text: Turkish DOI: 10.29109/gujsc.514483 Indexed: 10-04-2020

Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi

Abstract (translated from the Turkish):
Convolutional Neural Networks (CNNs) are a type of multi-layer artificial neural network in which the convolution operation is used instead of matrix multiplication in at least one layer. Although CNNs achieve very successful results, particularly in computer vision, they still pose many challenges. As the architectures developed for better results grow ever deeper and the images used become ever higher in resolution, computational costs increase. Both reducing these costs and achieving good results depend on powerful hardware and on optimizing the network's hyper-parameters. This study surveys work in which CNN optimization was carried out with methods such as Genetic Algorithms, Particle Swarm Optimization, Differential Evolution and Bayesian Optimization, examining the hyper-parameters optimized, the value ranges defined for them, and the results obtained. Accordingly, the hyper-parameters most influential on CNN performance were found to be the number of filters, filter size, number of layers, dropout rate, learning rate and batch size. When studies using the same datasets are compared in terms of accuracy, the best accuracies for most datasets were obtained in studies using the population-based methods Genetic Algorithms and Particle Swarm Optimization. Models obtained with these metaheuristics proved competitive with, and sometimes better than, state-of-the-art models. In several metaheuristic studies the generated models were also prevented from growing excessively, yielding simple, easily trainable models; these computationally inexpensive models achieved results very close to those of the complex models in the literature.

Subjects: Computer Science, Software Engineering; Computer Science, Information Systems; Computer Science, Artificial Intelligence

A Survey of Hyper-parameter Optimization Methods in Convolutional Neural Networks

Abstract:
Convolutional neural networks (CNNs) are a special type of multi-layer artificial neural network in which the convolution operation is used instead of matrix multiplication in at least one layer. Although CNNs have achieved satisfactory results, especially in computer vision studies, they still present difficulties. As the proposed network architectures become deeper in pursuit of higher accuracy and the resolution of the input images increases, more computational power is needed. Reducing the computational cost while maintaining high accuracy depends on the use of powerful hardware and on the selection of the CNN's hyper-parameter values. In this study, we examine methods such as Genetic Algorithms, Particle Swarm Optimization, Differential Evolution and Bayesian Optimization that have been used extensively to optimize CNN hyper-parameters, and we list the hyper-parameters chosen for optimization in those studies, the ranges of their values, and the results each study obtained. These studies reveal that the number of layers, the number and size of the kernels at each layer, the learning rate and the batch size are among the hyper-parameters that most affect CNN performance. When the studies that use the same datasets are compared in terms of accuracy, Genetic Algorithms and Particle Swarm Optimization, both population-based methods, achieve the best results for the majority of the datasets. The models found in these studies perform competitively with, and sometimes better than, state-of-the-art models. In addition, the CNNs produced in these studies are prevented from growing excessively by imposing limits on the hyper-parameter values, yielding simpler and easier-to-train models. These computationally advantageous simpler models achieved results competitive with those of complicated models.
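The population-based search that the survey highlights can be illustrated with a minimal genetic-algorithm sketch over a CNN hyper-parameter space. Everything below is an assumption for illustration: the search space, its value ranges, and the surrogate `fitness` function are invented stand-ins, not the settings of any surveyed study. In practice, fitness would be the validation accuracy of a CNN actually trained with the candidate configuration, which is exactly the expensive evaluation these metaheuristics try to spend wisely.

```python
import random

# Illustrative search space over the hyper-parameters the survey names
# as most influential; the ranges here are assumptions.
SPACE = {
    "n_layers":      [2, 3, 4, 5],
    "n_filters":     [16, 32, 64, 128],
    "filter_size":   [3, 5, 7],
    "dropout":       [0.2, 0.3, 0.4, 0.5],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size":    [32, 64, 128],
}

def random_config():
    """Sample one candidate configuration uniformly from the space."""
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(cfg):
    """Cheap surrogate for validation accuracy (purely synthetic)."""
    score = 0.5
    score += 0.02 * cfg["n_layers"]
    score += 0.001 * cfg["n_filters"] ** 0.5
    score -= abs(cfg["dropout"] - 0.4)          # pretend 0.4 is ideal
    score -= abs(cfg["learning_rate"] - 1e-3) * 10
    return score

def crossover(a, b):
    """Uniform crossover: each gene copied from either parent."""
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(cfg, rate=0.2):
    """Resample each gene from the space with probability `rate`."""
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in cfg.items()}

def genetic_search(pop_size=10, generations=15, seed=0):
    random.seed(seed)
    pop = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = genetic_search()
print(best)
```

Because the top half of each generation survives unchanged, the best fitness never decreases, mirroring the elitism most of the surveyed GA studies rely on. Imposing a small `n_layers`/`n_filters` range, as above, is also how several studies kept the evolved models from growing excessively.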
Document Type: Article Article Type: Research Article Access: Open Access
APA: GÜLCÜ, A., & KUŞ, Z. (2019). Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, 7(2), 503-522. doi:10.29109/gujsc.514483
Chicago: GÜLCÜ, Ayla, and Zeki KUŞ. "Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi." Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji 7, no. 2 (2019): 503-522. doi:10.29109/gujsc.514483
MLA: GÜLCÜ, Ayla, and Zeki KUŞ. "Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi." Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, vol. 7, no. 2, 2019, pp. 503-522. doi:10.29109/gujsc.514483
AMA: GÜLCÜ A, KUŞ Z. Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji. 2019;7(2):503-522. doi:10.29109/gujsc.514483
Vancouver: GÜLCÜ A, KUŞ Z. Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji. 2019;7(2):503-522. doi:10.29109/gujsc.514483
IEEE: A. GÜLCÜ and Z. KUŞ, "Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi," Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji, vol. 7, no. 2, pp. 503-522, 2019. doi:10.29109/gujsc.514483
ISNAD: GÜLCÜ, Ayla - KUŞ, Zeki. "Konvolüsyonel Sinir Ağlarında Hiper-Parametre Optimizasyonu Yöntemlerinin İncelenmesi". Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji 7/2 (2019), 503-522. https://doi.org/10.29109/gujsc.514483