Year: 2022 Volume: Issue: 40 Page Range: 27 - 45 Text Language: English DOI: 10.53570/jnt.1147323 Index Date: 30-10-2022

Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach

Abstract:
Multivariate Adaptive Regression Splines (MARS) is a supervised machine learning model that is not obtained by an ensemble learning method. Ensemble learning methods combine hundreds or thousands of learners, built from samples of the data, to improve the stability and accuracy of machine learning algorithms. This study presents REMARS (Random Ensemble MARS), a new MARS model selection approach obtained using the Random Forest (RF) algorithm. Two hundred training and test data sets generated via the Bagging method were analysed in the MARS analysis engine. At the end of the analysis, two different sets of MARS models were created, one yielding the smallest Mean Square Error for the test data (Test MSE) and the other yielding the smallest Generalised Cross-Validation (GCV) value. The best model was estimated for both the Test MSE and GCV criteria by examining the error measurement criteria, the variable importance averages, and the frequencies of the knot values for each model. Eventually, a new model, REMARS, was obtained via the ensemble learning method; it yields results as good as the MARS model obtained from the original data set. With the proposed method, the MARS model, which works better on larger data sets, provides more reliable results on smaller data sets.
Keywords: Multivariate Adaptive Regression Splines; Random Forest; Model Selection; Machine Learning; Ensemble Learning
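
The selection loop summarised in the abstract (bagged training/test sets, one model per replicate, ranking by Test MSE and by GCV, then tallying knot frequencies) can be sketched roughly as follows. This is not the authors' implementation: a one-knot hinge regression stands in for the MARS analysis engine, the out-of-bag rows of each bootstrap sample are assumed to serve as the test set, the GCV penalty of 3 per knot is an assumed default, the data are synthetic, and all function names are hypothetical.

```python
import random
from collections import Counter


def hinge(x, knot):
    """Right hinge basis function max(0, x - knot), the building block of MARS."""
    return max(0.0, x - knot)


def fit_one_knot(xs, ys, knots):
    """Stand-in for a MARS fit: least squares for y = b0 + b1 * max(0, x - t),
    tried at each candidate knot t. Returns (knot, b0, b1, rss) or None."""
    n = len(xs)
    best = None
    for t in knots:
        hs = [hinge(x, t) for x in xs]
        mh, my = sum(hs) / n, sum(ys) / n
        sxx = sum((h - mh) ** 2 for h in hs)
        if sxx == 0.0:
            continue  # basis function is constant on this sample
        b1 = sum((h - mh) * (y - my) for h, y in zip(hs, ys)) / sxx
        b0 = my - b1 * mh
        rss = sum((y - b0 - b1 * h) ** 2 for h, y in zip(hs, ys))
        if best is None or rss < best[3]:
            best = (t, b0, b1, rss)
    return best


def gcv(rss, n, n_terms, n_knots, penalty=3.0):
    """GCV = (RSS / n) / (1 - M / n)^2 with M = n_terms + penalty * n_knots.
    The penalty value is an assumption, not taken from the paper."""
    eff = n_terms + penalty * n_knots
    return (rss / n) / (1.0 - eff / n) ** 2


def bagged_selection(xs, ys, knots, n_reps=200, seed=0):
    """Fit one model per bootstrap replicate and record knot, GCV and Test MSE."""
    rng = random.Random(seed)
    n = len(xs)
    results = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # assumed test set
        if not oob:
            continue
        fit = fit_one_knot([xs[i] for i in idx], [ys[i] for i in idx], knots)
        if fit is None:
            continue
        t, b0, b1, rss = fit
        test_mse = sum((ys[i] - b0 - b1 * hinge(xs[i], t)) ** 2 for i in oob) / len(oob)
        results.append({"knot": t, "gcv": gcv(rss, n, 2, 1), "test_mse": test_mse})
    return results


# Synthetic data with a true knot at x = 3 (illustration only).
data_rng = random.Random(1)
xs = [i / 10 for i in range(60)]
ys = [1.0 + 2.0 * hinge(x, 3.0) + data_rng.gauss(0, 0.2) for x in xs]

results = bagged_selection(xs, ys, knots=[1.0, 2.0, 3.0, 4.0, 5.0])
best_by_test_mse = min(results, key=lambda r: r["test_mse"])
best_by_gcv = min(results, key=lambda r: r["gcv"])
knot_counts = Counter(r["knot"] for r in results)
```

Here `best_by_test_mse` and `best_by_gcv` play the role of the two model sets described in the abstract, and `knot_counts` gives the knot frequencies across replicates. A real analysis would replace `fit_one_knot` with a full MARS fit and would also average variable importances over the replicates.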

Document Type: Article Article Type: Research Article Access Type: Open Access
APA Sabancı D, Cengiz M (2022). Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach. Journal of New Theory, (40), 27 - 45. 10.53570/jnt.1147323
Chicago Sabancı Dilek, Cengiz Mehmet Ali. Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach. Journal of New Theory, no. 40 (2022): 27 - 45. 10.53570/jnt.1147323
MLA Sabancı Dilek, Cengiz Mehmet Ali. Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach. Journal of New Theory, no. 40, 2022, pp. 27 - 45. 10.53570/jnt.1147323
AMA Sabancı D, Cengiz M. Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach. Journal of New Theory. 2022; (40): 27 - 45. 10.53570/jnt.1147323
Vancouver Sabancı D, Cengiz M. Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach. Journal of New Theory. 2022; (40): 27 - 45. 10.53570/jnt.1147323
IEEE Sabancı D, Cengiz M. "Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach." Journal of New Theory, pp. 27 - 45, 2022. 10.53570/jnt.1147323
ISNAD Sabancı, Dilek - Cengiz, Mehmet Ali. "Random Ensemble MARS: Model Selection in Multivariate Adaptive Regression Splines Using Random Forest Approach". Journal of New Theory 40 (2022), 27-45. https://doi.org/10.53570/jnt.1147323