Model selection in multivariate adaptive regression splines (MARS) using information complexity as the fitness function

Authors / Contributors:
[Elcin Kartal Koc, Hamparsum Bozdogan]
Place, Publisher, Year:
2015
Published in:
Machine Learning, 101/1-3(2015-10-01), 35-58
Format:
Article (online)
ID: 605477922
LEADER caa a22 4500
001 605477922
003 CHVBK
005 20210128100403.0
007 cr unu---uuuuu
008 210128e20151001xx s 000 0 eng
024 7 0 |a 10.1007/s10994-014-5440-5  |2 doi 
035 |a (NATIONALLICENCE)springer-10.1007/s10994-014-5440-5 
245 0 0 |a Model selection in multivariate adaptive regression splines (MARS) using information complexity as the fitness function  |h [Electronic data]  |c [Elcin Kartal Koc, Hamparsum Bozdogan] 
520 3 |a This paper introduces the information-theoretic measure of complexity (ICOMP) criterion for model selection in multivariate adaptive regression splines (MARS) to trade off efficiently between how well the model fits the data and the model's complexity. As is well known, MARS is a popular nonparametric regression technique for studying the nonlinear relationship between a response variable and a set of predictors, using piecewise linear or cubic splines as basis functions. A critical aspect of determining the form of the nonparametric regression model in the MARS strategy is evaluating a portfolio of submodels to select the best submodel, with the appropriate number of knots over a subset of the predictors. In standard regression modeling, when a large number of predictor variables are present and there is no precise information about the exact functional relationships among the variables, many model selection criteria still overfit. In this paper, to find the simplest model that balances overfitting and underfitting, ICOMP is proposed as a powerful model selection criterion for MARS modeling. Here, model complexity is treated with respect to the interdependency of the parameter estimates as well as the number of free parameters in the model. We develop ICOMP and study its performance alongside several of the most popular model selection criteria, namely Akaike's information criterion, Schwarz's Bayesian information criterion and generalized cross-validation, in MARS modeling to select the best subset models. We provide two Monte Carlo simulation examples and a real benchmark example to demonstrate the utility and versatility of the proposed model selection approach in determining the best functional form of the predictive model. 
Our numerical examples show that ICOMP provides a general model selection criterion with insight into the interdependencies and/or correlational structure between parameter estimates in the selected model. This new approach can also be applied to many complex statistical modeling problems. 
540 |a The Author(s), 2014 
690 7 |a Model selection  |2 nationallicence 
690 7 |a Multivariate adaptive regression Splines (MARS)  |2 nationallicence 
690 7 |a Nonparametric regression  |2 nationallicence 
690 7 |a Information complexity  |2 nationallicence 
700 1 |a Kartal Koc  |D Elcin  |u Department of Statistics, Operations, and Management Science, The University of Tennessee, 37996, Knoxville, TN, USA  |4 aut 
700 1 |a Bozdogan  |D Hamparsum  |u Department of Statistics, Operations, and Management Science, The University of Tennessee, 37996, Knoxville, TN, USA  |4 aut 
773 0 |t Machine Learning  |d Springer US; http://www.springer-ny.com  |g 101/1-3(2015-10-01), 35-58  |x 0885-6125  |q 101:1-3<35  |1 2015  |2 101  |o 10994 
856 4 0 |u https://doi.org/10.1007/s10994-014-5440-5  |q text/html  |z Online access via DOI 
898 |a BK010053  |b XK010053  |c XK010000 
900 7 |a Metadata rights reserved  |b Springer special CC-BY-NC licence  |2 nationallicence 
908 |D 1  |a research-article  |2 jats 
949 |B NATIONALLICENCE  |F NATIONALLICENCE  |b NL-springer 
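The abstract in the 520 field contrasts ICOMP with AIC, BIC and GCV for scoring MARS submodels. The Python sketch below illustrates that kind of comparison on a toy problem; it is not the authors' code. Assumptions, none of which come from this record: a single predictor; submodels assembled by hand from MARS-style hinge functions max(0, x - t) and max(0, t - x) rather than by the MARS forward and backward passes; Gaussian errors; ICOMP in the widely cited ICOMP(IFIM) form, -2 log L + 2 C1(F^-1) with C1(S) = (s/2) log(tr(S)/s) - (1/2) log|S|, where F^-1 is the estimated inverse Fisher information of (beta, sigma^2); and a GCV penalty of d = 3 effective parameters per knot. The helper names (hinge, design_matrix, invert_with_logdet, criteria) are hypothetical. Lower is better for every criterion.

```python
import math
import random


def hinge(x, knot, direction):
    """MARS-style hinge: max(0, x - knot) if direction = +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def design_matrix(xs, terms):
    """Intercept column plus one hinge column per (knot, direction) term (one predictor)."""
    return [[1.0] + [hinge(x, t, d) for t, d in terms] for x in xs]


def invert_with_logdet(a):
    """Gauss-Jordan inverse with partial pivoting; also returns log|det(a)|."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        logdet += math.log(abs(piv))
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0.0:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m], logdet


def criteria(xs, ys, terms, d=3.0):
    """Least-squares fit of y on the hinge basis, scored by AIC, BIC, GCV and ICOMP(IFIM)."""
    b = design_matrix(xs, terms)
    n, p = len(b), len(b[0])
    btb = [[sum(row[i] * row[j] for row in b) for j in range(p)] for i in range(p)]
    bty = [sum(row[i] * y for row, y in zip(b, ys)) for i in range(p)]
    inv, logdet_btb = invert_with_logdet(btb)
    beta = [sum(inv[i][j] * bty[j] for j in range(p)) for i in range(p)]
    rss = sum((y - sum(bi * ci for bi, ci in zip(row, beta))) ** 2 for row, y in zip(b, ys))
    s2 = rss / n  # maximum likelihood estimate of the error variance
    neg2loglik = n * math.log(2.0 * math.pi * s2) + n
    k = p + 1  # free parameters: p coefficients plus sigma^2
    # Estimated inverse Fisher information: blockdiag(s2 * (B'B)^-1, 2 * s2^2 / n)
    tr_cov = s2 * sum(inv[i][i] for i in range(p)) + 2.0 * s2 ** 2 / n
    logdet_cov = p * math.log(s2) - logdet_btb + math.log(2.0 * s2 ** 2 / n)
    c1 = 0.5 * k * math.log(tr_cov / k) - 0.5 * logdet_cov
    knots = len({t for t, _ in terms})
    c_eff = p + d * knots  # assumed GCV effective parameter count
    return {
        "AIC": neg2loglik + 2 * k,
        "BIC": neg2loglik + math.log(n) * k,
        "GCV": (rss / n) / (1.0 - c_eff / n) ** 2,
        "ICOMP": neg2loglik + 2.0 * c1,
        "C1": c1,
        "-2logL": neg2loglik,
    }


# Toy data (hypothetical): one hinge at x = 0.5 plus small Gaussian noise.
rng = random.Random(0)
xs = [i / 99 for i in range(100)]
ys = [1.0 + 2.0 * hinge(x, 0.5, +1) + rng.gauss(0.0, 0.05) for x in xs]

candidates = {
    "intercept only": [],
    "pair at 0.5": [(0.5, +1), (0.5, -1)],
    "pair at 0.3": [(0.3, +1), (0.3, -1)],
}
for name, terms in candidates.items():
    scores = criteria(xs, ys, terms)
    print(name, {key: round(val, 2) for key, val in scores.items() if key in ("AIC", "BIC", "GCV", "ICOMP")})
```

The C1 term is zero when the estimated covariance of the parameter estimates is a scaled identity and grows as it departs from that, for example when basis columns are strongly correlated. This is how ICOMP penalizes interdependency among the estimates in addition to their number, whereas AIC and BIC only count parameters.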