(177a) Obtaining Parsimonious Regression Models with Large Datasets
AIChE Spring Meeting and Global Congress on Process Safety
2018
2018 Spring Meeting and 14th Global Congress on Process Safety
Industry 4.0 Topical Conference
Big Data Analytics - Industry Perspective II
Wednesday, April 25, 2018 - 10:15am to 10:45am
Three main classes of feature selection approaches are available[2]: filter methods, wrapper methods, and embedded methods. Filters are mostly based on univariate measures of association between predictors and response variables, and tend to be computationally more efficient than wrappers and embedded methods. In the literature on filter methods, comprehensive lists of filters have been proposed and tested for classification tasks[3], while filters for regression problems remain largely unexplored[4]. This research addresses this gap by assessing and comparing the performance of different filters for feature selection in regression problems. Various association metrics are considered, including Pearson's correlation coefficient, Spearman's correlation, Kendall's correlation, and mutual information, as well as combinations of mutual information with other filters. These metrics accommodate different kinds of relationships between predictors and response variables: linear correlations (Pearson's correlation), monotonic relationships (Spearman's and Kendall's correlations), and non-linear associations (mutual information).
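As an illustration, a univariate filter of this kind simply scores each predictor by its association with the response and ranks the columns. The sketch below (the function name and the histogram-based mutual-information estimator are assumptions for illustration, not the authors' implementation) computes the four metrics mentioned above with NumPy and SciPy:

```python
import numpy as np
from scipy import stats

def filter_scores(X, y, n_bins=10):
    """Rank predictors by univariate association with the response.

    Returns a dict mapping metric name -> array of scores, one per
    column of X. Pearson captures linear relationships, Spearman and
    Kendall monotonic ones, and the plug-in mutual-information
    estimate (2-D histogram, in nats) non-linear associations.
    """
    n_features = X.shape[1]
    scores = {m: np.empty(n_features) for m in
              ("pearson", "spearman", "kendall", "mutual_info")}
    for j in range(n_features):
        x = X[:, j]
        # Absolute values: a filter cares about strength, not sign.
        scores["pearson"][j] = abs(stats.pearsonr(x, y)[0])
        scores["spearman"][j] = abs(stats.spearmanr(x, y)[0])
        scores["kendall"][j] = abs(stats.kendalltau(x, y)[0])
        # Plug-in MI estimate from a joint histogram of (x, y).
        pxy, _, _ = np.histogram2d(x, y, bins=n_bins)
        pxy = pxy / pxy.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of x
        py = pxy.sum(axis=0, keepdims=True)   # marginal of y
        nz = pxy > 0                          # avoid log(0)
        scores["mutual_info"][j] = np.sum(
            pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    return scores
```

Feature selection then amounts to keeping the top-ranked columns for a given metric; because each score involves only one predictor at a time, the cost grows linearly in the number of features, which is what makes filters attractive for large datasets.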
Two key performance indicators (KPIs) are used to quantify the performance of the different filters. The first KPI measures whether relevant variables are selected and noisy ones are removed; it can only be computed in a simulation setting, where the data-generating mechanism is known. The second KPI assesses the improvement in prediction performance obtained when a filter is applied prior to model building. Furthermore, several regression methods, including Partial Least Squares (PLS) regression, were tested in order to assess their interactions with the filters considered.
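The first KPI can be made concrete in a simulation setting by comparing the filter's selected set against the known relevant variables. A minimal sketch, assuming an F1-style score over the two index sets (the function name and the choice of F1 are illustrative assumptions, not the authors' exact definition):

```python
import numpy as np

def selection_kpi(scores, relevant_idx, k):
    """Simulation-setting KPI: does the filter's top-k set recover
    the truly relevant variables and exclude the noisy ones?

    scores       : 1-D array of filter scores, one per predictor
    relevant_idx : indices of the truly relevant predictors (known
                   only because the data-generating mechanism is known)
    k            : number of predictors the filter retains (k >= 1)

    Returns the F1 score between the selected and true index sets
    (1.0 = perfect recovery, 0.0 = no relevant variable selected).
    """
    selected = set(np.argsort(scores)[::-1][:k])  # top-k by score
    relevant = set(relevant_idx)
    tp = len(selected & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(selected)  # fraction of selected that matter
    recall = tp / len(relevant)     # fraction of relevant recovered
    return 2 * precision * recall / (precision + recall)
```

The second KPI needs no ground truth: it is simply the change in out-of-sample prediction error (e.g. cross-validated RMSE of a PLS model) when the model is fit on the filtered predictors rather than on all of them.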
References
[1] R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, no. 1-2, pp. 273-324, 1997.
[2] V. Kumar and S. Minz, "Feature Selection: A Literature Review," Smart Comput. Rev., vol. 4, no. 3, pp. 211-229, 2014.
[3] G. Chandrashekar and F. Sahin, "A survey on feature selection methods," Comput. Electr. Eng., vol. 40, no. 1, pp. 16-28, 2014.
[4] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, 2003.