(135b) Sample Size Determination for Metamodel Building in Automated Machine Learning Pipelines By an Inclusive Feedback Algorithm
AIChE Annual Meeting
2023
Topical Conference: Next-Gen Manufacturing
Applied Artificial Intelligence, Big Data, and Data Analytics Methods for Next-Gen Manufacturing Efficiency I
Tuesday, November 7, 2023 - 8:21am to 8:42am
Automated Machine Learning (AutoML) aims to make all the decisions required for model building in a data-driven, objective, and automated way, producing an algorithm that makes model building more efficient for non-experts (Thornton et al., 2013). Many popular, freely available AutoML tools target this audience, including Auto-WEKA (Kotthoff et al., 2016, 2019; Thornton et al., 2013), AutoGluon-Tabular (Erickson et al., 2020), Auto-sklearn (Feurer et al., 2015), H2O AutoML (LeDell & Poirier, 2020), Sumo (Gorissen et al., 2010), Hyperopt-sklearn (Komer et al., 2014), TPOT (Olson et al., 2016), and Auto-Keras (Jin et al., 2019). This work proposes a heuristic inclusive feedback iterative AutoML pipeline (HIFIAPP) for regression of process data, designed to systematically produce simple structured metamodels from a minimum number of samples within a reasonable computation time; a high-level overview is shown in Algorithm 1. Applying the algorithm to representative test cases showed that the proposed strategy is effective at building surrogate models with significant predictive capability.
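The abstract does not describe how HIFIAPP decides that enough samples have been collected. As a purely illustrative sketch, and not the authors' algorithm, the Python code below shows a generic sample-size feedback loop under several assumptions: polynomials of increasing degree stand in for "simple structured metamodels"; each new batch of samples is first used to validate the current candidates and is then included in the training set (one possible reading of "inclusive feedback"); and the function names (`sample_until_adequate`, `fit_poly`, `predict`), the tolerance, the batch size, and the sample budget are all hypothetical.

```python
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the resulting upper-triangular system.
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))) / a[i][i]
    return coef


def predict(coef, x):
    """Evaluate the polynomial c0 + c1*x + c2*x^2 + ..."""
    return sum(c * x ** k for k, c in enumerate(coef))


def sample_until_adequate(f, lo, hi, n_init=5, batch=5, tol=1e-2,
                          max_samples=60, max_degree=5, seed=0):
    """Hypothetical feedback loop: add samples in batches until a simple metamodel validates.

    Assumption: each new batch first serves as an independent test set for the
    candidates fitted so far (feedback) and is then included in the training data.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [f(x) for x in xs]  # f stands in for an expensive process simulation
    while len(xs) + batch <= max_samples:
        # Candidate structures: polynomials of increasing degree, fitted only when
        # the system is overdetermined (at least degree + 2 training points).
        candidates = [(d, fit_poly(xs, ys, d))
                      for d in range(1, max_degree + 1) if d + 2 <= len(xs)]
        new_x = [rng.uniform(lo, hi) for _ in range(batch)]
        new_y = [f(x) for x in new_x]
        # Inclusive step: the validation batch always joins the training data
        # (the candidates above were fitted without it).
        xs += new_x
        ys += new_y
        for d, coef in candidates:  # simplest structure first
            err = max(abs(predict(coef, x) - y) for x, y in zip(new_x, new_y))
            if err <= tol:
                # Refit the accepted structure on all samples collected so far.
                return {"degree": d, "coef": fit_poly(xs, ys, d),
                        "n_samples": len(xs), "converged": True}
    return {"degree": None, "coef": None, "n_samples": len(xs), "converged": False}


# Toy check with a quadratic "process": a degree-2 metamodel should be accepted
# after the first validation batch.
result = sample_until_adequate(lambda x: 1 + 2 * x - 3 * x ** 2, -1.0, 1.0)
print(result["degree"], result["n_samples"], result["converged"])  # 2 10 True
```

Checking the simplest structure first biases the loop toward parsimonious metamodels, and the sample budget bounds the number of expensive evaluations; both choices are illustrative and are not taken from HIFIAPP.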
References
Erickson, N., Mueller, J., Shirkov, A., Zhang, H., Larroy, P., Li, M., & Smola, A. (2020). AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data.
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and Robust Automated Machine Learning. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (NIPS 2015) (Vol. 28). Curran Associates, Inc.
Gorissen, D., Couckuyt, I., Demeester, P., Dhaene, T., & Crombecq, K. (2010). A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design. Journal of Machine Learning Research, 11, 2051–2055.
Jin, H., Song, Q., & Hu, X. (2019). Auto-Keras: An Efficient Neural Architecture Search System. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1946–1956. https://doi.org/10.1145/3292500.3330648
Komer, B., Bergstra, J., & Eliasmith, C. (2014). Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn. 32–37. https://doi.org/10.25080/Majora-14bd3278-006
Kotthoff, L., Thornton, C., Hoos, H. H., Hutter, F., & Leyton-Brown, K. (2016). Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. Journal of Machine Learning Research, 17, 1–5.
Kotthoff, L., Thornton, C., Hoos, H. H., Hutter, F., & Leyton-Brown, K. (2019). Auto-WEKA: Automatic Model Selection and Hyperparameter Optimization in WEKA (pp. 81–95). https://doi.org/10.1007/978-3-030-05318-5_4
LeDell, E., & Poirier, S. (2020). H2O AutoML: Scalable Automatic Machine Learning. In F. Hutter, J. Vanschoren, M. Lindauer, C. Weill, K. Eggensperger, & M. Feurer (Eds.), 7th ICML Workshop on Automated Machine Learning.
Olson, R. S., Bartley, N., Urbanowicz, R. J., & Moore, J. H. (2016). Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. Proceedings of the Genetic and Evolutionary Computation Conference 2016, 485–492. https://doi.org/10.1145/2908812.2908918
Thornton, C., Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2013). Auto-WEKA. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 847–855. https://doi.org/10.1145/2487575.2487629