We study optimization for data-driven decision-making when we have observations of the uncertain parameters within the optimization model together with concurrent observations of covariates. Given a new covariate observation, the goal is to choose a decision that minimizes the expected system cost conditioned on this observation. Applications of this framework include (i) shipment planning under uncertainty [2], where historical demands, weather forecasts, and web search results can be used to predict products' demands before making production and inventory decisions, (ii) grid scheduling under uncertainty [3], where seasonality, weather, and historical demand data can be used to predict the load and wind energy availability before creating generator schedules, and (iii) portfolio optimization under market uncertainty [4], where stock prices can be predicted using economic indicators and historical stock data before making investment decisions.
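In symbols, the conditional stochastic program described above can be sketched as follows. The notation is an assumption made here for illustration and does not appear in the abstract: Y denotes the uncertain parameters, X the covariates, x the new covariate observation, \mathcal{Z} the set of feasible decisions, and c(z, Y) the system cost of decision z.

```latex
\min_{z \in \mathcal{Z}} \; \mathbb{E}\left[ c(z, Y) \mid X = x \right]
```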
We investigate three data-driven frameworks that integrate a machine learning prediction model within a sample average approximation (SAA) for approximating the solution to this conditional stochastic program [5,6]. Two of the SAA frameworks are new and use out-of-sample residuals of leave-one-out prediction models for scenario generation. The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques. This generality enables decision-makers to choose the modeling approach that works best for their application. We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and we also derive convergence rates and finite sample guarantees. Computational experiments on a resource allocation model validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new data-driven formulations in the limited data regime. Our approach provides a modular framework for using covariate information in stochastic optimization and can be readily generalized to the multi-stage and distributionally robust optimization settings [7].
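To make the idea of residual-based scenario generation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the prediction model is ordinary least squares, the stochastic program is a toy single-item newsvendor problem, and all function names, cost parameters, and the synthetic data are hypothetical. Scenarios are formed by adding residuals to the prediction at the new covariate; the residuals are either in-sample or taken from leave-one-out refits, which is one plausible reading of "out-of-sample residuals of leave-one-out prediction models".

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the fitted linear model on the rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def residual_scenarios(X, y, x_new, leave_one_out=False):
    """Return (prediction at x_new, prediction at x_new + training residuals)."""
    coef = fit_ols(X, y)
    center = predict(coef, x_new[None, :])[0]
    if not leave_one_out:
        # In-sample residuals of the model fitted on all data.
        residuals = y - predict(coef, X)
    else:
        # Out-of-sample residuals: refit without point i, then predict point i.
        residuals = np.empty(len(y))
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            coef_i = fit_ols(X[keep], y[keep])
            residuals[i] = y[i] - predict(coef_i, X[i:i + 1])[0]
    return center, center + residuals


def newsvendor_saa(scenarios, underage=4.0, overage=1.0):
    """Minimize the sample-average newsvendor cost over order quantities.

    The cost is convex and piecewise linear with breakpoints at the
    scenarios, so it suffices to evaluate it at the scenario values.
    """
    candidates = np.sort(scenarios)
    costs = [
        np.mean(underage * np.maximum(scenarios - z, 0.0)
                + overage * np.maximum(z - scenarios, 0.0))
        for z in candidates
    ]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]


# Synthetic data (hypothetical): demand depends linearly on two covariates.
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 2))
y = 10.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)
x_new = np.array([0.5, -0.5])

center_in, scen_in = residual_scenarios(X, y, x_new, leave_one_out=False)
center_loo, scen_loo = residual_scenarios(X, y, x_new, leave_one_out=True)
z_in, cost_in = newsvendor_saa(scen_in)
z_loo, cost_loo = newsvendor_saa(scen_loo)
print("in-sample residuals:     order", z_in, "estimated cost", cost_in)
print("leave-one-out residuals: order", z_loo, "estimated cost", cost_loo)
```

In this sketch the regression step and the optimization step are decoupled, so the least-squares fit could be swapped for another parametric or nonparametric regressor without touching `newsvendor_saa`. For least squares, the leave-one-out residuals are never smaller in magnitude than the in-sample ones, so the resulting scenarios are more spread out, which matters most when the sample is small.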
1. Shapiro, Alexander, Darinka Dentcheva, and Andrzej Ruszczyński. Lectures on stochastic programming: modeling and theory. Society for Industrial and Applied Mathematics, 2014.
2. Bertsimas, Dimitris, and Nathan Kallus. From predictive to prescriptive analytics. Management Science 66.3 (2020): 1025-1044.
3. Donti, Priya L., Brandon Amos, and J. Zico Kolter. Task-based end-to-end model learning in stochastic optimization. arXiv preprint arXiv:1703.04529 (2017).
4. Dou, Xialiang, and Mihai Anitescu. Distributionally robust optimization with correlated data from vector autoregressive processes. Operations Research Letters 47.4 (2019): 294-299.
5. Kannan, Rohit, Güzin Bayraksan, and James R. Luedtke. Data-driven sample average approximation with covariate information. Available on Optimization Online (July 2020).
6. Kannan, Rohit, Güzin Bayraksan, and James R. Luedtke. Heteroscedasticity-aware residuals-based contextual stochastic optimization. arXiv preprint arXiv:2101.03139 (2021).
7. Kannan, Rohit, Güzin Bayraksan, and James R. Luedtke. Residuals-based distributionally robust optimization with covariate information. arXiv preprint arXiv:2012.01088 (2020).