(117e) Reinforcement Learning-Based End-Effect Mitigation in Solar-Wind Energy Market System Using Value Function Approximation
2024 AIChE Annual Meeting
Computing and Systems Technology Division
10C: Planning and Operation of Energy Systems
Monday, October 28, 2024 - 1:54pm to 2:15pm
The operation of a solar-wind energy market system faces significant challenges due to several sources of uncertainty. One major challenge stems from accommodating renewable sources, which are inherently intermittent and uncertain, leading to irregular and unpredictable electricity generation [1-2]. Additionally, electricity prices are themselves uncertain. Decision-makers therefore need accurate forecasts of these uncertainties for effective management. Another challenge arises from the time gap between decision-making and real operation: decisions must be committed before the operating horizon begins. In the European day-ahead electricity markets, decisions on electricity offering/bidding for the day of real operation must be made around noon of the previous day [3]. This means that the day-ahead decision has to account for a sufficiently large number of possible scenarios of uncertainty realizations and operating decisions.
Decision-making under uncertainty in solar-wind energy market systems is typically addressed within the framework of stochastic programming. However, the optimization horizon in a general class of stochastic programming problems is limited to a short, finite length because the problem size grows rapidly with the horizon. This limitation leads to the "end-effect," in which the energy storage system (ESS) is drained empty at the end of each optimization horizon. Typically, the operation of energy systems involves sequential decision-making, with each decision-making step covering an optimization horizon of a few days, or even a single day. From a myopic perspective, discharging the stored electricity in the ESS and selling it to the market before the end of the optimization horizon generally yields higher profit. Consequently, the ESS is often drained empty at the end of each optimization horizon, negatively impacting the storage's lifetime and the system's long-term profitability. This phenomenon is a common issue in decision-making problems for the operation of energy systems incorporating ESS. To mitigate the end-effect, several realistic valuation methods have been proposed for the terminal energy remaining in the storage at the end of each optimization horizon [3].
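As a simple illustration of the end-effect described above, the sketch below solves a deterministic one-day, discharge-only dispatch LP for a single ESS. It is not the formulation of [3]; all parameter names and values are hypothetical. With a zero terminal value on the final state of charge the optimizer empties the storage by the last hour, while a positive terminal valuation retains energy that the day's prices do not justify selling.

```python
# Minimal end-effect sketch (hypothetical toy model, not the authors' formulation):
# a one-day, discharge-only dispatch LP for a single ESS, with an optional
# terminal value placed on the final state of charge (SOC).
import numpy as np
from scipy.optimize import linprog

def dispatch_profit(prices, soc0, cap, p_max, terminal_value):
    """Maximize sum_t price_t * discharge_t + terminal_value * soc_T
    subject to soc_t = soc_{t-1} - discharge_t, 0 <= soc_t <= cap,
    0 <= discharge_t <= p_max. Returns (objective, final SOC)."""
    T = len(prices)
    # Decision vector x = [discharge_1..T, soc_1..T]; linprog minimizes c @ x.
    c = np.concatenate([-np.asarray(prices, dtype=float), np.zeros(T)])
    c[-1] -= terminal_value            # reward energy left at the horizon end
    # SOC balance: discharge_t + soc_t - soc_{t-1} = 0 (soc_{-1} = soc0)
    A_eq = np.zeros((T, 2 * T))
    b_eq = np.zeros(T)
    for t in range(T):
        A_eq[t, t] = 1.0               # discharge_t
        A_eq[t, T + t] = 1.0           # soc_t
        if t == 0:
            b_eq[t] = soc0
        else:
            A_eq[t, T + t - 1] = -1.0  # -soc_{t-1}
    bounds = [(0.0, p_max)] * T + [(0.0, cap)] * T
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    # -res.fun includes the terminal valuation term when terminal_value > 0.
    return -res.fun, res.x[-1]

prices = [30, 45, 80, 60]              # EUR/MWh, hypothetical
print(dispatch_profit(prices, soc0=5.0, cap=10.0, p_max=4.0, terminal_value=0.0))
print(dispatch_profit(prices, soc0=5.0, cap=10.0, p_max=4.0, terminal_value=70.0))
```

With terminal_value = 0 the storage is fully discharged (final SOC of zero), whereas a terminal value above the remaining hourly prices keeps energy in the ESS for later days.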
In this study, we mitigate the end-effect through a more rigorous method that uses reinforcement learning (RL) for value function approximation and can be integrated with the two-stage stochastic programming (2SSP) formulation proposed by Ikonen et al. [3]. By combining 2SSP with a Markov decision process formulation through the value of the terminal stored energy level at the end of each day, we enable recursive updating of the value function using observations sampled from 2SSP decision-making with a scenario generation model [4]. Our framework approximates the value function in a linear form with univariate basis functions parameterized by state variables, a widely used approximation strategy in energy system problems [4]. We iteratively update the function's coefficients with a policy iteration algorithm: the operation strategy is evaluated using the most current value function approximation, and the approximator is then updated by minimizing the error between the previous prediction and the current observation. Through this approach, the RL agent learns the valuation of stored electricity and estimates its longer-term value more reasonably, thereby mitigating the end-effect in a more cost-effective and reliable manner. To demonstrate the performance improvement, we compare the proposed approach with conventional methods from previous studies and with a benchmark method based on forecasted electricity prices in terms of long-term profit.
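A minimal sketch of the value function approximation step follows. It is an illustrative stand-in rather than the exact algorithm of this work: the value of the terminal storage level is approximated as a linear combination of univariate basis functions, and the coefficients are updated recursively to shrink the error between the previous prediction and a value observed from each day's 2SSP solution. The basis choice, step size, and the placeholder "observed value" are all assumptions.

```python
# Illustrative linear value function approximation V(soc) ~ theta . phi(soc)
# with a stochastic-gradient coefficient update (hypothetical basis and data).
import numpy as np

def basis(soc, cap):
    """Univariate basis features of the terminal storage level (assumed form)."""
    x = soc / cap
    return np.array([1.0, x, x ** 2])   # constant, linear, quadratic terms

def predict_value(theta, soc, cap):
    return float(theta @ basis(soc, cap))

def update_theta(theta, soc, observed_value, cap, step_size=0.05):
    """One recursive update: move theta to reduce the squared error between
    the previous prediction and the value observed for ending with this SOC."""
    phi = basis(soc, cap)
    error = observed_value - theta @ phi
    return theta + step_size * error * phi

# Toy loop standing in for policy iteration: each "day", a 2SSP solve would
# return the realized value of ending the day with soc_T stored; here it is
# faked with a concave placeholder function plus noise.
rng = np.random.default_rng(0)
cap, theta = 10.0, np.zeros(3)
for day in range(2000):
    soc_T = rng.uniform(0.0, cap)                          # sampled terminal level
    observed = 60.0 * soc_T - 2.0 * soc_T ** 2 + rng.normal(0.0, 5.0)
    theta = update_theta(theta, soc_T, observed, cap)
print(theta, predict_value(theta, 5.0, cap))
```

In the actual framework, the learned terminal valuation would be fed back into the next day's 2SSP objective, so that the day-ahead decisions no longer drain the ESS at the horizon end.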
References
[1] M. T. Kelley, R. Baldick, and M. Baldea, "Demand response scheduling under uncertainty: Chance-constrained framework and application to an air separation unit," AIChE Journal, vol. 66, no. 9, pp. e16273, 2020.
[2] D. Han and J. H. Lee, "Two-stage stochastic programming formulation for optimal design and operation of multi-microgrid system using data-based modeling of renewable energy sources," Applied Energy, vol. 291, pp. 116830, 2021.
[3] T. J. Ikonen, D. Han, J. H. Lee, and I. Harjunkoski, "Stochastic programming of energy system operations considering terminal energy storage levels," Computers and Chemical Engineering, vol. 179, pp. 108449, 2023.
[4] J. Shin, J. H. Lee, and M. J. Realff, "Operational planning and optimal sizing of microgrid considering multi-scale wind uncertainty," Applied Energy, vol. 195, pp. 616-633, 2017.