(59m) Data Embedding and Hybrid Modeling for Industrial Fluid Catalytic Cracking | AIChE


Authors 

Kokosis, A. - Presenter, National Technical University of Athens
Blitas, D., National Technical University of Athens / Industrial Process Systems Engineering (IPSEN)

Fluid catalytic cracking (FCC) is the most economically important refinery operation currently in use, contributing about 35% of total gasoline production as well as other valuable refined products. The process itself is markedly complex: FCC units use high temperatures and specialized, continuously circulating catalysts to break down complex hydrocarbons into simpler compounds. To tackle this complexity, various modeling approaches have been proposed. The earliest work on FCC modeling relied on lumped kinetics, an approach that is widely used to this day. In lumped kinetic (LK) models, feed and product components are grouped into pseudo-components (lumps). The criteria for sorting feed or product components into lumps (boiling point range, chemical structure, etc.) may vary. Feed lumps are connected to product lumps by arrows that represent cracking reactions and set up the basis for equation-based analysis. The models are derived from first principles and simplified through empirical assumptions. Beyond simplifying the process, LK models offer important means to both understand and manipulate FCC operation. Further model reduction is always possible, but at the risk of compromising both process understanding and model accuracy.
Data-driven models, by contrast, dispense with first principles: they apply systems analysis to infer input-output relationships using multi-parametric models, ranging from simple linear models to deep learning architectures trained on data from daily operations. Such models can be used to monitor FCC performance in design, off-line, and on-line control applications. They are less demanding computationally, can make full use of plant data, and are reported to monitor performance quite accurately. However, they fail to offer proper insight into the process and are prone to significant errors whenever they are used outside the data space they were trained on.

There is a clear motivation to upgrade first-principle-based models towards hybrid models that make systematic use of operational data, thus combining the merits of the approaches currently in use. Rather than directly inferring input-output relationships, data embedding may infer the model parameters used in the first-principle-based models. Training could alternatively use a superset of data comprising curated simulations and operational data. The data then include: (a) sets of curated (simulation) data from the constitutive equations of first-principle-based FCC models; and (b) operational data from the process. The approach addresses the importance of merging data from two different sources (plant data, curated data), assessing the potential of the merged approach to improve the accuracy of the LK model as well as the ability of the data-based model to generalize outside the remit of its training set. The methodology involves a staged approach applied as follows: (i) a curated model is developed based on an LK method (deterministic model) to produce dataset ds1 and predict the FCC output; (ii) data streams ds2 from the real process are incrementally merged into ds1 to form a hybridized training set for a neural network that predicts new FCC outputs; and (iii) the process continues, measuring the level of improvement achieved at each stage by reporting the prediction accuracy for each output stream of the system. Aspen HYSYS was used to produce the LK model; the ANN was developed in Python. ds1 consists of 55 rows of data generated with the case study function in HYSYS. ds2 consists of 70 rows of operational data from the HELPE FCC unit at the Aspropyrgos refinery, of which 15 were randomly selected as the test set and 55 as the initial training pool. Data from ds2 were introduced in increments of 10% to form the hybridized ds1, which ultimately consisted of equal parts real and curated data.
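The staged hybridization above can be sketched in a few lines of Python. Everything in this sketch is illustrative: the synthetic data, the four-feature shapes, the fixed model bias, and the least-squares surrogate standing in for the ANN are assumptions, not the actual HYSYS/ANN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the two data sources: ds1 is curated
# (simulated) LK-model data, ds2 is plant data. The "true" plant response
# and the deliberately biased curated response are synthetic assumptions.
n_features = 4
w_true = rng.normal(size=n_features)   # hypothetical plant behavior
w_bias = w_true + 0.5                  # the LK model's imperfect view of it

ds1_X = rng.normal(size=(55, n_features))
ds1_y = ds1_X @ w_bias                                  # 55 curated rows
ds2_X = rng.normal(size=(70, n_features))
ds2_y = ds2_X @ w_true + 0.05 * rng.normal(size=70)     # 70 plant rows

# Hold out 15 randomly chosen plant rows as the fixed test set;
# the remaining 55 form the pool that is merged in 10% increments.
idx = rng.permutation(70)
test_X, test_y = ds2_X[idx[:15]], ds2_y[idx[:15]]
pool_X, pool_y = ds2_X[idx[15:]], ds2_y[idx[15:]]

def fit_and_score(X, y):
    """Least-squares surrogate for the ANN: fit, then report test MAE."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean(np.abs(test_X @ w - test_y)))

# Stage the hybridization: 0%, 10%, ..., 100% of the plant pool merged in.
errors = {}
for pct in range(0, 110, 10):
    k = round(len(pool_X) * pct / 100)
    X = np.vstack([ds1_X, pool_X[:k]])
    y = np.concatenate([ds1_y, pool_y[:k]])
    errors[pct] = fit_and_score(X, y)
```

In this toy setting, test error falls as plant rows dilute the biased curated data, which is the qualitative behavior the methodology measures at each 10% step.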
The ANN underwent 100 training runs at each step, of 1000 epochs each. The best results achieved at each 10% step were used as the basis for analysis. Data embedding significantly improved the prediction accuracy of the initial model (Fig. 1). In the case of naphtha and LPG, average prediction errors were markedly reduced. Good predictions were already achieved at 10% hybridization for certain products and improved rapidly at 20% and beyond for the overall model.
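The best-of-100 training protocol can be illustrated with a minimal sketch. The tiny gradient-descent regressor, the synthetic data, and the loss-based selection criterion are all assumptions standing in for the actual ANN and its training setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data standing in for one hybridization step's training set.
X = rng.normal(size=(55, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.05 * rng.normal(size=55)

def train_once(X, y, rng, epochs=1000, lr=0.01):
    """One training run: random init, plain gradient descent on MSE."""
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        grad = 2.0 / len(X) * X.T @ (X @ w - y)
        w -= lr * grad
    return w, float(np.mean((X @ w - y) ** 2))

# 100 restarts of 1000 epochs each; keep the run with the lowest loss,
# mirroring the best-of-100 selection described above.
best_w, best_loss = min(
    (train_once(X, y, rng) for _ in range(100)), key=lambda t: t[1]
)
```

Because each restart begins from a different random initialization, keeping the lowest-loss run guards against runs that settle in poor regions of the loss surface — the same rationale as the repeated-training protocol in the study.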