(86a) A Data Mining Framework for Collecting Chemical-Centric Data for End-of-Life Flow Inventory | AIChE

Authors 

Ruiz-Mercado, G. - Presenter, U.S. Environmental Protection Agency
Hernandez-Betancur, J., University of Salamanca
Martin, M., University of Salamanca
Tracking chemical flows and collecting life cycle inventories (LCI) are crucial steps for identifying potential exposure scenarios at the chemical end-of-life (EoL) stage. These tasks, however, are time-consuming and challenging. Data-driven modeling is considered a powerful tool to streamline the identification of exposure scenarios, potential environmental releases, and material transfers, but it first requires a data pipeline that collects and prepares the data a model ingests for training and retraining. This work presents a data mining framework that extracts and transforms data from publicly accessible, siloed, multi-country database systems; its applicability domain is the LCI of chemical EoL off-site transfers. A database system must meet three requirements to be integrated into the framework: (i) it is available in English, (ii) it is chemical-centric, i.e., it reports individual chemicals rather than total transferred amounts, and (iii) its data granularity is sufficient to describe the entities involved in a transfer. The collected data therefore describe the generator, the chemical, and the type of EoL activity (e.g., surface impoundment) involved in transferring a chemical contained in a waste stream to an off-site location for EoL management. An exploratory data analysis shows the implications and limitations of using the data in data-driven models such as classifiers, e.g., to predict potential EoL chemical exposure scenarios. The data mining pipeline can provide datasets annually, which helps to deal with the decay of data-driven model performance over time caused by changes in the statistical distribution of the independent variables (e.g., generator industry sector). The framework likewise deals with changes in the relationship between the independent variables and the target variable (e.g., between a chemical of concern and a potential EoL activity).
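As an illustration of the distribution change mentioned above, the following minimal sketch compares the categorical distribution of an independent variable (here, generator industry sector) between two annual datasets using the total variation distance. The abstract does not state which drift measure the framework uses, so the metric, the sample values, and the retraining threshold are all assumptions made for this example.

```python
from collections import Counter


def category_shares(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(reference, current):
    """Half the L1 distance between two categorical distributions.

    0.0 means identical distributions; 1.0 means no shared categories.
    """
    p = category_shares(reference)
    q = category_shares(current)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical generator industry sectors from two consecutive annual datasets.
sectors_year_1 = ["chemicals", "chemicals", "metals", "plastics"]
sectors_year_2 = ["chemicals", "metals", "metals", "electronics"]

drift = total_variation_distance(sectors_year_1, sectors_year_2)

# Assumed threshold; a real pipeline would calibrate it on historical data.
RETRAIN_THRESHOLD = 0.2
needs_retraining = drift > RETRAIN_THRESHOLD
print(f"drift={drift:.2f}, retrain={needs_retraining}")
```

A check of this kind covers only shifts in the inputs. Detecting a change in the relationship between the inputs and the target would additionally require comparing model predictions against newly reported outcomes.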
The framework can also help to handle differences in regulations and environmental law across geographical locations and over the years: it incorporates LCI data from different countries and captures insights from the data about how environmental reporting criteria change from one year to the next.
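Combining data from several countries implies mapping each source's records onto one shared description of a transfer. The sketch below shows one way this could look, using the three entities named in the abstract (generator, chemical, EoL activity). The source names, column names, units, and the `TransferRecord` schema are hypothetical and do not describe the actual database systems or the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRecord:
    """One off-site EoL transfer in an assumed common schema."""
    country: str
    year: int
    generator_sector: str
    chemical: str
    eol_activity: str
    amount_kg: float


# Hypothetical per-source specifications: which raw column holds each field,
# and the factor that converts the reported amount to kilograms.
SOURCES = {
    "source_a": {
        "country": "A",
        "fields": {"generator_sector": "sector", "chemical": "substance",
                   "eol_activity": "method", "amount": "kg"},
        "to_kg": 1.0,
    },
    "source_b": {
        "country": "B",
        "fields": {"generator_sector": "industry", "chemical": "chemical_name",
                   "eol_activity": "treatment", "amount": "tonnes"},
        "to_kg": 1000.0,  # this source is assumed to report tonnes
    },
}


def harmonize(source, year, raw):
    """Map one raw row from `source` onto the common TransferRecord schema."""
    spec = SOURCES[source]
    fields = spec["fields"]

    def text(name):
        # Normalize free-text values so categories match across sources.
        return raw[fields[name]].strip().lower()

    return TransferRecord(
        country=spec["country"],
        year=year,
        generator_sector=text("generator_sector"),
        chemical=text("chemical"),
        eol_activity=text("eol_activity"),
        amount_kg=float(raw[fields["amount"]]) * spec["to_kg"],
    )


# Hypothetical raw rows as they might arrive from two different systems.
raw_a = {"sector": " Chemical Manufacturing ", "substance": "Toluene",
         "method": "Surface impoundment", "kg": "120"}
raw_b = {"industry": "Metal Products", "chemical_name": "Lead",
         "treatment": "Landfill", "tonnes": "2.5"}

records = [harmonize("source_a", 2020, raw_a),
           harmonize("source_b", 2020, raw_b)]
```

Keeping the country and year on every record is what would allow later analysis to compare reporting across jurisdictions and from one year to the next.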