(59n) A Graph Attention Network Based Approach for Interpretable and Domain-Aware Modeling of a Wellhead Water Treatment System | AIChE

Rural communities in agricultural regions across the United States increasingly face rising salinity and nitrate levels in their local potable groundwater sources. The water sources of many communities exceed the maximum contaminant level for nitrate, and their salinity is well above the recommended level for drinking water. To provide safe potable water to these communities, a membrane-based wellhead water treatment system was developed and deployed in multiple communities. However, given their remote locations, such wellhead water treatment systems must be autonomous and must adapt to intermittent operation caused by fluctuating water use patterns and the unavailability of continuous manual labor. Because the operation of such water treatment systems is intermittent (given the fluctuating hourly and daily water usage and the finite local water storage volume), it is critical to develop a system performance model that is suitable for model-based control, performance forecasting, fault detection, and determination of causal relationships among process attributes. To this end, an advanced ensemble machine learning approach based on graph neural networks with an attention mechanism (graph attention networks, GATs) was developed to describe the intermittent operational profiles of a wellhead water treatment system deployed in a small, remote, disadvantaged community in California. System monitoring and data acquisition were accomplished via a cloud-based infrastructure providing time-series data for 22 system attributes (e.g., pressures, flow rates, temperature, permeate conductivity, and nitrate concentration). The ensemble model consisted of five graph attention networks, one for each of the five system operational modes: production, startup, shutdown, system permeate flushing, and feed flushing.
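
The per-mode ensemble described above can be illustrated with a minimal sketch, which is not the authors' implementation. It pairs a fixed-length history of attribute readings with the reading that follows it, and routes each history window to the model registered for the current operational mode. The names `make_windows`, `ModeEnsemble`, and `persistence` are hypothetical, the mode labels are paraphrased from the abstract, and the number of timesteps per window is an assumption because the sampling interval is not stated here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Mode labels paraphrased from the abstract (the exact identifiers are assumptions).
MODES = ("production", "startup", "shutdown", "permeate_flushing", "feed_flushing")

Window = List[List[float]]               # rows of attribute values, oldest first
Model = Callable[[Window], List[float]]  # maps a history window to the next row


def make_windows(series: Sequence[Sequence[float]],
                 history: int) -> List[Tuple[Window, List[float]]]:
    """Pair each run of `history` consecutive rows with the row that follows it."""
    if history <= 0:
        raise ValueError("history must be a positive number of timesteps")
    pairs = []
    for t in range(history, len(series)):
        past = [list(row) for row in series[t - history:t]]
        pairs.append((past, list(series[t])))
    return pairs


class ModeEnsemble:
    """Hold one model per operational mode and dispatch on the active mode."""

    def __init__(self, models: Dict[str, Model]):
        missing = set(MODES) - set(models)
        if missing:
            raise ValueError(f"no model registered for modes: {sorted(missing)}")
        self.models = models

    def predict(self, mode: str, window: Window) -> List[float]:
        return self.models[mode](window)


def persistence(window: Window) -> List[float]:
    """Placeholder model (not a GAT): predict that the next row repeats the last one."""
    return list(window[-1])
```

In a real system each entry of `models` would be a trained graph attention network for that mode; the placeholder is used here only so that the routing logic can be run on its own.
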
Graph attention networks are well suited for modeling the above operational modes given their ability to incorporate the causal relationships among system parameters and performance attributes, which enables performance forecasting. The graph attention network for each operational mode was trained using one year of operational time-series data, utilizing observations over the past ~50 seconds to predict system parameters at the next timestep. The ensemble model was used to evaluate system performance across the multimode operation with respect to nitrate passage, salt passage, and membrane permeability. Model attributes and their complex, dynamic, and nonlinear causal relationships were assessed via the inherent graphical representation capability of GATs, in which the strengths of the causal relationships among attributes are represented by the learned weights of the connections within the graph network. GAT models for the different operational modes performed well, with a root-mean-square error (RMSE, expressed as a percentage) of ~8-9% for both nitrate passage and salt passage. To incorporate domain knowledge into the modeling approach and to assess its advantage, two modeling approaches were evaluated. In the first approach, the underlying graphical structure of the graph attention networks was defined a priori based on domain knowledge of the transport mechanisms in reverse osmosis (RO) membrane elements. In the second approach, the graphical structure was learned through training on nominal data. The ensemble approach that used graph attention networks with the graphical structure defined a priori from domain knowledge yielded a lower RMSE for predicting system performance, indicating the advantage of informing the data-driven approach with domain knowledge.
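
To make the role of an a priori graph structure concrete, the following is a minimal sketch of a single GAT-style attention step in which each node may attend only to the neighbors allowed by a predefined graph. It follows the standard graph attention formulation (a LeakyReLU score followed by a softmax over each node's neighbors) and is not the authors' model: node features are assumed to be already linearly transformed, the attention vector is split into hypothetical parts `a_self` and `a_nbr`, and any graph passed in `neighbors` is illustrative only.

```python
import math
from typing import Dict, List


def dot(u: List[float], v: List[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(features: Dict[str, List[float]],
                      neighbors: Dict[str, List[str]],
                      a_self: List[float],
                      a_nbr: List[float]) -> Dict[str, Dict[str, float]]:
    """Return alpha[i][j]: how strongly node i attends to its allowed neighbor j.

    `neighbors` encodes the graph structure (for example, one fixed from domain
    knowledge); nodes outside neighbors[i] receive no weight at all.
    """
    alpha = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # a node with no allowed neighbors has nothing to attend to
        scores = [leaky_relu(dot(a_self, features[i]) + dot(a_nbr, features[j]))
                  for j in nbrs]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha


def aggregate(features: Dict[str, List[float]],
              alpha: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """Update each node as the attention-weighted sum of its neighbors' features."""
    out = {}
    for i, weights in alpha.items():
        dim = len(features[i])
        out[i] = [sum(w * features[j][k] for j, w in weights.items())
                  for k in range(dim)]
    return out
```

Under this sketch, the two approaches compared above differ only in where `neighbors` comes from: it is either fixed in advance from knowledge of the process or derived from data during training.
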
Furthermore, causal relationships among attributes were identified and validated by visualizing the learned weights within the network structure, which can be interpreted as the magnitude of the variation of each process variable with respect to other connected variables. The results of the study demonstrated a multitude of benefits, including: (a) forecasting the performance of a real RO system with respect to nitrate passage, salt passage, and membrane permeability, with causal relationships informed by domain knowledge; (b) an interpretable modeling approach that identifies the causal relationships between the various dynamic process variables and the target outcomes; (c) a robust and explainable modeling approach that uses graphical network structures to detect and identify faulty variable data; and (d) a foundation for model-based control of the treatment system to meet regulatory requirements for safe drinking water.
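
As a rough illustration of how learned weights and prediction errors might be inspected, the sketch below ranks graph connections by weight and flags variables whose percentage RMSE exceeds a threshold. It rests on several assumptions: the nested-dictionary layout for the weights, the normalization of RMSE by the mean absolute observed value (the abstract does not state how its percentage was computed), and the threshold-based flagging rule, which is a simple stand-in rather than the authors' fault detection method. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_edges(alpha: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """Flatten alpha[target][source] into (source, target, weight), strongest first."""
    edges = [(src, tgt, w) for tgt, nbrs in alpha.items() for src, w in nbrs.items()]
    return sorted(edges, key=lambda e: e[2], reverse=True)


def rmse_percent(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """RMSE as a percentage of the mean absolute observed value (assumed normalization)."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    scale = sum(abs(o) for o in observed) / len(observed)
    if scale == 0:
        raise ValueError("cannot normalize by a zero mean observed value")
    return 100.0 * math.sqrt(mse) / scale


def flag_suspect_variables(predicted: Dict[str, Sequence[float]],
                           observed: Dict[str, Sequence[float]],
                           threshold_pct: float) -> List[str]:
    """Return, in sorted order, variables whose percentage RMSE exceeds the threshold."""
    return sorted(name for name in observed
                  if rmse_percent(predicted[name], observed[name]) > threshold_pct)
```

A variable flagged this way would still need to be checked against its strongest connections in the graph, since a large error can originate in an upstream measurement instead of the flagged variable itself.
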