(185a) Data Fusion and Feature Selection for Process Monitoring at Hanford

Conference

AIChE Annual Meeting

Year

2022

Proceeding

2022 Annual Meeting

Group

Computing and Systems Technology Division

Session

Data Science/Analytics for Process Applications

Time

Monday, November 14, 2022 - 3:30pm to 3:49pm

Authors

Crouse, S. - Presenter

Gurprasad, R., Georgia Institute of Technology

Grover, M., Georgia Tech

Rousseau, R., Georgia Institute of Technology

Kocevska, S., Georgia Institute of Technology

In this work, Raman and ATR-FTIR spectroscopies are combined for more accurate measurements of nuclear waste simulants. This combination of information is referred to as data fusion. In addition to the combination of these spectroscopies, feature selection is applied to improve monitoring accuracy and resilience to “non-target” species that are not included in model training. Combining feature selection and data fusion allowed for improvement of prediction error by a factor of 2.8 over the use of a single sensor alone. These results show how process monitoring can be improved by case-specific data processing strategies. The application in this work is the complex waste streams present in nuclear waste remediation, particularly at the Hanford Site in Washington State.

The effective utilization of data has become more important as processes provide more information through Industry 4.0 initiatives [1]. One such application is real-time monitoring of a vitrification process plant at the Hanford Site in Washington State. This site removes water and radionuclides from radioactive waste before vitrifying it for long-term storage. The process will have inconsistent feed streams, which creates the need for real-time process monitoring. However, a single in-line instrument has limited ability to measure the 25 chemical constituents and 46 radionuclides expected in the process [2]. A preprocessing strategy is needed that can combine information from multiple sensors and distinguish components in the complex spectra.

Two on-line sensors are investigated in this study: Raman and Attenuated Total Reflectance–Fourier Transform Infrared (ATR-FTIR) spectroscopy. These molecular measurement techniques provide a high-dimensional space of 1472 wavenumbers to monitor the process for decision-making. In addition to the high-dimensional space, there are spectral components that may appear under actual process conditions that were not included in model training. The high dimensionality and possible process noise motivate the use of feature selection to reduce dimensionality prior to model input.

The feature selection method used is a general forward selection wrapper method [3]. A wrapper method uses the quantification model, in this case Partial Least Squares Regression (PLSR), to determine the most important subset of features. The general forward selection method used in this study has two distinct steps, which are necessitated by the high dimensionality of the problem. Testing every possible subset of features is an NP-hard problem with 2¹⁴⁷² possible combinations [3]. Therefore, a feature ranking is established based on heuristics, and then the optimum number of these ordered features is determined. This reduces the number of candidate feature subsets from 2¹⁴⁷² to 1472.
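To give a sense of scale for the two search spaces mentioned above, the short calculation below compares them directly. It is only an arithmetic illustration; the variable names are made up for this sketch.

```python
n_features = 1472  # number of wavenumbers reported in the text

# Exhaustive wrapper search: every subset of the features is a candidate.
n_exhaustive = 2 ** n_features

# Ranked search: once the features are ordered, only the "top k" prefixes
# (k = 1, ..., n_features) are evaluated.
n_ranked = n_features

print(f"exhaustive search: a {len(str(n_exhaustive))}-digit number of subsets")
print(f"ranked search: {n_ranked} subsets")
```

The exhaustive count is a 444-digit number, which is why ranking first and then tuning only the cutoff is the practical route.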

The first step is ranking the features (wavenumbers) based on spectral intensity in the training set. This follows the intuition that (after baseline correction) features with high intensities correspond to useful spectral information, based on signal-to-noise ratio arguments. The second step is determining, via cross-validation, how many of these ordered features give optimum performance on test spectra. Root Mean Squared Error (RMSE) was used as the primary error metric for evaluating performance. In summary, the features are ranked on the training data, and then the optimum number of features is chosen on test spectra containing species not included in model training.

A similar established wrapper method, the Successive Projections Algorithm (SPA), was compared for performance and was found to underperform the more general forward selection method. SPA's weaker performance is attributed to how it handles the highly collinear information present in the spectra. As part of the SPA algorithm, mutual information from the already chosen features is subtracted. Since wavenumbers in spectral peaks share much mutual information with adjacent wavenumbers, this subtraction quickly leads the algorithm to select wavenumbers based on noise rather than information (noise is the primary information left after several iterations of the SPA algorithm).
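The collinearity effect described above can be seen in a small sketch. It assumes the usual description of SPA, in which the "subtraction" step is an orthogonal projection of the remaining columns away from the most recently selected one. The function `spa_select` and the toy matrix are hypothetical and are not taken from the study; mean-centering and the choice of starting column are left to the caller.

```python
import numpy as np


def spa_select(X, n_select, start=0):
    """Pick n_select column indices of X by successive orthogonal projections."""
    Xp = np.array(X, dtype=float)  # working copy that gets deflated
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Remove from every column its component along the last selected column.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -np.inf  # never pick the same column twice
        selected.append(int(np.argmax(norms)))
    return selected


# Toy matrix (hypothetical): column 1 is an exact multiple of column 0, as two
# neighboring wavenumbers on the same peak would nearly be; column 2 is not.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 2.0, 0.5]])
print(spa_select(X, n_select=2, start=0))  # prints [0, 2]
```

After column 0 is chosen, the collinear column 1 is projected to zero and cannot be selected next. In a real spectrum the neighbors of a chosen peak are left with little more than noise in the same way.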

Data fusion is implemented to combine the information from the Raman and ATR-FTIR sensors after feature selection is applied. There are multiple levels of data fusion, as shown by Borras et al. [4]. In this work, data-level fusion is performed through concatenation. Concatenation is used because it retains all information from both instruments (after feature selection is applied). Standard scaling, in addition to dimensional reduction in the model (PLSR), removes the physical differences between the spectra. This allows the information to be input simultaneously into a single model.

To conduct the study, simulants of nuclear waste mixtures are used. These simulants consist of water as a solvent with seven dissolved sodium salts: sodium nitrate, nitrite, sulfate, carbonate, oxalate, phosphate, and acetate. Of the seven salts, four (nitrate, nitrite, sulfate, and carbonate) are considered “target species” and are included in the training dataset. The “non-target” species are the remaining three salts (oxalate, phosphate, and acetate). The non-target salts are used to simulate unanticipated feed conditions expected at the Hanford vitrification process. The quantification model used is Partial Least Squares Regression. This model is chosen because of its documented success in quantifying spectra [5].

Our results show that feature selection combined with data fusion of Raman and ATR-FTIR provides more accurate analysis of nuclear waste mixtures than either instrument alone. We are able to reduce mean percent errors from 43.2% (Raman) and 15.8% (ATR-FTIR) to 5.6% for a method utilizing forward selection and data fusion. This improvement is due to the processing strategies applied to the data, since the same measurements were used.
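As a quick check on the reported numbers, the ratios of the single-sensor errors to the fused error can be computed directly. The factor of 2.8 quoted in the summary appears to correspond to the comparison against the better single sensor (ATR-FTIR); that reading is an inference from the figures above, not something the text states.

```python
single_sensor_error = {"Raman": 43.2, "ATR-FTIR": 15.8}  # mean percent error
fused_error = 5.6  # forward selection + data fusion

# Improvement factor of the fused method over each single sensor.
factors = {name: err / fused_error for name, err in single_sensor_error.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x lower error with fusion")
```

This prints a factor of about 7.7 for Raman and about 2.8 for ATR-FTIR.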

This work has the potential to improve real-time monitoring at the Hanford Site in Washington State. Better data processing strategies can improve process monitoring and decrease process downtime. The approach in this work can be applied to other processes outside the domain of nuclear waste treatment. Application-specific data processing strategies are often necessitated by the challenges faced in modern processing plants.

References:

[1] I. A. Udugama et al., “The Role of Big Data in Industrial (Bio)chemical Process Operations,” Ind. Eng. Chem. Res., vol. 59, no. 34, pp. 15283–15297, 2020, doi: 10.1021/acs.iecr.0c01872.

[2] S. Kocevska, G. M. Maggioni, R. W. Rousseau, and M. A. Grover, “Spectroscopic Quantification of Target Species in a Complex Mixture Using Blind Source Separation and Partial Least-Squares Regression: A Case Study on Hanford Waste,” Ind. Eng. Chem. Res., vol. 60, no. 27, pp. 9885–9896, 2021, doi: 10.1021/acs.iecr.1c01387.

[3] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Comput. Electr. Eng., vol. 40, no. 1, pp. 16–28, 2014, doi: 10.1016/j.compeleceng.2013.11.024.

[4] E. Borras, J. Ferre, R. Boque, M. Mestres, L. Acena, and O. Busto, “Data fusion methodologies for food and beverage authentication and quality assessment - A review,” Anal. Chim. Acta, vol. 891, 2015, doi: 10.1016/j.aca.2015.04.042.

[5] P. Tse, J. Shafer, S. A. Bryan, and A. M. Lines, “Quantification of Raman-Interfering Polyoxoanions for Process Analysis: Comparison of Different Chemometric Models and a Demonstration on Real Hanford Waste,” Environ. Sci. Technol., 2021, doi: 10.1021/acs.est.1c02512.

Topics

Sensors

Process Automation & Control

Nuclear
