(371a) Quantitative Structure Biodegradation Relationships for Organic Pollutants in Water and Soil | AIChE

Authors 

Espinosa, G. - Presenter, Universitat Rovira i Virgili
Vogel, T. M. - Presenter, Université Claude Bernard Lyon 1
Giralt, F. - Presenter, Universitat Rovira i Virgili


Quantitative Structure Biodegradation Relationships (QSBRs) for chemicals of environmental concern are relatively scarce and of limited applicability within the chemical domain of interest. Given the large number of such chemicals, models are needed to classify chemicals according to their relative biodegradability and to estimate biodegradation rate parameters for new chemicals. In the present study, QSBR models were developed for the degradation rate constants of chemicals in water and soil using neural networks, classifiers and other machine learning algorithms. The MITI-1 data for the biodegradation of 672 chemicals in water, and soil biodegradation data for 146 chemicals, were assessed according to the reported experimental conditions. The structure of the chemicals in the data sets was represented by molecular descriptors (structural, topological, geometrical and quantum chemical). Soil matrix information was also considered in the QSBRs for soil. All relevant variables for the prediction of biodegradation rate parameters in water and soil were first screened with feature selection methods to identify the most suitable information. The variable screening techniques ranged from machine learning methods to artificial neural networks, such as neural classifiers and self-organizing maps (SOM). QSBR models were then built with neural networks and classifiers from the feature subsets selected by filter and wrapper approaches. For each model, the training and validation data sets were obtained with the SOM algorithm, such that the most representative chemicals from all SOM classes were selected for training. The best degradation rate models for water were obtained with both the backpropagation and fuzzy ARTMAP algorithms when the data were separated into two overlapping classes of low [1-60%] and high [40-100%] BOD values.
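The separation into two overlapping BOD classes can be illustrated with a minimal sketch. The range limits are taken from the abstract; treating the limits as inclusive, the function names and the chemical labels are assumptions made only for illustration and are not the authors' implementation.

```python
from typing import Dict, List

# Class limits in percent BOD, as stated in the abstract.
LOW_RANGE = (1.0, 60.0)
HIGH_RANGE = (40.0, 100.0)


def assign_bod_classes(bod_percent: float) -> List[str]:
    """Return every class whose range contains the BOD value.

    Limits are assumed to be inclusive. Values in the 40-60% overlap
    belong to both classes; values outside both ranges get no class.
    """
    classes = []
    if LOW_RANGE[0] <= bod_percent <= LOW_RANGE[1]:
        classes.append("low")
    if HIGH_RANGE[0] <= bod_percent <= HIGH_RANGE[1]:
        classes.append("high")
    return classes


def split_dataset(bod_by_chemical: Dict[str, float]) -> Dict[str, List[str]]:
    """Group chemical names (hypothetical labels) into the two overlapping subsets."""
    subsets: Dict[str, List[str]] = {"low": [], "high": []}
    for name, bod in bod_by_chemical.items():
        for label in assign_bod_classes(bod):
            subsets[label].append(name)
    return subsets
```

With this scheme a chemical measured at 50% BOD contributes to both subsets, so each range model sees the boundary region during training.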
At least 80 chemicals were used for testing in each range. The QSBRs with the lowest validation errors, 15-17% for both BOD ranges, were obtained with the above algorithms and the seven molecular descriptors selected by the Correlation-based Feature Selection (CFS) algorithm. The corresponding leave-one-out cross-validation yielded comparable errors of 16-23%. In terms of internal coherence, the most robust QSBR model was the one built on the subset of seven descriptors selected by the Artificial Neural Net Input Gain Measurement Approximation (ANNIGMA), a heuristic algorithm based on weight analysis.

Qualitative structure biodegradation relationship (SBR) models for soil biodegradation were also developed. In these models, chemicals were classified as degradable or persistent according to a cut-off value of 28 days. Four SBR models were obtained, two with machine learning methods (M5 regression tree and IBk instance-based learning) and two with neural-based algorithms (fuzzy ARTMAP and self-organizing maps), using the best set of descriptors selected by the CFS filter. The test sets used to validate the models for the degradable and persistent classes comprised at least 35% of the chemicals involved. The best models for the classification of biodegradation in soil were obtained with the SOM and fuzzy ARTMAP classifiers, with misclassification errors of 16 and 18.5%, respectively. Several QSBRs were also developed for the soil compartment. For biodegradation half-lives longer than 140 days, i.e., for slowly biodegrading chemicals, molecular information alone was sufficient, whereas for shorter half-lives soil-matrix information was also needed. Results for these QSBR models will also be presented and discussed.
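The two-class labelling for soil and the misclassification error reported for the classifiers can be sketched as follows. Only the 28-day cut-off comes from the abstract; the quantity it is applied to (taken here as a degradation time in days), the choice that a value equal to the cut-off counts as degradable, and all function names are assumptions for illustration.

```python
from typing import Sequence

CUTOFF_DAYS = 28.0  # cut-off stated in the abstract


def classify(time_days: float, cutoff: float = CUTOFF_DAYS) -> str:
    """Label a chemical from its degradation time in days.

    Assumption: values at or below the cut-off are 'degradable',
    values above it are 'persistent'.
    """
    return "degradable" if time_days <= cutoff else "persistent"


def misclassification_error(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of test chemicals whose predicted class differs from the observed one."""
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    wrong = sum(o != p for o, p in zip(observed, predicted))
    return 100.0 * wrong / len(observed)
```

Under this definition, a test set of four chemicals with one wrong label gives an error of 25%.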