A Bacterial Sensor Taxonomy across Earth Ecosystems for Machine Learning Applications | AIChE

Authors 

Dehal, P., Lawrence Berkeley National Laboratory
Arkin, A. P., University of California, Berkeley
Microbial communities have evolved to colonize every ecosystem on the planet, from the deep sea to the human gut. Microbes survive by sensing, responding to, and adapting to immediate environmental cues. This process is driven by signal transduction proteins such as histidine kinases, which use their sensing domains to bind or otherwise detect environmental cues and ‘transduce’ signals to adjust internal processes. We hypothesized that an ecosystem’s unique stimuli leave a sensor ‘fingerprint’ that can identify the ecosystem and offer insight into its conditions. To test this, we collected 20,712 publicly available metagenomes from Host-associated, Environmental, and Engineered ecosystems across the globe. We extracted the collection’s nearly 18M unique sensory domains and clustered them into 113,712 groups of similar domains with MMseqs2. We built gradient-boosted decision tree machine learning models and found that, using sensor cluster abundance as features, we can classify the ecosystem type (accuracy: 87%) and predict the levels of different physical parameters (R² score: 83%). Feature importance identifies the sensors that are most predictive for differentiating between ecosystems, which can lead to mechanistic interpretations. To demonstrate this, we trained a machine learning model to predict patient disease state and used it to identify domains related to oxygen sensing that are present in a healthy gut but missing in patients with abnormal conditions. Moreover, since 98.7% of identified sensor domains are uncharacterized, importance ranking can be used to prioritize sensors for characterization and to determine which ecosystem function they may sense. Further, these newly identified predictive sensors can serve as targets for novel sensor engineering with applications in biotechnology, ecosystem maintenance, and medicine.
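As a rough illustration of what "sensor cluster abundance as features" could look like in practice, the sketch below turns per-metagenome lists of cluster assignments into a samples-by-clusters table. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `abundance_matrix`, the sample and cluster IDs, and the choice to normalize counts to relative abundance are all hypothetical, since the abstract does not say how abundance was computed.

```python
from collections import Counter


def abundance_matrix(sample_hits, normalize=True):
    """Build a samples x clusters abundance table.

    sample_hits maps a sample ID to the list of sensor-cluster IDs
    assigned to the sensory domains found in that metagenome.
    (Hypothetical input layout, chosen for illustration.)
    """
    # Use one sorted column order so every sample shares the same feature layout.
    clusters = sorted({c for hits in sample_hits.values() for c in hits})
    rows = {}
    for sample, hits in sample_hits.items():
        counts = Counter(hits)  # missing clusters count as 0
        total = sum(counts.values())
        if normalize and total > 0:
            # Assumption: relative abundance, so samples of different
            # sequencing depth are comparable.
            rows[sample] = [counts[c] / total for c in clusters]
        else:
            rows[sample] = [float(counts[c]) for c in clusters]
    return clusters, rows


# Toy data with made-up sample and cluster IDs.
sample_hits = {
    "gut_01": ["c1", "c1", "c2", "c3"],
    "soil_01": ["c2", "c4", "c4", "c4"],
    "marine_01": ["c3", "c3", "c4"],
}
clusters, features = abundance_matrix(sample_hits)
```

Each row of `features` could then be paired with an ecosystem label or a measured physical parameter and passed to a gradient-boosted tree library, whose per-feature importance scores would map back to the cluster IDs in `clusters`.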