Interpretable Modeling of Genotype-Phenotype Landscapes without Sacrificing Predictive Power
2021 Synthetic Biology: Engineering, Evolution & Design (SEED)
Poster Session
Black-box neural networks are currently the dominant choice for modeling genotype-phenotype landscapes (GPLs) due to their unsurpassed ability to generate accurate out-of-sample predictions. These models suffer, however, from their inherent inability to explain their predictions. While methods for post-hoc explanation of neural network predictions exist, they can only approximate the trained model, and the accuracy of those approximations is often difficult to assess. Despite these issues, neural networks remain popular because of the assumption that there is a necessary trade-off between a model's predictive accuracy and its interpretability.
As an alternative, we developed LANTERN, an inherently interpretable approach to modeling GPLs. LANTERN learns a low-dimensional latent phenotype space in which mutational effects combine additively; a smooth, nonlinear function then transforms the latent phenotype into the observed phenotype measurements. This structure ensures that predictions can be decomposed into interpretable components. Despite this simplicity, LANTERN matches or exceeds the predictive accuracy of neural network methods on GPL data.
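To make this two-step decomposition concrete, the following is a minimal sketch of the idea, assuming a binary genotype encoding. The names (W, latent_phenotype, surface) and the radial-basis stand-in for the learned smooth surface are illustrative assumptions, not LANTERN's actual implementation or API.

```python
# Minimal sketch of the LANTERN modeling idea (illustrative, not the released code).
# Step 1: mutations combine additively in a K-dimensional latent phenotype space.
# Step 2: a smooth nonlinear surface maps latent phenotype to observed phenotype
#         (here a fixed RBF expansion stands in for the surface learned from data).
import numpy as np

rng = np.random.default_rng(0)

n_mutations, K = 50, 2                   # hypothetical: 50 mutations, 2 latent dims
W = rng.normal(size=(n_mutations, K))    # per-mutation latent effects (learned in practice)

def latent_phenotype(x, W):
    """Additive step: x is a 0/1 vector marking which mutations a variant carries."""
    return x @ W                         # z = sum of latent effects of present mutations

def surface(z, centers, weights, scale=1.0):
    """Smooth nonlinear step: map latent z to the observed phenotype."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale**2)) @ weights

# Hypothetical usage: score a batch of variants.
centers = rng.normal(size=(10, K))
weights = rng.normal(size=10)
X = rng.integers(0, 2, size=(5, n_mutations))    # 5 variants, random genotypes
z = latent_phenotype(X, W)
y_hat = surface(z, centers, weights)
```

The interpretability follows from this division of labor: each mutation contributes a single fixed vector in latent space, and all nonlinearity is confined to the surface, so every prediction decomposes into those two readable parts.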
LANTERN also provides novel metrics of GPL structure, including empirical dimensionality, local additivity, and phenotypic robustness. Notably, these are learned de novo from data, without additional domain-specific knowledge. Overall, LANTERN demonstrates that there is no necessary trade-off between predictive accuracy and interpretability for GPL data.
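The sketch below illustrates how metrics of this kind can fall out of the latent structure. The specific proxies used here (per-dimension latent variance for dimensionality, surface gradient magnitude for robustness, surface curvature for local additivity) are one plausible reading of the abstract, not the paper's exact definitions.

```python
# Assumed proxies for the GPL metrics named above (my reading, not the paper's).
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=(1000, 2))           # latent phenotypes of sampled variants

def surface(z):
    # Placeholder smooth surface; LANTERN learns this from data.
    return np.tanh(z[:, 0]) + 0.1 * z[:, 1] ** 2

# Empirical dimensionality: fraction of latent variance each dimension carries;
# dimensions with negligible weight are effectively unused.
var = z.var(axis=0)
dim_weight = var / var.sum()

eps = 1e-3

# Phenotypic robustness (assumed proxy): a small surface gradient means nearby
# genotypes map to similar phenotypes.
grad = np.stack([(surface(z + eps * e) - surface(z - eps * e)) / (2 * eps)
                 for e in np.eye(2)], axis=1)
robustness = 1.0 / (1.0 + np.linalg.norm(grad, axis=1))

# Local additivity (assumed proxy): near-zero curvature means mutational
# effects combine approximately additively in the observed phenotype.
curv = np.stack([(surface(z + eps * e) - 2 * surface(z) + surface(z - eps * e)) / eps**2
                 for e in np.eye(2)], axis=1)
local_additivity = 1.0 / (1.0 + np.abs(curv).sum(axis=1))
```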