(314d) Antibody Affinity and Specificity Co-Optimization Via Machine Learning
AIChE Annual Meeting
2022 Annual Meeting
Food, Pharmaceutical & Bioengineering Division
Computational, Structure, Biophysical Protein Engineering
Tuesday, November 15, 2022 - 1:24pm to 1:42pm
To better understand antibody sequence-property relationships, high-throughput screening of antibody libraries is commonly combined with next-generation sequencing (NGS). In this approach, large repertoires of antibody mutants are screened for their ability to bind a specific reagent, and each mutant is sorted into a positive or negative bin. Each bin is then sequenced, providing the amino acid sequence of every antibody in it. Though these data represent a major step towards revealing sequence-property relationships, their use for rational therapeutic antibody design remains substantially limited. This is because binary deep sequencing labels obscure intra-class variation and provide no information about the continuous nature of binding. Continuous measurements are necessary to tune properties that exhibit strict tradeoffs, such as affinity and specificity.
In this work, we introduce a method that enables the co-optimization of antibody affinity and specificity by applying protein sequence feature-extraction methods and supervised dimensionality reduction methods to obtain continuous predictions of binding from binary deep sequencing data. We focused on the optimization of emibetuzumab, a Phase 2 clinical-stage anti-cancer antibody with high on-target binding (affinity) but also high off-target binding (low specificity). We combined NGS and high-throughput screens to generate datasets consisting of thousands of antibody sequences with CDR mutations, each labeled for both affinity and specificity. We next applied a variety of machine learning methods to extract features from each labeled sequence. Using these feature sets, we trained linear discriminant analysis (LDA) models or more complex neural networks (NNs) to classify the data. We found that both LDA and NN classifiers yielded continuous projections of our antibody features. These projections correlated strongly with low-throughput, continuous measurements of selected clones from our library. Thus, by applying the learned projections to every sequence in our library, we were able to estimate the continuous affinity and specificity of each clone and characterize the tradeoff between these two properties. Our analysis also facilitates multi-objective (Pareto) optimization of affinity and specificity.
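As a rough illustration of how a classifier trained on binary labels can yield a continuous score, the sketch below fits a two-class Fisher LDA direction on one-hot encoded sequences and projects every sequence onto it. It is not the pipeline used in this work: the one-hot featurization, the `ridge` regularizer, the helper names `one_hot` and `fit_lda_direction`, and the toy three-residue sequences and labels are all assumptions made for the example.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a sequence as a flat one-hot vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()

def fit_lda_direction(X, y, ridge=1e-3):
    """Two-class Fisher LDA: w is proportional to Sw^-1 (mu_pos - mu_neg).

    `ridge` (an assumption of this sketch) keeps the within-class scatter
    matrix invertible, since one-hot features are linearly dependent.
    """
    X_pos, X_neg = X[y == 1], X[y == 0]
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X_pos - mu_pos).T @ (X_pos - mu_pos) + (X_neg - mu_neg).T @ (X_neg - mu_neg)
    Sw += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu_pos - mu_neg)
    return w / np.linalg.norm(w)

# Hypothetical mutated positions and binary sort labels (1 = binder bin).
seqs = ["AYW", "AYF", "SYW", "AHW", "SHF", "SHL", "AHL", "SYL"]
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])

X = np.stack([one_hot(s) for s in seqs])
w = fit_lda_direction(X, labels)

# Projecting onto w gives each sequence a continuous score rather than a
# hard class label; higher means "more like the positive bin".
scores = X @ w
for s, score in zip(seqs, scores):
    print(s, round(float(score), 3))
```

In practice, scores like these would only be trusted after checking that they correlate with continuous measurements on a held-out set of clones, as described above.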
We next identified maximally optimized sequences along the Pareto frontier, which comprises the set of antibodies for which neither specificity nor affinity can be improved without sacrificing the other property. We then predicted new clones, containing both in-library and out-of-library mutations, that lie at or beyond the Pareto frontier and therefore improve affinity and specificity simultaneously. Over half of the antibodies we generated using this method exhibited improved affinity and specificity compared to wild-type emibetuzumab, and some of them lay beyond the Pareto frontier of the original library.
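The Pareto frontier described above can be made concrete with a small sketch. Given a predicted (affinity, specificity) pair for each clone, a clone is on the frontier if no other clone is at least as good on both properties and strictly better on one. The clone names, the score values, and the `pareto_front` helper below are hypothetical and serve only to illustrate the definition.

```python
def pareto_front(points):
    """Return indices of points not dominated by any other point.

    Each point is an (affinity, specificity) pair; both are maximized.
    """
    front = []
    for i, (a_i, s_i) in enumerate(points):
        dominated = any(
            a_j >= a_i and s_j >= s_i and (a_j > a_i or s_j > s_i)
            for j, (a_j, s_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Hypothetical projected scores, expressed relative to wild type at (0, 0).
clones = {
    "wt": (0.0, 0.0),
    "A": (1.0, -0.5),   # higher affinity, lower specificity
    "B": (0.5, 0.5),    # better on both
    "C": (-0.2, 1.0),   # higher specificity, lower affinity
    "D": (0.4, 0.2),    # better than wt, but dominated by B
}

names = list(clones)
points = [clones[n] for n in names]

frontier = [names[i] for i in pareto_front(points)]

# Clones that improve on wild type in both properties at once.
wt_aff, wt_spec = clones["wt"]
improved = [n for n in names if clones[n][0] > wt_aff and clones[n][1] > wt_spec]

print("Pareto frontier:", frontier)
print("Improved over wild type on both axes:", improved)
```

This pairwise check is quadratic in the number of clones, which is fine for a sketch; a library of many thousands of sequences would usually call for a sort-based sweep instead.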
Our combination of feature extraction and LDA- or NN-based dimensionality reduction provides two key advantages in antibody design. First, by characterizing tradeoffs between biophysical properties at scale, we can rationally select the combination of properties most likely to yield clinical efficacy, as opposed to optimizing one property at the expense of others. Second, some of our feature sets enable extrapolation to novel mutational space. We expect this methodology to facilitate optimization of therapeutic antibody biophysical properties and to accelerate rational therapeutic antibody development.