Automated Bioinformatic Pipeline to Identify CRISPR-Cas Systems | AIChE

Authors 

Barrangou, R., North Carolina State University
While CRISPR-Cas systems have been catapulted into the scientific spotlight by their ability to be repurposed as genome editing tools, these systems natively act as adaptive immune systems in bacteria and archaea. Growth in the field has led CRISPR knowledge, publications, and nomenclature to progress at an overwhelming rate. We have developed an automated pipeline that detects CRISPR repeats and cas genes and uses these features to determine CRISPR-Cas Type and Subtype, helping both CRISPR novices and experts characterize these systems more quickly. The pipeline accurately detects CRISPR loci and determines the completeness of the systems present in bacterial and archaeal genomes. It uses the presence and absence of cas genes, together with CRISPR repeat information, to assign Types and Subtypes to candidate CRISPR-Cas systems. All six major types and 23 subtypes are detectable by this pipeline; in addition, the proteins in the putative V-U type are also detected. The pipeline can also be used to assess the novelty of identified proteins through homology searches against known databases, and users can search for protein domains within the sequences it identifies. This tool will make reproducible, standardized, accessible, transparent, and high-throughput analysis methods available to all researchers in and beyond the CRISPR-Cas research community. It is valuable because it enables classification within a complex nomenclature and provides analytical methods in a field that is evolving rapidly.
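The abstract does not describe the pipeline's actual classification rules, so the following is only a minimal sketch of how type assignment from cas gene presence and absence could work. It assumes the commonly cited signature genes for the six types (cas3, cas9, cas10, csf1, cas12, cas13) and treats cas1 plus cas2 as a stand-in for an adaptation module. The function name, the table names, and the output fields are hypothetical and are not taken from the pipeline itself; subtype assignment is not attempted.

```python
# Illustrative sketch only: NOT the authors' implementation.
# Assumption: one widely cited signature gene per CRISPR-Cas type.
SIGNATURE_GENES = {
    "cas3": "I",
    "cas9": "II",
    "cas10": "III",
    "csf1": "IV",
    "cas12": "V",
    "cas13": "VI",
}

# Assumption: cas1 and cas2 together indicate an adaptation module.
ADAPTATION_GENES = {"cas1", "cas2"}


def classify_locus(cas_genes, has_repeat_array):
    """Assign candidate type(s) to a locus from its detected cas genes.

    cas_genes: iterable of gene names found near the locus.
    has_repeat_array: whether a CRISPR repeat array was detected.
    """
    genes = {g.lower() for g in cas_genes}
    # Collect every type whose signature gene is present.
    types = sorted({SIGNATURE_GENES[g] for g in genes if g in SIGNATURE_GENES})
    return {
        "types": types,
        "has_adaptation_module": ADAPTATION_GENES <= genes,
        "has_repeat_array": has_repeat_array,
        # More than one signature gene suggests adjacent or mixed systems.
        "ambiguous": len(types) > 1,
    }


if __name__ == "__main__":
    print(classify_locus(["cas1", "cas2", "cas9", "csn2"], has_repeat_array=True))
```

A real classifier would need subtype-specific gene sets, repeat sequence features, and handling of degenerate or partial loci, none of which this sketch covers.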