2015 Synthetic Biology: Engineering, Evolution & Design (SEED)

Introducing SynBIS – the Synthetic Biology Information System

Authors

Matthieu.A Bultelle - Presenter, Imperial College

Inaki Sainz de Murieta, Imperial College

1. Introduction

A growing concern of synthetic biology is the rational design of pathways. The goal is to build a new pathway or modify an existing one so that it exhibits given dynamics and operates under a given set of constraints. Typically, the rational design cycle involves an in-silico design and modelling phase using a series of CAD tools (at part level and/or circuit level [1,2]), followed by construction of the circuit and a testing phase. Depending on the results, the process may be repeated several times until the goal is met.
Such an iterative approach has proven its worth and become mainstream in several fields of engineering. In the case of synthetic biology, rational design requires a large catalogue of well-characterised chassis, plasmids and bioparts (fundamental parts such as promoters or RBSs, but also common devices such as logic gates, oscillators, pulse generators, ...). Furthermore, such a catalogue must not only be available online; it must also be presented in a user-friendly manner and support widely used data standards so that it integrates with existing and future CAD tools [3]. Although several well-known repositories exist (the iGEM parts registry [4], JBEI [5] or the virtual parts repository [6], to name a few), we are not aware of any existing repository that meets all those requirements.
Acknowledging this crucial need, the Centre for Synthetic Biology and Innovation at Imperial College (CSynBI) has launched a multi-faceted effort to support the automated characterisation of biological parts, focusing in particular on:
1. The support for existing data standards and development of a complementary, robust data standard for the acquisition of experimental data.
2. The construction of a common IT-spine to track and store all data as they are processed and curated.
3. A robust dissemination strategy, enabling public access to high quality biopart information.
In this paper we present the work conducted on points 2 and 3, and introduce SynBIS (the Synthetic Biology Information System).

2. Biopart Characterisation and SynBIS

SynBIS is the IT spine designed at CSynBI to enable biopart characterisation on a scale that is difficult or impossible for human experimentalists to achieve. It was first developed with constitutive promoters, and now supports the characterisation of other fundamental bioparts such as inducible promoters or RBSs.

In practice, SynBIS treats characterisation as a pipeline: characterisation is considered "a sequence of reliable, validated processes aimed at generating high quality experimental data, extracting from them their most important features, and disseminating the results". Human intervention is kept to a minimum (mainly curation) and automation is used where possible (especially during data acquisition and data analysis).
Currently, CSynBI pipelines are made of three steps.
1. Data acquisition - Acquisition has been automated as per [7], using a set of validated protocols whose purpose is to improve reproducibility and increase throughput. Plate reader and flow cytometry data are typically acquired.
2. Data processing - Experimental data are processed to estimate the relationship between the input and output of the part. Although the information depends on the type of part, processing is modularised by using libraries of models.
3. Dissemination - To make it more usable, all the information on the biopart is compiled into a datasheet, which is then uploaded to a dedicated website (with its own APIs).
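
The modular processing in step 2 can be pictured as a registry that maps a part type to a model. The Python sketch below is illustrative only: the registry, the function names, and the choice of a constant model and a Hill function are assumptions made for the example, not the models actually used by SynBIS.

```python
from typing import Callable, Dict

# Hypothetical model library: part type -> function(inducer, **params) -> output.
MODEL_LIBRARY: Dict[str, Callable[..., float]] = {}


def register_model(part_type: str):
    """Decorator that files a model function under a part type."""
    def decorator(func):
        MODEL_LIBRARY[part_type] = func
        return func
    return decorator


@register_model("constitutive_promoter")
def constant_output(inducer: float, *, rate: float) -> float:
    # Assumed model: a constitutive promoter has no input,
    # so the output does not depend on the inducer level.
    return rate


@register_model("inducible_promoter")
def hill_activation(inducer: float, *, basal: float, vmax: float,
                    k: float, n: float) -> float:
    # Assumed model: Hill function, a common choice for dose-response curves.
    return basal + vmax * inducer**n / (k**n + inducer**n)


def predict(part_type: str, inducer: float, **params: float) -> float:
    """Look up the model registered for a part type and evaluate it."""
    return MODEL_LIBRARY[part_type](inducer, **params)
```

With this layout, supporting a new type of part means registering one more function; the code that calls `predict` does not change.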
Practically, SynBIS logs three types of information as all datasets involved (controls included) proceed down the pipeline:

1. Custodial information

a. Who? Staff / algorithm version in charge

b. When? Time stamps

c. Where? Institutions involved

2. Analysis information

a. Input/output of each analysis step

b. Corresponding metadata

3. Human curation information

a. Curator decision

b. Curator reason(s)
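
As an illustration, the three types of logged information could be held in a record like the one sketched below in Python. All class and field names are hypothetical; the sketch mirrors the list above rather than the actual SynBIS database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class CustodyEntry:
    agent: str                 # who: staff member or algorithm version in charge
    institution: str           # where: institution involved
    timestamp: datetime = field(default_factory=_utc_now)  # when


@dataclass
class AnalysisEntry:
    step: str                  # name of the analysis step
    inputs: List[str]          # identifiers of the step's input data
    outputs: List[str]         # identifiers of the step's output data
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class CurationEntry:
    decision: str              # curator decision
    reasons: List[str]         # curator reason(s)


@dataclass
class DatasetLog:
    """Everything logged for one dataset as it moves down the pipeline."""
    dataset_id: str
    custody: List[CustodyEntry] = field(default_factory=list)
    analysis: List[AnalysisEntry] = field(default_factory=list)
    curation: List[CurationEntry] = field(default_factory=list)
```

Each pipeline step would append its own entries, so the full history of a dataset (controls included) can be read back from a single `DatasetLog`.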

The SynBIS data model is organised around the end product of the pipeline: the datasheet of the biopart/device. Datasheets are mainstays of engineering and not new to synthetic biology (the datasheet for F2620 [8] is a classic example), and there have been proposals to establish a minimal template [9]. Our work builds upon these earlier efforts by systematically describing bioparts with a black-box approach that includes:

• Identity (name, sequence, URI)

• Input and output (including crosstalk)

• Qualitative relations between outputs and inputs

• Quantitative relations between outputs and inputs (raw data and models)

SynBIS places strong emphasis on the experimental context (including the chassis, medium, plasmid, reporter, assay protocol and experimental settings). Since the influence of the context on part behaviour is poorly understood, SynBIS will host several datasheets (one per context) for the same part.
The SynBIS template has been designed so that datasheets are human-readable as well as machine-readable. The human-readable part of the datasheet displays a small subset of all the data collected and generated during the characterisation process. The machine-readable part makes all these data available to the user.
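
The black-box description and the one-datasheet-per-context rule can be sketched in Python as follows. The class and field names are assumptions chosen to mirror the text; they are not the SynBIS schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Context:
    """Experimental context; frozen so it can serve as a dictionary key."""
    chassis: str
    medium: str
    plasmid: str
    reporter: str
    assay_protocol: str


@dataclass
class Datasheet:
    # Identity
    name: str
    sequence: str
    uri: str
    # Context in which the part was characterised
    context: Context
    # Black-box description
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    qualitative_relations: List[str] = field(default_factory=list)
    raw_data_uris: List[str] = field(default_factory=list)  # quantitative: raw data
    model_uris: List[str] = field(default_factory=list)     # quantitative: models


class DatasheetStore:
    """Holds at most one datasheet per (part name, context) pair."""

    def __init__(self) -> None:
        self._sheets: Dict[Tuple[str, Context], Datasheet] = {}

    def add(self, sheet: Datasheet) -> None:
        # Adding a sheet for an existing (part, context) pair replaces it.
        self._sheets[(sheet.name, sheet.context)] = sheet

    def for_part(self, name: str) -> List[Datasheet]:
        """Return every datasheet held for a part, one per context."""
        return [s for (part, _), s in self._sheets.items() if part == name]
```

Keying the store on the pair (part, context) means the same part characterised in two chassis yields two independent datasheets, which matches the design choice described above.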

3. Data Dissemination

SynBIS currently hosts characterisation information on constitutive and inducible promoters:

• 20 promoters from the Anderson Collection

• 2 variants of promoters from the Anderson Collection

• 6 wild-type E. coli promoters

• A handful of inducible promoters (e.g. the XylF/XylR system)

More promoters are being constructed and characterised. The SynBIS team plans to have several dozen more promoters available by June 2015.
All the data hosted on SynBIS are available for download and use under a Creative Commons CC-BY licence and can be accessed in two ways.
A website has been trialled with selected partners since June 2014 (http://synbis.bg.ic.ac.uk) and is now publicly accessible. It can be searched by name, by sequence and by the characteristic function of the biopart (typical requests when it comes to circuit design).
SynBIS also implements an API to enable programmatic access from external applications. Power users can thus not only access individual datasheets (and related data) but also run bulk queries. The API comprises two types of RESTful web services:

• Inbound: input of new curated datasheet information into SynBIS

• Outbound: retrieval of datasheet information.

  ◦ XML interface: provides the complete description of a datasheet formatted following the SynBIS database structure.

  ◦ SBOL interface: provides the basic information which can be encoded with SBOL-Core (the current stable release).
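
To give a feel for how an external application might use the outbound services, here is a minimal Python client sketch. The route layout (`api/datasheet/<id>/<format>`) and all function names are hypothetical placeholders: the abstract does not document the actual endpoints, so they would need to be replaced with the real ones.

```python
from typing import Dict, Iterable
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Site URL as given in the text; the routes built on it below are ASSUMED.
BASE_URL = "http://synbis.bg.ic.ac.uk/"
FORMATS = ("xml", "sbol")  # the two outbound interfaces named in the text


def datasheet_url(part_id: str, fmt: str = "xml", base_url: str = BASE_URL) -> str:
    """Build the (hypothetical) URL of one datasheet in the chosen format."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return urljoin(base_url, f"api/datasheet/{quote(part_id, safe='')}/{fmt}")


def fetch_datasheet(part_id: str, fmt: str = "xml", timeout: float = 10.0) -> bytes:
    """Download one datasheet; returns the raw response body."""
    with urlopen(datasheet_url(part_id, fmt), timeout=timeout) as response:
        return response.read()


def fetch_many(part_ids: Iterable[str], fmt: str = "xml") -> Dict[str, bytes]:
    """Naive bulk retrieval: one request per part identifier."""
    return {part_id: fetch_datasheet(part_id, fmt) for part_id in part_ids}
```

A real bulk query would more likely be a single server-side request; the loop in `fetch_many` is only the simplest client-side stand-in.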

Whichever the method of access, all the information acquired and generated during the characterisation of the biopart is available. SynBIS only uses free and open data interchange standards that are common, or that we believe will become common, in the synthetic biology community. These include human-readable formats such as SBOL and XML (the serialisation of our datasheets), and SBML for computer models of biological processes. SynBIS also uses a novel binary data standard called DICOM-SB, developed at Imperial College London to store raw experimental data efficiently [10].
To integrate all these standards, SynBIS uses a useful property of SBOL version 2.0 [11]. The SBOL 2.0 specification (still in development) allows the URIs of external resources to be listed in the annotation field of bioparts, thus offering an elegant method to link "the essential information for synthetic DNA sequences" (SBOL), raw experimental data (DICOM-SB), processed data (SBML) and the datasheet (XML).
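
The linking idea can be illustrated with a small Python sketch that writes and reads back a list of external-resource URIs in an XML annotation. The namespace, element names and attribute names are invented for the example and are not taken from the SBOL 2.0 specification; only the principle (one annotation listing the URIs of the raw data, the model and the datasheet) follows the text.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical namespace and tag, NOT part of the SBOL 2.0 schema.
NS = "http://example.org/synbis-sketch#"
RESOURCE_TAG = f"{{{NS}}}externalResource"


def build_annotation(resources: Dict[str, str]) -> ET.Element:
    """Create an annotation element with one child per external resource."""
    annotation = ET.Element(f"{{{NS}}}annotation")
    for role, uri in resources.items():
        link = ET.SubElement(annotation, RESOURCE_TAG)
        link.set("role", role)  # e.g. raw data, model, datasheet
        link.set("uri", uri)
    return annotation


def resource_uris(annotation: ET.Element) -> Dict[str, str]:
    """Read the role -> URI mapping back from an annotation element."""
    return {el.get("role"): el.get("uri") for el in annotation.findall(RESOURCE_TAG)}
```

A tool reading the part description would then follow the URI whose role matches what it needs, for instance the SBML model for simulation or the DICOM-SB file for reanalysis of raw data.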

[1] MacDonald, J. T., Barnes, C., Kitney, R. I., Freemont, P. S., & Stan, G. B. V. (2011). Computational design approaches and tools for synthetic biology. Integrative Biology, 3(2), 97-108.

[2] Arpino, J. A., Hancock, E. J., Anderson, J., Barahona, M., Stan, G. B. V., Papachristodoulou, A., & Polizzi, K. (2013). Tuning the dials of synthetic biology. Microbiology, 159(Pt 7), 1236-1253.

[3] Kitney, R., & Freemont, P. (2012). Synthetic biology – the state of play. FEBS Letters, 586(15), 2029-2036.

[4] http://parts.igem.org/Main_Page

[5] https://public-registry.jbei.org/

[6] http://sbol.ncl.ac.uk:8081/

[7] Hirst CD, Ainsworth C, Kelwick RJR, Freemont PS, Kitney RI, Baldwin GS. An automated platform for the characterisation of regulatory biological parts in synthetic biology. In submission. 2014.

[8] Canton, B., Labno, A., & Endy, D. (2008). Refinement and standardization of synthetic biological parts and devices. Nature Biotechnology, 26(7), 787-793.

[9] Arkin, A. (2008). Setting the standard in synthetic biology. Nature Biotechnology, 26(7), 771-773.

[10] A DICOM extension supporting the data acquisition process in synthetic biology.

[11] Roehner, N., Oberortner, E., Pocock, M., Beal, J., Clancy, K., Madsen, C., ... & Myers, C. J. (2014). Proposed data model for the next version of the Synthetic Biology Open Language. ACS Synthetic Biology.