(566f) Ontospecies: A Dynamic Knowledge Graph for the Representation of Chemical Species.

Conference

AIChE Annual Meeting

Year

2023

Proceeding

2023 AIChE Annual Meeting

Group

Computational Molecular Science and Engineering Forum

Session

Software Engineering in and for the Molecular Sciences

Time

Tuesday, November 7, 2023 - 9:12am to 9:25am

Authors

Bai, J. - Presenter

Pascazio, L., University of Cambridge

Rihm, S.

Akroyd, J., University of Cambridge

Mosbach, S., University of Cambridge

Kraft, M., University of Cambridge

Data is beginning to play a greater role in transforming chemical engineering research and practice. Chemistry databases have grown continuously in size, complexity, and number. One of the most comprehensive public databases is PubChem. It hosts information on more than 60 million unique chemical structures and serves as a key chemical information resource for researchers in many biomedical science areas, including cheminformatics, chemical biology, and medicinal chemistry. PubChem provides various tools for simple and effective searching. However, these tools are insufficient when a search must satisfy complex criteria or when new information needs to be derived from existing data.

Since the landmark publication by Berners-Lee et al., the semantic web field has envisioned the next generation of the web in a format readable by both humans and machines. In recent years it has emerged as an increasingly important approach for better scientific data sharing and faster machine processing of data.

The World Avatar (TWA) project uses semantic web technologies to create a digital ‘avatar’ of the real world. The digital world is composed of a dynamic knowledge graph that contains concepts and data that describe the world, and an ecosystem of autonomous computational agents that simulate the behaviour of the world and continuously update the concepts and data. A knowledge graph (KG) is a network of data expressed as a directed graph, where the nodes of the graph are concepts or their instances (data items) and the edges of the graph are links between related concepts or instances. This provides a powerful means to host, query and traverse data, and to find and retrieve related information. The autonomous computational agents are the key to the dynamic nature of the KG. They act on the KG continuously and independently, performing various tasks with the aim of producing a self-growing, self-updating, and self-improving ecosystem.
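As a minimal sketch of the structure described above, the following Python snippet stores a knowledge graph as subject-predicate-object triples and lets a toy "agent" derive new data from existing data. The class `TinyGraph`, the function `celsius_agent`, and all node and edge names are hypothetical illustrations and are not part of TWA or OntoSpecies.

```python
from collections import defaultdict


class TinyGraph:
    """Toy directed, labelled graph held as (subject, predicate, object) triples."""

    def __init__(self):
        # subject -> list of (predicate, object) edges leaving that subject
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._out[subject].append((predicate, obj))

    def subjects(self):
        return list(self._out.keys())

    def objects(self, subject, predicate):
        """Objects linked from `subject` by an edge labelled `predicate`."""
        return [o for p, o in self._out.get(subject, []) if p == predicate]

    def reachable(self, start):
        """Every node that can be reached from `start` by following edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, nxt in self._out.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def celsius_agent(graph):
    """Hypothetical agent: add a Celsius value to every quantity stored in kelvin."""
    for node in graph.subjects():
        if "kelvin" in graph.objects(node, "hasUnit"):
            for kelvin in graph.objects(node, "hasValue"):
                graph.add(node, "hasValueInCelsius", round(kelvin - 273.15, 2))


def build_example():
    """Illustrative data only: water and its normal boiling point."""
    graph = TinyGraph()
    graph.add("water", "isA", "Species")
    graph.add("water", "hasBoilingPoint", "bp_water")
    graph.add("bp_water", "hasValue", 373.15)
    graph.add("bp_water", "hasUnit", "kelvin")
    return graph


if __name__ == "__main__":
    g = build_example()
    celsius_agent(g)  # the agent enriches the graph in place
    print(g.objects("bp_water", "hasValueInCelsius"))
    print(g.reachable("water"))
```

Traversal here is a plain depth-first walk. A production knowledge graph would instead sit in a triple store and be queried with SPARQL, but the node-and-edge model is the same.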

To create a digital representation of the world that bridges molecular-scale chemistry and real-world macroscale phenomena and enables cross-domain applications, it is crucial to have a rich general chemistry domain. Exposing the PubChem data to semantic web services may help in this regard. Because it is difficult to deal with data that come from different sources and are mostly collected as strings, the current databases that translate PubChem data into relational form do not include all the information that can be accessed on the web. Properties such as boiling point, melting point, density, and solubility, as well as spectral information on chemical species, are currently not available in any relational database.

The purpose of this work was to develop an ontology, OntoSpecies, that describes chemical species and their properties, that aims to serve as the core of the chemistry domain of the TWA KG, and that at the same time addresses some of the limitations of previous relational chemistry databases. Specifically, the resulting ontology:

- Provides knowledge on general chemistry concepts related to chemical species by integrating PubChem data on compounds into the database using software agents. Concepts that are currently not exported in any relational database are also included in the ontology.

- Gives access to the dataset through a SPARQL endpoint. This will remove the inherent limitations of using the web-based PubChem resource (such as inability to construct complicated queries using the available web-based interfaces) by allowing a researcher to use readily available semantic technologies to query and analyze PubChem data on local computing resources.

- Uses a dynamic knowledge-graph-based approach to obtain a self-growing database that not only integrates data from different sources but also creates and infers knowledge through the use of software agents.
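The kind of multi-criteria query that a SPARQL endpoint permits can be sketched as follows. The snippet only builds the query text. The namespace `http://example.org/ontospecies#`, the class `os:Species`, and the properties `os:hasBoilingPoint` and `os:value` are placeholders assumed for illustration; the actual OntoSpecies vocabulary is not given here and may differ.

```python
import math

# Hypothetical namespace: a placeholder, not the real OntoSpecies IRI.
OS_NAMESPACE = "http://example.org/ontospecies#"


def build_boiling_point_query(min_kelvin, max_kelvin, limit=10):
    """Return SPARQL text selecting species whose boiling point lies in a range.

    All os: terms are assumed names used only for this sketch.
    """
    low, high, limit = float(min_kelvin), float(max_kelvin), int(limit)
    if not (math.isfinite(low) and math.isfinite(high)):
        raise ValueError("bounds must be finite numbers")
    if low > high:
        raise ValueError("min_kelvin must not exceed max_kelvin")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return "\n".join([
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        f"PREFIX os: <{OS_NAMESPACE}>",
        "",
        "SELECT ?species ?label ?bp",
        "WHERE {",
        "  ?species a os:Species ;",
        "           rdfs:label ?label ;",
        "           os:hasBoilingPoint ?bpNode .",
        "  ?bpNode os:value ?bp .",
        f"  FILTER(?bp >= {low} && ?bp <= {high})",
        "}",
        "ORDER BY ?bp",
        f"LIMIT {limit}",
    ])


if __name__ == "__main__":
    print(build_boiling_point_query(300, 400, limit=5))
```

Coercing the bounds to numbers before formatting keeps arbitrary text out of the query. Further criteria, such as density or solubility, would be added as extra triple patterns and `FILTER` clauses in the same `WHERE` block.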

It is anticipated that our approach will play a key role in the next generation of chemical informatics. The ontological format permits advanced queries and easy data analysis and visualization. This can be used to compare chemical properties of similar compounds, to find compounds with required characteristics, and to automate laborious data gathering that researchers would otherwise do by hand. We show how tasks such as identifying species in an unknown mixture from an NMR spectrum, selecting suitable solvents based on multiple criteria, or investigating trends in chemical properties can be addressed using SPARQL queries in combination with software agents that post-process the query results. We also show how the ontological format helps to maintain and enrich the data set, and to check the consistency and accuracy of the data. Finally, the link between OntoSpecies and other ontologies in TWA is discussed in the context of laboratory automation and cross-domain applications in the TWA ecosystem.
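The post-processing step mentioned above can be illustrated with a small sketch that reads a result in the standard SPARQL 1.1 JSON results layout and examines a property trend. The variable names `label`, `carbons`, and `bp`, the function names, and the sample rows (approximate normal boiling points of the first four n-alkanes) are assumptions made for this example and are not output from OntoSpecies.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def _binding(label, carbons, bp):
    """Build one result row in the SPARQL 1.1 JSON results layout."""
    return {
        "label": {"type": "literal", "value": label},
        "carbons": {"type": "literal", "value": str(carbons), "datatype": XSD + "integer"},
        "bp": {"type": "literal", "value": str(bp), "datatype": XSD + "double"},
    }


# Illustrative, approximate values in kelvin; not query output.
SAMPLE_RESULT = json.dumps({
    "head": {"vars": ["label", "carbons", "bp"]},
    "results": {"bindings": [
        _binding("propane", 3, 231.05),
        _binding("methane", 1, 111.65),
        _binding("butane", 4, 272.65),
        _binding("ethane", 2, 184.55),
    ]},
})


def rows_from_sparql_json(text):
    """Flatten SPARQL JSON results into a list of {variable: string value} dicts."""
    data = json.loads(text)
    names = data["head"]["vars"]
    return [
        {name: row[name]["value"] for name in names if name in row}
        for row in data["results"]["bindings"]
    ]


def boiling_point_trend(rows):
    """Sort rows by carbon count and return the points and successive increments."""
    ordered = sorted(rows, key=lambda r: int(r["carbons"]))
    points = [(int(r["carbons"]), float(r["bp"])) for r in ordered]
    steps = [round(b2 - b1, 2) for (_, b1), (_, b2) in zip(points, points[1:])]
    return points, steps


if __name__ == "__main__":
    points, steps = boiling_point_trend(rows_from_sparql_json(SAMPLE_RESULT))
    print(points)
    print(steps)
```

In the SPARQL JSON format every value arrives as a string, so the conversion to `int` and `float` has to happen in the post-processing step, as it does here.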

Topics

Chemical Reaction Engineering

Computational Molecular Engineering
