(119a) Integrating Chemistry Knowledge in Large Language Models Via Prompt Engineering

Authors 

Yin, H., Tsinghua University
Luo, Z., National University of Singapore
Wang, X., Tsinghua University
Our work presents a study on the integration of domain-specific knowledge into prompt engineering to enhance the performance of large language models (LLMs) in scientific domains. The proposed domain-knowledge embedded prompt engineering method outperforms traditional prompt engineering strategies across several metrics, including capability, accuracy, F1 score, and hallucination drop. The effectiveness of the method is demonstrated through case studies on complex chemical systems: the MacMillan catalyst, paclitaxel, and lithium cobalt oxide. The results suggest that domain-knowledge prompts can guide LLMs to generate more accurate and relevant responses, highlighting the potential of LLMs as powerful tools for scientific discovery and innovation when equipped with domain-specific prompts. The study also discusses limitations and future directions for domain-specific prompt engineering.

Introduction and Background: The rapid advance of artificial intelligence (AI) has propelled its integration into the natural sciences, particularly biology, chemistry, and materials science. Traditional AI applications in science have focused on property prediction but are largely limited to known molecules or materials. The advent of LLMs, capable of zero-shot reasoning and of incorporating domain knowledge, presents a new avenue for scientific discovery [1]. Yet the quality of prompts strongly influences LLM outputs, and a significant limitation of existing prompt engineering methods is that they do not incorporate domain expertise to guide problem-solving, considerably restricting the capabilities of LLMs on domain-specific tasks. This study addresses the gap in LLM applications to chemistry, biology, and materials science by introducing a domain-specific prompt engineering framework that enhances LLM applicability in these fields.

Methodology: In the task construction process, we collect and curate a comprehensive dataset of 1,280 questions with corresponding solutions for evaluating LLM capability. The tasks span three domains of significant relevance in academic research and practical applications: organic small molecules, enzymes, and crystal materials. In conjunction with this dataset, we develop an automatic benchmarking scheme, compatible with common and open-source LLMs, that evaluates domain-specific performance using several metrics: capability, accuracy, F1 score, and hallucination drop.
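As an illustration of how such a benchmarking loop might be organized, a minimal Python sketch follows. The abstract does not define the metrics formally, so the metric definitions, the `Question` schema, and the `ask_llm` callable below are assumptions for illustration, not the authors' released code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Question:
    prompt: str   # task statement shown to the LLM
    answer: str   # curated ground-truth solution
    domain: str   # e.g., "small molecule", "enzyme", or "crystal"

def evaluate(questions: List[Question],
             ask_llm: Callable[[str], str]) -> Dict[str, float]:
    """Score an LLM over the curated question set.

    `ask_llm` is a hypothetical callable mapping a prompt to the model's
    answer string. Metric definitions are illustrative assumptions:
    "capability" counts attempted answers, "accuracy" counts correct ones,
    and hallucinations are confidently stated wrong answers.
    """
    correct = hallucinated = answered = 0
    for q in questions:
        reply = ask_llm(q.prompt).strip()
        if not reply or reply.lower() == "unknown":
            continue  # abstention: lowers capability but is not a hallucination
        answered += 1
        if reply.lower() == q.answer.lower():
            correct += 1
        else:
            hallucinated += 1
    return {
        "capability": answered / len(questions),
        "accuracy": correct / max(answered, 1),
        "hallucination_rate": hallucinated / max(answered, 1),
    }
```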

Formulating domain-specific scientific prediction as LLM question-answering tasks, we propose a domain-knowledge embedded prompt engineering strategy that draws on both the heuristics of generic prompt engineering methods, such as few-shot prompting [2], chain-of-thought (CoT) prompting [3], and expert prompting [4], and on domain-knowledge incorporation, which integrates the thought processes of chemistry and biology experts to provide precise background knowledge and exemplify accurate human reasoning for the LLM. The prompting scheme takes the form of a multi-expert mixture: each expert role-plays a specialist and is given a few CoT demonstrations integrated with expert domain knowledge or instructions, and the experts' answers are then assembled by majority vote, with the minority submitting to the majority.
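A minimal sketch of how this multi-expert mixture could be wired up is shown below; the personas, placeholder demonstrations, and `ask_llm` interface are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical expert personas. In the actual method, each persona's prompt
# embeds curated domain knowledge and worked CoT demonstrations authored
# with chemistry/biology experts; the strings below are placeholders.
EXPERTS: List[Tuple[str, str]] = [
    ("organic chemist", "Background: ...\nQ: ...\nA: Let's think step by step: ..."),
    ("enzymologist", "Background: ...\nQ: ...\nA: Let's think step by step: ..."),
    ("materials scientist", "Background: ...\nQ: ...\nA: Let's think step by step: ..."),
]

def build_prompt(role: str, demonstrations: str, question: str) -> str:
    """Compose a role-playing prompt with domain-knowledge CoT demonstrations."""
    return (f"You are an expert {role}.\n"
            f"{demonstrations}\n"
            f"Q: {question}\nA: Let's think step by step:")

def multi_expert_answer(question: str, ask_llm: Callable[[str], str]) -> str:
    """Query one persona per expert and aggregate by majority vote.

    In practice the free-text answers would be normalized (e.g., parsed
    down to a final label) before voting.
    """
    votes = [ask_llm(build_prompt(role, demos, question))
             for role, demos in EXPERTS]
    return Counter(votes).most_common(1)[0][0]
```

Majority voting over independently prompted personas trades extra inference cost for robustness: an error in one persona's reasoning chain is outvoted unless the other experts share it.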

Results: The domain-knowledge embedded prompt engineering method outperforms other techniques on almost all tasks involving small molecules and crystal materials, and on over half of the enzyme tasks. Across task types and complexities, the domain-knowledge approach consistently outperforms general methods, especially in tasks requiring complex reasoning or intricate knowledge of experimental data. This highlights the model's adeptness at navigating and synthesizing domain-specific information to generate more accurate and relevant responses. Moreover, examining performance across the different materials makes it evident that the tailored prompts significantly enhance the LLM's ability to process and analyze data from distinct scientific domains.

Case Studies: Three pivotal case studies are chosen for their significant implications in scientific research and industrial applications. These systems, the MacMillan catalyst, paclitaxel, and lithium cobalt oxide, represent cornerstone discoveries in their respective fields, each posing unique challenges and opportunities for AI-assisted exploration. The MacMillan catalyst, a Nobel-recognized innovation in organocatalysis, is studied for its potential to revolutionize synthetic chemistry through enhanced catalytic processes; the task involves dissecting the catalyst's complex structure and predicting its reactivity and selectivity, a testament to the model's capability to navigate intricate chemical landscapes. Paclitaxel, a key agent in cancer therapy, is explored for optimizing its synthesis pathway, highlighting the model's ability to contribute to pharmaceutical advances by streamlining synthetic routes for complex molecules. Lastly, lithium cobalt oxide, essential to lithium-ion battery technology, is examined for its crystallographic properties and electrochemical behavior.

Conclusion: The integration of domain-specific knowledge into LLMs through prompt engineering offers significant improvements in performance across various scientific tasks. This approach not only makes LLMs more applicable in specialized areas but also highlights their potential as powerful tools for scientific discovery and innovation. The study also outlines future directions, including expanding domain coverage, integrating datasets and tools, and developing multi-modal prompting techniques.

References:

[1] Rane, N. L., Tawde, A., Choudhary, S. P., & Rane, J. (2023). Contribution and performance of ChatGPT and other Large Language Models (LLM) for scientific and research advancements: a double-edged sword. International Research Journal of Modernization in Engineering Technology and Science, 5(10), 875-899.

[2] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., ... Amodei, D. (2020). Language Models are Few-Shot Learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901).

[3] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 24824–24837).

[4] Zhang, S. J., Florin, S., Lee, A. N., Niknafs, E., Marginean, A., Wang, A., Tyser, K., Chin, Z., Hicke, Y., Singh, N., Udell, M., Kim, Y., Buonassisi, T., Solar-Lezama, A., & Drori, I. (2023). Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models. arXiv preprint arXiv:2306.08997.
