(4ha) Data-Centric Modeling, Design, and Synthesis of Complex Materials | AIChE

Authors 

Jiang, S. - Presenter, University of Wisconsin-Madison
Research Interests

The interplay of topological, compositional, and chemical complexities within materials such as natural and synthetic polymers defines their properties and leads to diverse applications, from thickening agents in foods to electrolyte solvents in batteries. The heterogeneous materials data landscape, marked by a scarcity of data from costly experiments and computations and an abundance of complex data from high-throughput methods, presents significant challenges in understanding structure-property relationships and identifying and synthesizing materials with desired functions.

My future research will adopt a data-centric approach that integrates theory, simulation, and machine learning to characterize, understand, and guide the design and synthesis of complex materials for health and sustainability applications.

Below, I outline the three themes central to this research.

1. Enhance viability and interpretability in data representation. Traditional descriptor-based representations are effective in specific domains, but their limitations, such as difficulty in capturing complex interactions, necessitate a more versatile and unified representation framework for diverse material systems. During my graduate research with Victor M. Zavala, I focused on topological data analysis (TDA) to characterize optical responses in liquid crystals, yielding physically interpretable data representations for sensor design [1]. Building on this work, I will develop a unified topological framework to extract spatiotemporal features across various length and time scales, without relying on human-selected descriptors. This framework will be particularly useful for materials like zeolites, metal-organic frameworks, and polymers, where the interplay of molecular morphology and topology governs key properties such as catalytic activity, adsorption, and mechanical strength. Additionally, building upon my postdoctoral research with Michael A. Webb, where I applied interpretable deep learning techniques to polymer system design [2], I will integrate domain knowledge and physics-informed constraints into the unified topological representation to enhance its viability for analyzing material systems, especially those with limited data.
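As a minimal illustration of the kind of descriptor-free topological summary discussed in theme 1, the sketch below counts the connected components (the zeroth Betti number) of a 2D intensity grid as a threshold is swept, giving a simple "Betti curve". It is not the method of [1]; the function names, the choice of 4-connectivity, and the toy grid are assumptions made purely for illustration.

```python
def count_components(grid, threshold):
    """Count 4-connected components of cells whose value is >= threshold."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or seen[r][c]:
                continue
            # New component found: flood-fill it with an explicit stack.
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and grid[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


def betti0_curve(grid, thresholds):
    """Zeroth Betti number of the superlevel set at each threshold."""
    return [count_components(grid, t) for t in thresholds]


# Hypothetical toy "image": four bright corners separated by a dark cross.
toy_grid = [
    [9, 1, 8],
    [1, 1, 1],
    [7, 1, 9],
]
curve = betti0_curve(toy_grid, thresholds=[1, 5, 8.5, 10])
```

How the component count changes with the threshold summarizes the spatial pattern without any hand-picked feature. A full persistent-homology pipeline would additionally track when each component appears and merges.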

2. Enhance the efficiency of data generation. As tandem experimental and computational investigations proliferate, efficient data generation is becoming increasingly urgent. Data-driven models, essential for understanding structure-property relationships, often require large and diverse datasets for effective training. However, the conventional approach of manually designing experiments or simulations, while widely used, is costly and time-consuming. Bayesian optimization offers a potential alternative, but its sequential and synchronous nature hinders its effectiveness in high-throughput scenarios. My expertise in molecular simulation and large-scale distributed computing uniquely positions me to develop a versatile, GPU-accelerated, asynchronous data generation framework [2-4]. This framework will integrate with existing simulation software such as LAMMPS and HOOMD-blue. Furthermore, by incorporating diversity metrics into the generation algorithm, I will address the prevalent issue of inadequate data diversity, which introduces bias and limits the predictive power of data-driven models. This enhancement will improve the quality, generalizability, and applicability of the resulting datasets.
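One simple way a diversity metric could enter a data generation loop is greedy max-min (farthest-point) selection, sketched below: each new candidate is the one farthest from everything already chosen. This is only an illustrative assumption about what "diversity" might mean here, not the proposed framework. The Euclidean metric, the function names, and the toy candidate points are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(candidates, k, seed_index=0):
    """Greedy max-min selection: return indices of up to k spread-out candidates."""
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    selected = [seed_index]
    # min_dist[i] is the distance from candidate i to its nearest selected point.
    min_dist = [euclidean(c, candidates[seed_index]) for c in candidates]
    while len(selected) < k:
        nxt = max(range(len(candidates)), key=lambda i: min_dist[i])
        if min_dist[nxt] == 0.0:
            break  # Only duplicates of already-selected points remain.
        selected.append(nxt)
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], euclidean(c, candidates[nxt]))
    return selected


# Hypothetical 2D feature vectors for five candidate simulations.
pool = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (0.0, 10.0), (0.05, 0.05)]
chosen = select_diverse(pool, k=3)
```

In this toy pool the three near-identical points around the origin contribute only one selection, so the chosen subset covers the feature space rather than oversampling one region.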

3. Enhance translation to realistic systems. Materials are ubiquitous in health and sustainability applications, from drug delivery, tissue engineering, and wearable electronics to fuel cells, solid-state batteries, and water treatment membranes. However, most data-driven models in materials science rely on idealized datasets, neglecting the inherent defects and variability present in real materials. This issue is particularly problematic for materials such as polymer blends, where uncertainty is an intrinsic property. To address the lack of realism in datasets, I will leverage my expertise in uncertainty quantification and defect detection to create datasets that reflect the imperfections and variability of real materials [5]. Additionally, I will design algorithms to quantify uncertainties and defects within these datasets, providing insights into how imperfections affect material properties. Building on my expertise in generative deep learning and my experience collaborating with 15 research teams [2,6-8], I will continue to partner with experimental groups to facilitate the closed-loop generation, characterization, and synthesis of materials derived from data-driven models. This collaborative approach will accelerate the development of innovative materials with tailored properties for specific applications, ultimately contributing to advancements in both the health and sustainability sectors.
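To make the idea of quantifying uncertainty concrete, the sketch below splits the predictive variance of a model ensemble into a data-noise (aleatoric) part and a model-disagreement (epistemic) part using the law of total variance. It assumes each ensemble member reports a predicted mean and a predicted variance for the same input. The function name and the numbers are hypothetical, and the sketch is not presented as the procedure used in [5].

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    means[i] and variances[i] are the predicted mean and variance from
    ensemble member i for a single input (an assumed interface).
    """
    if not means or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equal length")
    prediction = fmean(means)     # ensemble-averaged prediction
    aleatoric = fmean(variances)  # average noise the members attribute to the data
    epistemic = pvariance(means)  # spread of the members' means (disagreement)
    return prediction, aleatoric, epistemic, aleatoric + epistemic


# Hypothetical outputs of a three-member ensemble for one molecule.
prediction, aleatoric, epistemic, total = decompose_uncertainty(
    means=[1.0, 2.0, 3.0],
    variances=[0.5, 0.5, 0.5],
)
```

A large epistemic term suggests that more data would help, whereas a large aleatoric term points to variability inherent in the material or measurement.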

Teaching Interests

In a world where complexity is the norm and challenges are interconnected, my goal as an educator is to cultivate systems thinking in students, equipping them with transferable skills to address open-ended problems. Rather than focusing on rote memorization, I will prioritize helping students understand how concepts interrelate and adapt to emerging scenarios. I will create a student-centered, active learning environment where students collaboratively design and address real-world-inspired projects that integrate multiple course concepts. Through continuous feedback mechanisms, I will refine my teaching, empowering students to identify, analyze, and solve open-ended problems with societal impact.

My teaching journey began unconventionally: during my sophomore year, I started creating online crash courses in physical chemistry. This initiative led twenty students to seek in-person tutoring, igniting my passion for teaching. During my Ph.D. studies, I served as a teaching assistant, delivering lectures, creating assessments, and conducting interactive learning sessions for both a graduate-level statistics course (30 students) and an undergraduate-level process modeling course (50 students). In my office hours and lessons, I endeavored to create a safe and inclusive environment where all students felt comfortable and capable of learning. Recognizing the critical importance of inclusivity within STEM disciplines, I actively engaged in DEI-supported mentoring programs, guiding ten undergraduate and high school scholars from diverse backgrounds, including international students and those from underrepresented groups in the U.S. By tailoring my approach to individual needs, I fostered a supportive mentoring environment that built student confidence. Beyond day-to-day research, I provided opportunities for students to present at national meetings and assisted them with graduate school and job applications, further supporting their academic and professional development.

I am qualified and eager to teach courses related to statistics, data science, machine learning, and physical chemistry at both undergraduate and graduate levels. My long-term educational objective is to integrate data-driven machine learning methodologies into the undergraduate chemical engineering curriculum. Despite its prominence in graduate research, machine learning remains underrepresented at the undergraduate level, posing a challenge for students who aspire to excel in this dynamic field. Furthermore, the chemical industry increasingly seeks professionals proficient in machine learning. While advanced topics like large language models warrant specialized graduate-level courses, I firmly believe that the foundational theories and practical applications of machine learning can be effectively taught to undergraduates. This conviction is based on three key factors: 1) the intuitive nature of machine learning principles, 2) the feasibility of teaching its core mathematical theories using concepts from undergraduate calculus and linear algebra, and 3) the accessibility of user-friendly open-source machine learning tools such as PyTorch and scikit-learn. By incorporating machine learning into the undergraduate curriculum, students will acquire a versatile mathematical framework for addressing data-driven challenges, significantly enhancing their value in a wide array of professional fields.

Selected Publications (8 of 18)

[1] S. Jiang, N. Bao, A. D. Smith, S. Byndoor, R. C. Van Lehn, M. Mavrikakis, N. L. Abbott, and V. M. Zavala. Scalable extraction of information from spatiotemporal patterns of chemoresponsive liquid crystals using topological descriptors. The Journal of Physical Chemistry C, 127(32):16081–16098, 2023.

[2] S. Jiang, A. B. Dieng, and M. A. Webb. Property-guided generation of complex polymer topologies using variational autoencoders. npj Computational Materials, 2024.

[3] A. K. Chew, S. Jiang, W. Zhang, V. M. Zavala, and R. C. Van Lehn. Fast predictions of liquid-phase acid-catalyzed reaction rates using molecular dynamics simulations and convolutional neural networks. Chemical Science, 11(46):12464–12476, 2020.

[4] S. Jiang and P. Balaprakash. Graph neural network architecture search for molecular property prediction. In 2020 IEEE International Conference on Big Data, pages 1346–1353. IEEE, 2020.

[5] S. Jiang, S. Qin, R. C. Van Lehn, P. Balaprakash, and V. M. Zavala. Uncertainty quantification for molecular property predictions with graph neural architecture search. Digital Discovery, 2024.

[6] S. Qin, S. Jiang, J. Li, P. Balaprakash, R. C. Van Lehn, and V. M. Zavala. Capturing molecular interactions in graph neural networks: a case study in multi-component phase equilibrium. Digital Discovery, 2(1):138–151, 2023.

[7] S. Jiang, Z. Xu, M. Kamran, S. Zinchik, S. Paheding, A. G. McDonald, E. Bar-Ziv, and V. M. Zavala. Using ATR-FTIR spectra and convolutional neural networks for characterizing mixed plastic waste. Computers & Chemical Engineering, 155:107547, 2021.

[8] S. Jiang, J. Noh, C. Park, A. D. Smith, N. L. Abbott, and V. M. Zavala. Using machine learning and liquid crystal droplets to identify and quantify endotoxins from different bacterial species. Analyst, 146(4):1224–1233, 2021.