(185e) A Graph-Based Software Framework for Data Science | AIChE

Authors 

Cole, D. - Presenter, University of Wisconsin-Madison
Zavala, V., University of Wisconsin-Madison
Graphs are mathematical representations that can be used to model a wide range of systems arising in science and engineering. Simply stated, graphs capture connectivity in a system using sets of nodes and edges (links between nodes), together with data embedded in those nodes and edges. For example, graphs have been used to represent chemical processes (collections of interconnected unit operations) [1] and molecules (collections of interconnected atoms) [2,3]. Graph representations of systems of interest also give researchers access to a variety of descriptors that are useful for analysis and quantification, such as spanning trees, the number of nodes or edges, the number of connected components, and topological descriptors (e.g., the Euler characteristic) [4,5,6,7,8]. These and other tools have been applied to analyze networks in different fields, including chemical processes [6], biological systems [9], brain networks [10], and geomorphology [11].
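As a minimal illustration of the descriptors listed above, the following Python sketch counts nodes, edges, and connected components of a small undirected graph and computes its Euler characteristic as nodes minus edges (treating the graph as a one-dimensional complex). The function name `graph_descriptors` and the toy flowsheet are hypothetical and are not part of the framework described in this abstract.

```python
def graph_descriptors(nodes, edges):
    """Return basic descriptors of an undirected graph (illustrative helper)."""
    # Union-find structure used to count connected components.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    n_nodes, n_edges = len(nodes), len(edges)
    n_components = len({find(v) for v in nodes})
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "components": n_components,
        # For a graph, the Euler characteristic is nodes minus edges,
        # which also equals components minus independent cycles.
        "euler_characteristic": n_nodes - n_edges,
        "independent_cycles": n_edges - n_nodes + n_components,
    }


# Hypothetical flowsheet: three units in a recycle loop plus one isolated tank.
nodes = ["mixer", "reactor", "separator", "tank"]
edges = [("mixer", "reactor"), ("reactor", "separator"), ("separator", "mixer")]
desc = graph_descriptors(nodes, edges)
print(desc)
```

Here the recycle loop contributes one cycle and the isolated tank a second component, so the Euler characteristic (components minus cycles) equals nodes minus edges.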

Many datasets encountered in science and engineering can also be represented as graphs, and doing so can reveal unique insights. For example, a data matrix can be represented as a graph in which each entry is a node weighted by the entry value and edges capture adjacency between matrix entries (the resulting graph forms a perfect mesh). Furthermore, any symmetric matrix (e.g., a covariance matrix or a Euclidean distance matrix) can be represented as an edge-weighted graph in which the rows and columns of the matrix correspond to nodes and each matrix entry corresponds to a weighted edge between the row node and the column node of that entry. Graph data representations can be manipulated through operations such as filtration, aggregation, or partitioning and analyzed using topological descriptors [6]. This approach differs from standard matrix analysis techniques (e.g., eigenvalue analysis). Graph analysis tools can also be applied to high-dimensional representations; for example, molecular graphs carry large amounts of data in each atom node (multiple attributes) [2,3].
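The two matrix-to-graph constructions and a simple node filtration can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the framework's implementation: all function names are hypothetical, mesh adjacency is taken to be the four nearest neighbors, diagonal entries of the symmetric matrix are skipped rather than turned into self-loops, and the filtration keeps nodes whose weight is at or below a threshold.

```python
def matrix_to_mesh_graph(matrix):
    """Node-weighted mesh graph: one node per entry, edges between 4-neighbors."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    node_weights = {
        (i, j): matrix[i][j] for i in range(n_rows) for j in range(n_cols)
    }
    edges = []
    for i in range(n_rows):
        for j in range(n_cols):
            if i + 1 < n_rows:  # neighbor below
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < n_cols:  # neighbor to the right
                edges.append(((i, j), (i, j + 1)))
    return node_weights, edges


def symmetric_matrix_to_weighted_graph(matrix):
    """Edge-weighted graph: one node per row/column, one edge per off-diagonal pair."""
    n = len(matrix)
    nodes = list(range(n))
    # Assumption: only the upper triangle is used and the diagonal is ignored.
    edge_weights = {
        (i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)
    }
    return nodes, edge_weights


def filter_nodes(node_weights, edges, threshold):
    """Keep nodes with weight <= threshold and the edges joining kept nodes."""
    kept = {v for v, w in node_weights.items() if w <= threshold}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# Made-up 3x3 data matrix.
data = [[1, 5, 2],
        [4, 9, 3],
        [0, 6, 1]]
node_weights, mesh_edges = matrix_to_mesh_graph(data)
kept_nodes, kept_edges = filter_nodes(node_weights, mesh_edges, threshold=4)
euler_characteristic = len(kept_nodes) - len(kept_edges)

# Made-up 3x3 covariance-like matrix.
cov = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.3],
       [0.1, 0.3, 1.0]]
cov_nodes, cov_edge_weights = symmetric_matrix_to_weighted_graph(cov)
```

In this example the filtration removes the high-valued middle column, leaving the left and right columns as two disconnected paths.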

In this work, we present a Julia-based software framework for the modeling and analysis of graph-based data representations. The framework provides user-friendly capabilities to automatically manipulate diverse raw data types (e.g., matrices and time-series) to obtain different graph representations. The graph representations can then be analyzed using filtration and aggregation functions and summarized using a wide range of graph descriptors. Moreover, our framework facilitates the partitioning and visualization of graph objects to support deeper analysis. We also present example applications of these tools and show how they simplify certain data analysis tasks.
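To make the filter-then-summarize workflow concrete, the sketch below sweeps a threshold over an edge-weighted graph built from a symmetric matrix and records the Euler characteristic at each step. It is written in plain Python rather than Julia and does not reproduce the framework's interface. The function name `euler_characteristic_curve`, the convention of keeping every node while admitting edges with weight at or below the threshold, and the distance values are all assumptions made for illustration.

```python
def euler_characteristic_curve(sym_matrix, thresholds):
    """Euler characteristic (nodes minus edges) of the graph at each threshold.

    Assumptions: every row/column is a node that is always kept, and an edge
    (i, j) with i < j is present when sym_matrix[i][j] <= threshold.
    """
    n = len(sym_matrix)
    upper_weights = [sym_matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    curve = []
    for t in thresholds:
        n_edges = sum(1 for w in upper_weights if w <= t)
        curve.append(n - n_edges)
    return curve


# Made-up symmetric distance matrix for four samples.
dist = [[0, 1, 4, 5],
        [1, 0, 2, 6],
        [4, 2, 0, 3],
        [5, 6, 3, 0]]
thresholds = [0, 1, 2, 3, 4, 5, 6]
curve = euler_characteristic_curve(dist, thresholds)
print(list(zip(thresholds, curve)))
```

The curve starts at the number of nodes, when no edges are present, and decreases by one for each edge admitted as the threshold grows, giving a compact summary of how connectivity builds up in the data.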

References

[1] Yue Shao and Victor M Zavala. Modularity measures: concepts, computation, and applications to manufacturing systems. AIChE Journal, 66(6):e16965. 2020.

[2] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513-530. 2018.

[3] Shiyi Qin, Tianyi Jin, Reid C Van Lehn, and Victor M Zavala. Predicting critical micelle concentrations for surfactants using graph convolutional neural networks. Journal of Physical Chemistry B, 125(37):10610-10620. 2021.

[4] Frank Harary. Graph Theory. Addison-Wesley Pub. Co., 1969.

[5] Mark Newman. Networks: An Introduction. Oxford University Press, 2010.

[6] Alexander Smith and Victor M Zavala. The Euler characteristic: a general topological descriptor for complex data. Computers & Chemical Engineering, 154:107463. 2021.

[7] Alexander D Smith, Paweł Dłotko, and Victor M Zavala. Topological data analysis: concepts, computation, and applications in chemical engineering. Computers & Chemical Engineering, 146:107202. 2021.

[8] Elizabeth Munch. A user’s guide to topological data analysis. Journal of Learning Analytics, 4(2): 47-61. 2017.

[9] Georgios A Pavlopoulos, Maria Secrier, Charalampos N Moschopoulos, Theodoros G Soldatos, Sophia Kossida, Jan Aerts, Reinhard Schneider, and Pantelis G Bagos. Using graph theory to analyze biological networks. BioData Mining, 4:10. 2011.

[10] Karen Caeyenberghs, Helena Verhelst, Adam Clemente, and Peter H Wilson. Mapping the functional connectome in traumatic brain injury: what can graph metrics tell us? NeuroImage, 160:113-123. 2017.

[11] Tobias Heckmann, Wolfgang Schwanghart, and Jonathan D Phillips. Graph theory—recent developments of its application in geomorphology. Geomorphology, 243:130-146. 2015.