(374t) Literature-Based Discovery of Biological Concepts Associated with Diabetic Kidney Disease Using SemNet 2.0 | AIChE

Authors 

Deng, J. H., Georgia Institute of Technology and Emory University
Mitchell, C. S., Georgia Institute of Technology and Emory University
Introduction: Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease worldwide. Its pathophysiology is multifactorial and complex, and is often characterized by glucose impairment, uncontrolled inflammatory response, glomerular damage, and scarring. Disease-related pathways are typically identified using domain knowledge and manual literature curation. However, manually curating all possible relationships is difficult for a pathophysiology as complex as DKD. In this study, we used SemNet 2.0, a natural language processing literature-based discovery (LBD) method developed by our team, to provide a holistic understanding of the signaling drivers and pathways responsible for the early stages of DKD progression.

Method: Cross-domain text mining of PubMed articles with SemNet 2.0 [1] was performed to identify and rank multiscale and multifactorial pathophysiological concepts related to DKD [2]. The analysis spanned five pathological domains chosen based on prior knowledge: (1) diabetes (DB), (2) kidney disease (KD), (3) immune response (IR), (4) glomerular endothelial cell (GEC), and (5) DKD. A list of target concepts was created from prior knowledge and a manual literature search, and two target concepts from each domain were selected at random as input. A pairwise analysis of every domain pair (10 pairs in total) identified source concepts that were associated with the given targets and shared by both domains in the pair (Figure 1). The relevance of the identified genes and proteins in each domain pair was assessed and ranked using normalized mean HeteSim scores; HeteSim is a metric developed to quantify relevance in heterogeneous networks [3]. High-ranking genes and proteins found at the intersection of the DKD, IR, and GEC domains were analyzed and mapped to their biological functions [2]. Further, Cytoscape, a network visualization tool, was used to visualize an interaction network between the high-ranked genes and proteins and other regulatory species.
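The pairwise step and the ranking step described above can be illustrated with a minimal Python sketch. It does not use the SemNet 2.0 API and is not the authors' implementation. The function names, the placeholder target and source names, the toy scores, and the choice of min-max normalization are all assumptions made for illustration, since the abstract does not specify how the HeteSim scores were normalized.

```python
from itertools import combinations
from statistics import mean

# The five domains named in the abstract.
DOMAINS = ["DB", "KD", "IR", "GEC", "DKD"]

# Every unordered pair of domains: C(5, 2) = 10 pairs.
DOMAIN_PAIRS = list(combinations(DOMAINS, 2))


def min_max_normalize(scores):
    """Rescale a {source: score} mapping to the range [0, 1].

    Assumption: min-max scaling stands in for whatever normalization
    SemNet 2.0 actually applies.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {source: 0.0 for source in scores}
    return {source: (value - lo) / (hi - lo) for source, value in scores.items()}


def rank_shared_sources(scores_by_target):
    """Rank the source concepts that are shared by every target.

    scores_by_target maps each target concept to a {source: raw_score} dict.
    Scores are normalized per target, averaged over targets for each shared
    source, and returned sorted from most to least relevant.
    """
    shared = set.intersection(*(set(s) for s in scores_by_target.values()))
    normalized = {t: min_max_normalize(s) for t, s in scores_by_target.items()}
    means = {src: mean(normalized[t][src] for t in normalized) for src in shared}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical targets and made-up scores, not results from the study.
toy_scores = {
    "target_ir": {"gene_a": 0.2, "gene_b": 0.8, "gene_c": 0.5},
    "target_gec": {"gene_a": 0.1, "gene_b": 0.3, "gene_d": 0.9},
}
ranking = rank_shared_sources(toy_scores)
print(len(DOMAIN_PAIRS), ranking)
```

In this toy input only gene_a and gene_b appear under both targets, so they are the only sources that are ranked; gene_c and gene_d are dropped by the intersection.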

Results: The analysis yielded two valuable outcomes: (1) the relatedness between source genes or proteins shared by the DKD, IR, and GEC domains; and (2) a protein/gene interaction network created in place of an inherently more limited manual literature review. The top 10% of ranked concepts were mapped to the following biological functions: angiogenesis, apoptotic process, cell adhesion, chemotaxis, growth factor signaling, vascular permeability, nitric oxide response, oxidative stress, cytokine response, macrophage signaling, NF-κB factor activity, TLR pathway, glucose metabolism, inflammatory response, ERK/MAPK signaling response, JAK/STAT pathway, T cell-mediated response, WNT/beta-catenin pathway, renin-angiotensin system, and NADPH oxidase activity. EPHB4, SERPINB1, ITGB1, TYK2, CREG1, NFKBIA, SPI1, and SNAP23 were among the highest-ranked genes or proteins associated with immune response, GEC, and DKD. Further, Cytoscape-based visualization allowed straightforward interpretation of the DKD-associated regulatory relationships generated by SemNet 2.0.
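As a small illustration of the "top 10%" cutoff mentioned above, the sketch below selects the leading fraction of a best-first ranked list. The helper name, the rounding rule (round up, keep at least one entry), and the placeholder concept names are assumptions; the abstract does not state how the cutoff was rounded or how ties were handled.

```python
def top_percent(ranked, percent=10):
    """Return the leading `percent` of a list that is already sorted best-first.

    Integer arithmetic is used to round the cutoff up, and at least one
    entry is kept for any non-empty list (both are assumed conventions).
    """
    if not ranked:
        return []
    cutoff = max(1, (len(ranked) * percent + 99) // 100)
    return ranked[:cutoff]


# Hypothetical ranked list of 30 placeholder concepts, best first.
ranked_concepts = [f"concept_{i:02d}" for i in range(1, 31)]
top_concepts = top_percent(ranked_concepts)
print(top_concepts)
```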

Conclusion: SemNet 2.0 efficiently identified and prioritized signaling molecules and pathways associated with DKD and their inter-relatedness with other pathophysiological domains. These findings corroborate the relevance of studying the synergistic interaction between the immune system and glomerular endothelial cells to better understand the early stages of DKD progression. The results support the use of LBD to aid in the prioritization of multiscale pathological mechanisms and drug targets, the development of protein-protein interaction and biochemical models, the testing of hypotheses through experiments, and the advancement of biomedical decision-making.

References: [1] Kirkpatrick, A. et al., Big Data Cogn. Comput. (2022). [2] Patidar, K. et al., bioRxiv preprint (2024). [3] Shi, C. et al., IEEE Trans. Knowl. Data Eng. (2014).

Acknowledgments: This work was supported by NIH grant R35GM133763 to A.N.F.V., NSF CAREER grant 2133411 to A.N.F.V., NSF CAREER grant 1944247 to C.S.M., NIH grant U19-AG056169 sub-award to C.S.M., and by the Chan Zuckerberg Initiative under grant 253558 to C.S.M.

Figure 1 caption: Workflow of cross-domain analyses in SemNet 2.0 performed to identify intersecting source nodes (concepts) across five domains: diabetes (DB), kidney disease (KD), immune response (IR), diabetic kidney disease (DKD), and glomerular endothelial cell (GEC).