(633b) High-Performance Bioinformatics Workflow Incorporating Metabolomics Data Analysis, Processing, and Integration within the Exposome Concept.
AIChE Annual Meeting
2022 Annual Meeting
Food, Pharmaceutical & Bioengineering Division
Systems Biology for Engineering Metabolism
Thursday, November 17, 2022 - 12:48pm to 1:06pm
The sample preparation and data acquisition protocols for the metabolomics analysis of serum samples have been previously described. Briefly, serum samples were thawed at 4 °C and immediately vortexed. Proteins were removed from 200 μL of serum using methanol at a 1:3 ratio. The samples were centrifuged at 15,000 rpm for 15 min, and the supernatant was transferred to a new Eppendorf tube and evaporated to dryness. The dry extracts were reconstituted in LC-MS grade water containing internal standards and vortexed vigorously. After a final centrifugation at 15,000 rpm for 10 min, the extracts were transferred to autosampler vials with inserts for analysis.
The analysis was performed with an Agilent 1290 Infinity LC system coupled to an Agilent 6540 HRMS-QTOF LC/MS system in two ionisation modes, positive and negative. Spectra were acquired in full MS scan mode between 50 and 1000 m/z at a scan rate of 1.5 spectra/s in centroid mode at a resolution of 40,000 FWHM. The source conditions were as follows: gas temperature 300 °C, drying gas 7 L/min, nebuliser 50 psig, fragmentor 250 V, skimmer 65 V, and capillary voltage 3500 V or -3500 V in positive or negative mode, respectively. A Fortis Speed Core pH+ C18 column (2.1 × 100 mm, 2.6 μm) preceded by a filter column was used. The column was thermostatted at 40 °C, and the flow rate was 250 μL/min. The mobile phases were water (solvent A) and methanol (solvent B), both amended with 0.01% formic acid. The optimised gradient elution program was the same for both ionisation modes: 0% B for 2 min, 100% B for 17 min, 100% B for 5 min, followed by restoration of the initial conditions at 0% B for 4 min. The injection volume was 5 μL.
Spectral processing was based on the R packages XCMS and CAMERA. Briefly, immediately after importing the raw data in .mzML format, we plotted the base peak chromatograms of the samples, boxplots of the distribution of total ion currents per file, and heatmaps, as a first evaluation of experiment performance and to detect any failed runs. QC samples were used to evaluate the performance of the parameters selected for peak detection, alignment, and grouping. Chromatographic peak detection was performed with the centWave algorithm. We optimised the mass error (ppm) and mzdiff parameters using the R package IPO. The signal-to-noise threshold (snthresh) and the noise level parameters were estimated with an in-house developed approach. The Obiwarp method was used for alignment. We performed gap filling to minimise false negatives. The PerformPeakAnnotation function was used for isotope and adduct annotation. The list of detected metabolites was filtered to minimise false positives and obtain consistent variables. We excluded the metabolites present in 100% of the solvent samples and the variables with more than 80% missing values in QCs. The resulting matrix was further reduced by excluding the metabolites with RSD >30% in QCs. The instrument and overall process variability were then determined by calculating the median RSD for authentic internal standards and for all endogenous metabolites. Normalisation by median, mean centering, and log transformation were applied to bring the data matrix closer to a Gaussian distribution, thus reducing systematic error in experimental conditions. A major concern in exposome studies that include metabolomics analysis is the systematic signal deviation between and within batches, caused mostly by the build-up of dirt in the interface of the LC-MS system or by imperfect column regeneration in the gradient program.
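The QC-based filters and the normalisation steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the R packages used in the workflow, and the function names, the use of `None` for a missing value, the log base, and the order of the normalisation steps (median scaling, then log transformation, then mean centering) are assumptions made for illustration only.

```python
import math
import statistics


def rsd_percent(values):
    """Relative standard deviation (%) of the non-missing values."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return float("inf")  # too few values to estimate variability
    return 100.0 * statistics.stdev(present) / statistics.mean(present)


def keep_feature(qc_values, solvent_values, max_missing=0.80, max_rsd=30.0):
    """Apply the three filters from the text to a single feature.

    Missing intensities are encoded as None (an assumption of this sketch).
    """
    # Exclude features detected in 100% of the solvent injections.
    if solvent_values and all(v is not None for v in solvent_values):
        return False
    # Exclude features with more than 80% missing values in QCs.
    missing = sum(v is None for v in qc_values) / len(qc_values)
    if missing > max_missing:
        return False
    # Exclude features with RSD > 30% in QCs.
    return rsd_percent(qc_values) <= max_rsd


def normalise(matrix):
    """Normalise a samples-by-features matrix of positive intensities.

    Assumed order: median scaling per sample, log2 transform,
    then mean centering per feature.
    """
    scaled = [[v / statistics.median(row) for v in row] for row in matrix]
    logged = [[math.log2(v) for v in row] for row in scaled]
    n_features = len(logged[0])
    means = [statistics.mean(row[j] for row in logged) for j in range(n_features)]
    return [[row[j] - means[j] for j in range(n_features)] for row in logged]
```

For example, `keep_feature([100, 102, 98, 101], [None, None])` keeps the feature, whereas a feature seen in every solvent injection is dropped regardless of its QC behaviour.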
We performed intensity drift correction to improve the signal-to-noise ratio and reduce various types of overfitting, using a combination of the algorithms included in the batchCorr and DBnorm R packages. Finally, we ran an in-house developed algorithm to annotate features by comparing the MS/MS spectra from the analysed samples with those from online databases (HMDB, KEGG, and LipidMaps) and an in-house built database. We restricted the query to metabolites that can be detected in blood. We searched for the following adducts in positive ionisation mode: "M+2H", "M+H+NH4", "M+ACN+2H", "M+2ACN+2H", "M+H", "M+NH4", "M+Na", "M+ACN+H", "M+ACN+Na", "M+2ACN+H", "2M+H", "2M+Na", "2M+ACN+H", and the following in negative ionisation mode: "M-2H", "M-3H", "M-H2O-H", "M+Na-2H", "M-H", "M+Cl", "M+FA-H", "M+K-2H", "2M-H". For database matching, the mass error in ppm was set as in the peak detection step. Individual pathway enrichment analysis was performed using Fisher's method. The databases used were the following: KEGG, WikiPathways, Reactome, HumanCyc, EHMN, PharmGKB, SMPDB, BioCarta, INOH, and PID. The metabolite names, KEGG IDs, and PubChem IDs were used as identifiers. For the size of the population, we considered the metabolites that, according to the HMDB database, have been detected in human samples, even if they have not been quantified. We did not consider predicted metabolites, to avoid false-positive results. The Exposome-Wide Association Study (EWAS) approach was adopted to comprehensively and systematically explore and associate multiple exposure factors and modifiers, discovering and replicating robust correlations with metabolite levels and dysregulated pathways. The "X-Wide Association Analyses (XWAS)" R package was used to explore and associate the detected metabolites and pathways with multiple exposure factors. This analysis is called an Exposome-Wide Association Study (EWAS), and it is modelled on the Genome-Wide Association Study (GWAS).
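As an illustration of how a ppm tolerance is applied when candidate adducts are matched against a database, the following Python sketch computes theoretical m/z values for a small subset of the singly charged adducts listed above. It covers only precursor m/z matching, not the MS/MS spectral comparison performed by the in-house algorithm. The 10 ppm default, the function names, and the two-entry database are hypothetical; the mass shifts are standard values.

```python
# Mass shifts in Da for a subset of singly charged adducts (standard values).
ADDUCT_SHIFT = {
    "positive": {"M+H": 1.007276, "M+NH4": 18.033823, "M+Na": 22.989218},
    "negative": {"M-H": -1.007276, "M+Cl": 34.969402, "M+FA-H": 44.998201},
}


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return 1e6 * (observed_mz - theoretical_mz) / theoretical_mz


def match_feature(observed_mz, mode, database, tol_ppm=10.0):
    """Return (name, adduct, ppm error) for every candidate within tolerance.

    database maps a metabolite name to its neutral monoisotopic mass.
    tol_ppm is a placeholder; the text sets it equal to the peak detection value.
    """
    hits = []
    for name, mono_mass in database.items():
        for adduct, shift in ADDUCT_SHIFT[mode].items():
            err = ppm_error(observed_mz, mono_mass + shift)
            if abs(err) <= tol_ppm:
                hits.append((name, adduct, round(err, 2)))
    return hits


# Hypothetical two-entry database of neutral monoisotopic masses.
demo_db = {"L-glutamate": 147.05316, "pyruvate": 88.01604}
positive_hits = match_feature(148.0604, "positive", demo_db)
negative_hits = match_feature(87.0088, "negative", demo_db)
```

Here the positive-mode feature is assigned to glutamate as "M+H" and the negative-mode feature to pyruvate as "M-H"; a feature with no candidate inside the tolerance returns an empty list.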
The results were visualised using volcano plots and correlation globes.
As exposure factors, we considered 14 phthalate metabolites (MEP, MBzP, MiBP, MnBP, MCHP, MnPeP, MEHP, 5OH-MEHP, 5oxo-MEHP, 5cx-MEHP, MnOP, OH-MiNP, cx-MiNP, and OH-MiDP), and 2 Hexamoll® DINCH® metabolites (OH-MINCH and cx-MINCH) determined by online high-performance liquid chromatography coupled to tandem mass spectrometry (HPLC-MS/MS) using internal isotope-labelled standards.
Most perturbed metabolic pathways were related to oxidative stress and stress-activated signalling pathways, including the urea cycle, indicating an imbalance between the production of cellular reactive oxygen species (ROS), which may be an effect of exposure to phthalates, and the ability of the cell to detoxify them.
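The statistical test behind a pathway being flagged as perturbed can be sketched as an over-representation test. The sketch assumes that "Fisher's method" refers to a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution. The function and parameter names are hypothetical, and the population size stands for the number of metabolites detected in human samples according to HMDB.

```python
from math import comb


def enrichment_p_value(hits_in_pathway, pathway_size, n_hits, population_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least hits_in_pathway pathway members among
    n_hits significant metabolites drawn from population_size metabolites,
    of which pathway_size belong to the pathway.
    """
    upper = min(pathway_size, n_hits)
    total = comb(population_size, n_hits)
    tail = sum(
        comb(pathway_size, k) * comb(population_size - pathway_size, n_hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total
```

With a toy population of 10 metabolites, a pathway of 4, and 3 significant metabolites of which 2 fall in the pathway, the p-value is 40/120, that is, one third.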
The list of detected metabolites included glutamate, which is utilised as a transamination source for aspartate, and either pyruvate or 3-hydroxybutyrate as a source of acetyl-CoA. Acetyl-CoA is one of the fundamental units for metabolic energy production (ATP) through the tricarboxylic acid (TCA) cycle, followed by electron-transport-chain-mediated oxidative phosphorylation. In parallel, it is a fundamental unit for energy storage via gluconeogenesis and lipogenesis.
In addition, the integrated analysis revealed perturbation in the metabolism of biogenic amines (Biogenic Amine Synthesis/Dopamine metabolism). Exposure to phthalates can disrupt the synthesis, transport, and release of biogenic amines, including dopamine, serotonin, norepinephrine, and glutamate, which modulate behaviour, cognition, learning, and memory. Thus, perturbations of the main pathways of dopamine, serotonin, norepinephrine, and glutamate metabolism could help identify potential neurodevelopmental biomarkers.
Overall, we demonstrate the usefulness of our advanced workflow, implemented in the R programming language, by analysing a real dataset that includes metabolomics and exposure data.