Multi-omic integrative network analysis
With advances in biotechnology, we have accumulated vast amounts of genomic, epigenomic, transcriptomic, and proteomic data, collectively called multi-omic data. Integrating and analyzing these data poses a great informatics challenge because of the patient heterogeneity and feature heterogeneity inherent in them. Network analysis has emerged as a powerful tool for analyzing complex data and deciphering the molecular mechanisms underlying complex genetic diseases. However, one challenge of network analysis in genomics is how to construct a reliable network from noisy, incomplete, and heterogeneous data.

In this work, we developed multiple frameworks for network analysis of omic data. The first two address RNA-seq data analysis in small-scale studies where multi-omic profiling may not be available. RNA-seq data are common in genetic studies, but analyzing them is challenging because of small sample sizes and many confounding factors. We formulated a framework to construct robust, context-specific co-expression networks from RNA-seq data and developed a Gene Ontology-based module discovery method to identify candidate disease modules. Besides undirected co-expression networks, RNA-seq data can also be used to construct directed, context-specific gene regulatory networks. However, RNA-seq data alone are not sufficient to infer causal links between gene regulator-target pairs. To address this, we assembled a global regulatory network from multiple public gene regulator-target data repositories, which is, to the best of our knowledge, the largest human gene regulatory network. We then developed a four-step refinement framework to construct a context-specific regulatory network.
Furthermore, we devised an innovative technique called collaborative clustering to identify core regulatory modules and the network rewiring underlying different conditions.

To integrate multi-omic data, we first need to address the heterogeneity in both the patient and feature spaces. Our first goal is to identify patient subtypes, which is useful for personalized or precision medicine. We constructed patient similarity networks from individual omic data and then used network smoothing, fusion, and diffusion techniques to combine information from the networks of different views. The resulting network smoothing and fusion framework not only reduces noise in individual omic data but also facilitates seamless multi-omic integration. We further developed a technique called Affinity Network Fusion (ANF) to integrate multi-omic data for cancer patient clustering. ANF won the best paper award at the 2017 IEEE Conference on Bioinformatics and Biomedicine. To address the “big p, small n” problem in the biomedical domain, we developed a semi-supervised deep learning model called AffinityNet, which can utilize unlabeled data for few-shot learning.

To develop predictable and generalizable deep learning models for the biomedical domain, we presented two frameworks that can incorporate domain knowledge: the Multi-view Factorization AutoEncoder and the Factor Graph Neural Network. The Multi-view Factorization AutoEncoder incorporates biological interaction networks as graph constraints that act as regularizers. The Factor Graph Neural Network directly encodes biological knowledge such as Gene Ontology into the model architecture. Both frameworks combine data-driven and knowledge-driven approaches for biomedical data mining.

All of these frameworks were developed to solve challenging, real-world biomedical data analysis problems. They can be used by bench biomedical researchers to pinpoint disease mechanisms, refine disease subtypes, and advance precision medicine.
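
As an illustration of the patient similarity network idea described above, the following is a minimal sketch and not the ANF algorithm itself. It assumes a Gaussian kernel with a bandwidth taken from nearest-neighbor distances, fusion by simple averaging of row-normalized networks, a single random-walk step as the diffusion, and a two-way spectral split for clustering. All function names, parameters, and the synthetic two-view data are hypothetical and chosen only for the example.

```python
import numpy as np


def affinity_matrix(X, k=5):
    """Gaussian-kernel patient similarity for one omic view (rows = patients)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a patient is not its own neighbor
    # Assumed bandwidth: mean squared distance to the k nearest neighbors.
    sigma2 = np.sort(d2, axis=1)[:, :k].mean()
    return np.exp(-d2 / (2.0 * sigma2))  # diagonal becomes exactly 0


def row_normalize(W):
    """Turn a similarity matrix into a random-walk transition matrix."""
    return W / W.sum(axis=1, keepdims=True)


def fuse_views(views, k=5, steps=1):
    """Average per-view transition matrices, then diffuse with a random walk."""
    P = [row_normalize(affinity_matrix(X, k)) for X in views]
    fused = np.mean(P, axis=0)
    for _ in range(steps):
        fused = fused @ fused  # one diffusion (two-hop random walk) step
    return (fused + fused.T) / 2.0  # symmetrize for spectral clustering


def two_way_split(W):
    """Split patients by the sign of the second eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return (vecs[:, 1] > 0).astype(int)


# Synthetic example: 40 patients, two subtypes, two omic views.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
view_a = rng.normal(size=(40, 10)) + 3.0 * truth[:, None]
view_b = rng.normal(size=(40, 8)) + 3.0 * truth[:, None]

fused = fuse_views([view_a, view_b])
labels = two_way_split(fused)

# Cluster labels are arbitrary, so score agreement up to a label swap.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with simulated subtypes: {agreement:.2f}")
```

Averaging is the simplest possible fusion rule and is used here only to keep the sketch short. In practice the per-view networks could be weighted differently, and more than two clusters would call for a full spectral clustering step.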