Functional module identification and function prediction from protein interaction networks
Since the sequencing of the human genome was completed, uncovering the principles of protein interactions and the functional roles of proteins has been in the spotlight of the post-genomic era. The interactions between proteins provide insights into the underlying mechanisms of biological processes within a cell. The functions of an unknown protein can be postulated on the basis of its interaction evidence with known proteins. The systematic analysis of protein interaction networks has thus become a primary issue in current Bioinformatics research. A wide range of graph-theoretic and statistical approaches have attempted to analyze protein interaction networks effectively. However, their accuracy and efficiency have been limited by the following challenges. First, protein-protein interaction data generated by large-scale high-throughput experiments are not reliable. Next, protein interaction networks are typically structured by complex connectivity. Finally, each protein performs multiple functions under varying environmental conditions. In this dissertation, I explore the quantitative characterization of protein interaction networks based on their unique features, such as the small-world phenomenon, scale-free degree distribution and hierarchical modularity. In particular, I focus on the accurate, efficient mining of protein interaction networks for the purpose of identifying functional modules and predicting protein functions. A functional module is defined as a maximal set of proteins that participate in the same function. As a pre-processing step, the network is weighted by integrating functional knowledge from the Gene Ontology database. The semantic similarity and semantic interactivity measures estimate the reliability of each interaction, which is assigned to the corresponding edge as a weight. These weighted interaction networks facilitate accurate analysis for functional knowledge discovery.
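The weighting step can be illustrated with a minimal sketch. The abstract does not define the semantic similarity or semantic interactivity measures, so the sketch below substitutes a plain Jaccard overlap of annotation sets as a stand-in. The function names `jaccard` and `weight_edges`, the protein names and the GO term identifiers are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two annotation sets (a stand-in, not the dissertation's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_edges(edges, annotations):
    """Assign each interaction a weight from the annotation overlap of its two proteins.

    edges: iterable of (protein, protein) pairs.
    annotations: dict mapping a protein to its set of functional terms.
    Proteins without annotations are treated as having an empty set.
    """
    weights = {}
    for u, v in edges:
        terms_u = annotations.get(u, set())
        terms_v = annotations.get(v, set())
        weights[(u, v)] = jaccard(terms_u, terms_v)
    return weights


# Hypothetical toy data: three proteins, two interactions.
edges = [("P1", "P2"), ("P2", "P3")]
annotations = {
    "P1": {"GO:A", "GO:B"},
    "P2": {"GO:B", "GO:C"},
}
weights = weight_edges(edges, annotations)
```

In this toy case the P1-P2 interaction receives a weight of one third (one shared term out of three), while P2-P3 receives zero because P3 has no annotations. An edge weight of zero would then mark an interaction with no functional support under this stand-in measure.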
I introduce four different approaches for functional module identification and function prediction. First, in the information flow-based approach, I design a novel information flow model that quantifies the propagation of the functional information of a protein over the entire complex network. To implement this model efficiently, I propose a dynamic flow simulation algorithm based on random walks. The flow pattern of a protein generated by this algorithm indicates its functional impact on the other proteins. Second, the graph restructuring approach converts a protein interaction network into a hub-oriented hierarchical structure based on new definitions of path strength and centrality. This algorithm thus reveals the hierarchically organized functional modules and hubs. Next, the association pattern-based approach searches for the functional association patterns that occur frequently in a protein interaction network. I apply a frequent sub-graph mining algorithm to the labeled graph that is generated by assigning the set of functions of each protein as its node label. Finally, graph reduction is a technique for simplifying the complex connection pattern of a protein interaction network. Using the reduced graph, modularization is performed by an iterative procedure of minimum weighted cut and node accumulation. The generation of protein-protein interaction data is proceeding rapidly, heightening the demand for advances in computational methods to analyze these complex data sets. The approaches presented in this dissertation employ novel, advanced data-mining techniques to discover valuable functional knowledge hidden in complex protein interaction networks. This knowledge can form the basis of practical applications in Biomedical Science, e.g., disease diagnosis and drug development. Currently, explosive amounts of heterogeneous biological data are being produced. Developing effective integration methods for incorporating such data is a promising direction for future research.
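As a rough illustration of how a random-walk-based flow simulation might look, the sketch below uses a standard random walk with restart on a weighted network. This is an assumed substitute technique and not the dynamic flow simulation algorithm of the dissertation, whose details the abstract does not give. The function name `flow_pattern`, the `restart` and `steps` parameters and the toy network are hypothetical.

```python
def flow_pattern(weights, source, restart=0.3, steps=50):
    """Approximate how much 'functional information' from `source` reaches each protein.

    weights: dict mapping (protein, protein) pairs to non-negative edge weights;
             edges are treated as undirected.
    source:  the protein whose flow pattern is computed; it must appear in an edge.
    restart: probability of jumping back to the source at each step (assumed value).
    steps:   number of synchronous update rounds (assumed value).
    """
    # Build a symmetric weighted adjacency map.
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    # All probability mass starts on the source protein.
    flow = {node: 0.0 for node in adj}
    flow[source] = 1.0

    for _ in range(steps):
        nxt = {node: 0.0 for node in adj}
        for u, mass in flow.items():
            total = sum(adj[u].values())
            if total == 0:
                # No usable edges: return the walking mass to the source.
                nxt[source] += (1 - restart) * mass
                continue
            for v, w in adj[u].items():
                # Spread mass to neighbours in proportion to edge weight.
                nxt[v] += (1 - restart) * mass * w / total
        # The restart share of the total mass returns to the source.
        nxt[source] += restart
        flow = nxt
    return flow


# Hypothetical toy network: a weighted path P1 - P2 - P3.
toy_weights = {("P1", "P2"): 1.0, ("P2", "P3"): 0.5}
pattern = flow_pattern(toy_weights, "P1")
```

The returned values sum to one, and proteins that are closer to the source or connected through heavier edges receive more of the flow. In the toy network, P3 receives the least because it is reachable only through the weaker edge.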