Community analysis on multiple sources: Progression, evolutionary and mutual knowledge learning
With the explosive growth of information, it has become increasingly easy to collect related data from multiple sources, which may differ in type, format, or representation. Through the integrative analysis of multiple information sources, interesting patterns and knowledge that were originally hidden within a single source can now be discovered. Despite the benefits of learning from multiple sources, doing so is also a significant challenge because of possible noise, conflict, and severe imbalance within and between the sources. Community analysis, a traditional topic, focuses on partitioning entities into communities such that the entities belonging to the same community are similar to one another and different from the entities in other communities. In light of multiple sources, we extend this traditional research in several new directions. In this thesis, we focus on developing new perspectives for community analysis, which can supplement existing approaches and provide new and significant insights into communities and entities in a wide variety of applications. Our new community analysis from multiple information sources has three major parts: evolutionary, progression, and mutual learning of communities. For evolutionary learning, we focus on the task of detecting and tracking the evolutionary patterns of functional modules (i.e., gene communities), which provides insights into the underlying behavior of the molecular system and is also valuable for monitoring the development and outcome of chronic and genetic diseases. To this end, we design a novel framework to categorize and track the evolutionary events of functional modules from biological networks over consecutive timestamps. For progression learning, the goal of the thesis research is to explore the temporal progression of communities.
Such knowledge cannot be discovered from a single source and can only be revealed by joint analysis of time-dependent data. We propose a new problem of analyzing the progression of community strength, a temporal measure that represents the probability that a particular community has a firm structure at the current timestamp. Discovering the progression of community strengths can offer significant insights in a variety of applications and can reveal community information that cannot be directly obtained from traditional community analysis. For mutual learning, we focus on discovering the mutual community knowledge shared among multiple sources and then use this knowledge for further analysis, including the detection of mutually shared communities and significant edges. First, since more and more evidence suggests that human diseases are not isolated from each other, it is significant and interesting to detect the common functional gene modules driving the core mechanisms among multiple related diseases. To address this challenge, we propose a novel deep architecture to discover the mutual functional gene modules across multiple types of diseases, which remain hidden when each disease is learned individually. Second, we propose to detect significant edges in a target network by using mutual community knowledge from multiple auxiliary networks. Instead of directly using noisy and confusing edge information, we use mutual community knowledge learned across both the target network and the auxiliary networks to detect significant edges. The mined mutual community knowledge captures the key profile of network relationships and thus can be used to judge whether an existing edge indicates a true or false relationship. In this thesis, we provide several new perspectives on community learning from multiple data sources across many applications.
The algorithms developed in this thesis have proven useful in many areas, including social network analysis and disease learning. Moreover, the proposed methods have the potential to be applied to many other areas. As the amount of data continues to explode, there are great opportunities and challenges in inferring meaningful knowledge of communities from multiple sources of massive data collections.