Representation learning on multiple sources
In the era of big data, a high variety of information sources captures the characteristics of a system from different perspectives. Such multiple sources demand cost-effective, innovative learning strategies for enhanced insight and decision making. For learning algorithms that operate on multiple sources, much of the effort goes into preprocessing pipelines and transformations that extract data representations across the sources to support effective machine learning. This thesis focuses on learning representations of multiple sources in four settings: multiple independent sources, auxiliary sources, dependent sources, and heterogeneous sources.

The thesis comprises four studies. The first study addresses the integration of multiple independent sources. We show that by combining such sources, e.g., ranking lists from different experiments and signals from different channels, we achieve significantly better predictions. The second study concerns utilizing auxiliary sources. Specifically, by transferring knowledge from publicly available, well-annotated source data sets, we overcome the lack of supervised information on mobile users and achieve better performance on the task of anomaly detection. In the third study, we develop models for dependent sources, in which some sources depend on others. We apply these models to link prediction, as well as to learning, analyzing, and predicting object roles on dynamic networks. In the last study, we investigate representation learning on heterogeneous sources. We show that by integrating node attributes and network structures, we can extract latent representations that support efficient and effective link prediction and node classification.