Heterogeneous medical data analytics for healthcare: Modeling and risk prediction
With the expansion of the healthcare industry and the growing volume of electronic health records (EHRs) shared by healthcare institutions and practitioners, we use EHR data to develop an effective disease risk management model that not only models the progression of a disease but also predicts its risk, enabling early disease control or prevention. An EHR is a longitudinal electronic record of patient health containing heterogeneous medical data that can be classified into image data and numerical data. We focus on osteoporosis, a common disease associated with aging that may be clinically silent but can cause significant mortality and morbidity after onset. Several fundamental questions have attracted researchers' interest: How can risk factors be used to learn an abstract representation for evaluating the development of bone diseases? How can the most influential risk factors that drive disease progression be selected? To that end, my doctoral thesis consists of three parts: 1) We develop and evaluate a novel three-dimensional (3D) computational bone framework that provides a spatio-temporal 3D bone microstructure model, quantitative measures derived from the 3D bone model, and an analysis of bone mineral density and bone strength for the model. 2) We propose a generative framework for prediction and informative risk factor selection for bone diseases. We extract integrated features by modeling the latent relationships among risk factors in order to predict disease risk. We also select salient features by discriminating between people with and without the disease. The selected features are valuable resources for disease prevention and can inspire medical research in healthcare. 3) We infer precise phenotypic patterns from the new feature representation of EHR data in order to group risk factors for diseases.
High-throughput phenotyping should generate many phenotypes with minimal human intervention so that they can be maintained over time. Unbiased, EHR-driven phenotype discovery could be achieved by using a massive EHR dataset and a computationally intensive analysis capable of identifying all of the phenotypes in the dataset.