Regularization for Variable Selection with Flexible Missingness Mechanism: Methodology, Algorithm and Application
Regularized likelihood has proven effective for variable selection, and the method has been well developed both theoretically and computationally over the past two decades. However, two major problems remain in practice. The first is caused by nonignorable missing data: because results are sensitive to the assumed missingness mechanism, selecting the variables of interest becomes difficult. The second is that the response variable is rarely normally distributed in practice. Clinical data are usually distributed asymmetrically and sometimes over a finite range, which prevents direct application of penalized likelihood. The research in this dissertation is driven by the need for statistical tools that make regularized likelihood applicable in practice by solving these two problems.

This dissertation is organized as follows. Chapter 1 introduces our research topic. We focus on a flexible and generally applicable missing data mechanism that covers both ignorable and nonignorable missingness assumptions. In Chapter 2, we show how to achieve variable selection by combining this missing data mechanism with a pseudo-likelihood function when missing data, especially nonignorable missing data, are present. We also develop the computational algorithm used to optimize the objective function. Theoretical properties for variable selection consistency are discussed in Chapter 3, and all technical proofs are included in Chapter 8. Tuning parameter selection is another important topic in variable selection, as it controls the balance between model complexity and prediction accuracy. In Chapter 4, we not only explore and extend existing tuning parameter selection methods, but also propose a new stability-based method for selecting the optimal tuning parameter. In Chapters 5 and 6, we evaluate our method through comprehensive simulation studies and real data analyses. We conclude the dissertation with a discussion in Chapter 7.