Frequentist and Bayesian Extensions to Data-Driven Likelihood Methods with Biostatistical Applications
This dissertation covers a broad range of statistical methodology developments, along with real-world applications of these methodologies in biostatistical settings. The methods developed here make novel contributions to several areas of statistical research, including Bayesian statistics, frequentist statistics, nonparametric statistics, and sequential statistics. Four research projects that represent a large part of my research in the Department of Biostatistics at the State University of New York at Buffalo are included in this dissertation. The four topics are: (1) data-driven confidence interval (CI) estimation with an adjustment for skewed data; (2) an extension of empirical likelihood for evaluating probability weighted moments; (3) empirical likelihood ratio tests with power one; and (4) a sequential density-based empirical likelihood ratio test for treatment effects. The likelihood principle is arguably the most important concept in statistical inference. When the underlying data distributions are completely specified, the well-known Neyman-Pearson lemma shows that likelihood ratio tests are most powerful. The relevant parametric statistical inference procedures have been well developed in the statistical literature. In practice, however, several difficulties prevent researchers from applying parametric statistical methods. We illustrate this point with three reasons: 1) oftentimes the underlying data distributions are unknown; e.g., in the sequential setting, it is difficult to specify a parametric distribution function before data collection; 2) if one specifies a parametric distribution function, it may be difficult to test the validity of the parametric assumption; e.g., in the sequential setting the sample size is a random variable, so one cannot directly apply goodness-of-fit tests to such datasets; 3) incorrectly specified parametric distribution functions may lead to biased estimation and inferential procedures.
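As a concrete illustration of the Neyman-Pearson setting just described (our own toy example, not one from the dissertation), consider testing between two fully specified normal models; all names and numeric choices below are ours:

```python
import math
import random

random.seed(1)
n = 50
# Illustrative data drawn under H1: N(1, 1).
x = [random.gauss(1.0, 1.0) for _ in range(n)]

# Simple hypotheses H0: X ~ N(0, 1) vs H1: X ~ N(1, 1). Both densities are
# fully specified, so the Neyman-Pearson lemma says the likelihood ratio
# test is most powerful at its significance level.
# Pointwise, log f1(xi) - log f0(xi) = -(xi - 1)^2/2 + xi^2/2 = xi - 1/2:
log_lr = sum(x) - n / 2.0

# The statistic is monotone in sum(x), so rejecting for a large log-LR is
# equivalent to rejecting for a large sample sum. Under H0, sum(x) ~ N(0, n),
# so a one-sided 5% test rejects when sum(x) > 1.645 * sqrt(n):
reject = log_lr > 1.645 * math.sqrt(n) - n / 2.0
```

The monotone structure is exactly what breaks down once the parametric form of the densities is unknown, which motivates the nonparametric likelihood methods developed here.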
Thus, to overcome such problems, the statistical methodologies developed in this dissertation revolve around the concept of the nonparametric likelihood. Our approach avoids specifying a fully parametric distribution while minimizing the assumptions placed on the underlying data. We hope that readers will gain an appreciation of the practical usefulness of nonparametric statistical methodologies and a perspective on their future development. Chapter 1 lays out the fundamental statistical principles and tools treated in this dissertation, with a focus on Bayesian statistics, likelihood-ratio-based hypothesis testing, empirical likelihood, density-based empirical likelihood, and sequential statistical analyses. Chapter 2 deals with nonparametric CI estimation in the Bayesian setting. Bayesian CI estimation is a statistical procedure that has been well addressed in both the theoretical and applied literature. Parametric assumptions regarding baseline data distributions are critical to the implementation of this method. We provide a nonparametric technique for incorporating prior information into the equal-tailed and highest posterior density CI estimators in a Bayesian manner. We propose a data-driven likelihood function, replacing the parametric likelihood with its nonparametric counterpart in order to create a distribution-free posterior. Higher-order asymptotic propositions are derived to show the efficiency and consistency of the proposed method. We demonstrate that the proposed approach provides accurate confidence regions once a skewness correction is applied. An extensive Monte Carlo (MC) study confirms that the proposed method significantly outperforms classical frequentist CI estimation. A real data example from a study of myocardial infarction illustrates the applicability of the proposed technique.
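To make the "data-driven likelihood" idea concrete, the following is a minimal sketch of the classical empirical likelihood ratio statistic for a population mean (Owen's frequentist formulation, which this work builds on, not the Bayesian extension proposed in Chapter 2); the function name and the bisection solver are ours:

```python
import math
import random

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for H0: E[X] = mu.
    The EL weights are w_i = 1 / (n * (1 + lam * (x_i - mu))), where lam
    solves sum((x_i - mu) / (1 + lam * (x_i - mu))) = 0; we find lam by
    bisection, since that sum is strictly decreasing in lam."""
    d = [xi - mu for xi in x]
    if min(d) >= 0.0 or max(d) <= 0.0:
        return float("inf")  # mu lies outside the convex hull of the data
    eps = 1e-10
    lo = -1.0 / max(d) + eps  # range keeping every 1 + lam * d_i > 0
    hi = -1.0 / min(d) - eps

    def g(lam):
        return sum(di / (1.0 + lam * di) for di in d)

    for _ in range(100):  # bisection: g(lo) > 0 > g(hi)
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

random.seed(2)
x = [random.expovariate(1.0) for _ in range(100)]  # skewed sample, mean 1

# A Wilks-type theorem calibrates the statistic against chi-square(1), giving
# a distribution-free 95% CI: {mu : el_log_ratio(x, mu) <= 3.841}.
stat_at_true_mean = el_log_ratio(x, 1.0)
```

The statistic is zero at the sample mean and grows as mu moves away from it, so inverting the chi-square threshold yields an asymmetric, data-driven interval; handling the skewness visible in samples like this one is precisely what the correction in Chapter 2 addresses.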
Chapter 3 treats nonparametric statistical inference for probability weighted moments (PWMs). PWMs generalize the concept of conventional moments of a probability distribution and are commonly applied to model extremes of natural phenomena. We propose and examine empirical likelihood (EL) inference methods for PWMs. This approach extends the classical EL technique for evaluating ordinary moments, including the population mean. We provide an asymptotic proposition, extending a well-known nonparametric version of Wilks' theorem used to evaluate the Type I error rates of EL ratio tests. This result is applied to develop a powerful nonparametric EL ratio test and the corresponding distribution-free CI estimation for PWMs. We show that the proposed method can easily be applied to inference on the Gini index, a widely used measure of distributional inequality. An extensive MC study shows that the proposed technique provides accurate Type I error rate control as well as very accurate CI estimation. Our approach outperforms CI estimation based on the classical schemes for analyzing PWMs. These results are clearly observed when the underlying data are skewed and/or consist of a relatively small number of data points. A real data example concerning myocardial infarction is used to illustrate the applicability of the proposed method.

In Chapter 4 we develop EL ratio tests with power one in the sequential testing setting. In the 1970s, Professor Robbins and his coauthors extended the Ville and Wald inequality in order to derive the fundamental theoretical results regarding likelihood-ratio-based sequential tests with power one. The law of the iterated logarithm confirms an optimality property of power one tests. In parallel with Robbins's decision-making procedures, we propose and examine sequential empirical likelihood ratio (ELR) tests with power one. In this setting, we develop nonparametric one- and two-sided ELR tests.
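The logic of a power-one sequential test can be sketched in the simplest parametric case: a Gaussian-mixture likelihood ratio is a nonnegative martingale under H0, so the Ville/Wald inequality bounds the Type I error uniformly over all stopping times, while under any alternative the statistic drifts to infinity. This is a textbook illustration in our own notation, not the dissertation's nonparametric ELR procedure:

```python
import math
import random

def mixture_lr(s_n, n, tau2=1.0):
    """Gaussian-mixture likelihood ratio for H0: theta = 0 vs theta != 0,
    where X_i ~ N(theta, 1), S_n = X_1 + ... + X_n, and theta is mixed over
    a N(0, tau2) prior. L_n is a nonnegative martingale with E0[L_n] = 1,
    so the Ville/Wald inequality gives P0(sup_n L_n >= 1/alpha) <= alpha."""
    v = 1.0 + n * tau2
    return math.exp(tau2 * s_n * s_n / (2.0 * v)) / math.sqrt(v)

def power_one_test(xs, alpha=0.05):
    """Open-ended sequential test: stop the first time L_n >= 1/alpha.
    Under any theta != 0, S_n grows linearly, L_n -> infinity, and the test
    eventually stops with probability one -- a test with power one."""
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        if mixture_lr(s, n) >= 1.0 / alpha:
            return n  # rejected H0 at time n
    return None  # never crossed within the observed horizon

random.seed(4)
stream = (random.gauss(0.5, 1.0) for _ in range(5000))  # theta = 0.5, an H1 stream
stop_time = power_one_test(stream)
```

Under H0 the same procedure rejects with probability at most alpha no matter how long sampling continues, which is the property the sequential ELR tests of Chapter 4 reproduce without the parametric model.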
It turns out that the proposed sequential ELR tests significantly outperform the classical nonparametric t-statistic-based counterparts in many scenarios based on different underlying data distributions.

Chapter 5 concerns the detection of treatment effects in the sequential setting. In health-related experiments, treatment effects can be identified using paired data that consist of pre- and post-treatment measurements. In this framework, sequential testing strategies are widely accepted statistical tools in practice. Since the performance of parametric sequential testing procedures depends critically on the validity of parametric assumptions regarding the underlying data distributions, we focus on distribution-free mechanisms for sequentially evaluating treatment effects. In retrospective studies, density-based empirical likelihood (DBEL) methods provide powerful nonparametric approximations to optimal Neyman-Pearson type statistics. We extend the DBEL methodology to develop a novel sequential DBEL testing procedure for detecting treatment effects based on paired data. The asymptotic consistency of the proposed test is shown. An extensive MC study confirms that the proposed test outperforms the conventional sequential Wilcoxon signed-rank test across a variety of alternatives. The applicability of the proposed method is exemplified in the context of the Ventilator-Associated Pneumonia Study, which evaluates the effect of Chlorhexidine Gluconate treatment in reducing oral colonization by pathogens in ventilated patients.
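For context on the conventional comparator mentioned above, here is a minimal sketch of a naively sequential Wilcoxon signed-rank test for paired data. It is our own illustration, not the proposed DBEL procedure: the fixed boundary is not calibrated the way a real group-sequential design would be, and ties among the differences are assumed absent (continuous data).

```python
import random

def signed_rank_stat(diffs):
    """Wilcoxon signed-rank statistic W+: the sum of the ranks of |d_i|
    over the positive differences (assumes no ties or zero differences)."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return float(sum(rank for rank, i in enumerate(order, start=1)
                     if diffs[i] > 0))

def sequential_wilcoxon(pairs, z_boundary=3.0, n_min=10):
    """After each new (pre, post) pair, recompute the standardized W+ and
    stop when it crosses a fixed two-sided boundary. A real sequential
    design would calibrate the boundary to control the overall Type I error."""
    diffs, z = [], 0.0
    for t, (pre, post) in enumerate(pairs, start=1):
        diffs.append(post - pre)
        if t < n_min:
            continue
        w = signed_rank_stat(diffs)
        mean = t * (t + 1) / 4.0                      # E[W+] under H0
        sd = (t * (t + 1) * (2 * t + 1) / 24.0) ** 0.5
        z = (w - mean) / sd
        if abs(z) >= z_boundary:
            return t, z, True                         # stopped: effect detected
    return len(diffs), z, False                       # horizon reached, no crossing

random.seed(3)
# Simulated paired data with a positive treatment effect of 0.8 (illustrative).
pairs = [(random.gauss(0.0, 1.0), random.gauss(0.8, 1.0)) for _ in range(200)]
n_used, z_final, stopped = sequential_wilcoxon(pairs)
```

Because the signed-rank statistic uses only the signs and ranks of the differences, it discards distributional shape information that the DBEL statistic exploits, which is one intuition for the power gains reported in the MC study.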