Statistical Inference on Alternative Diagnostic Accuracy Measures: Tackling Challenges in Multiple-Categorical Genomic Data
A diagnostic test is a medical test performed to detect, classify, or predict a disease. The test could be a screening test, a drug test, or an examination of genomic biomarkers. Physicians' treatment decisions are based largely on the results of various diagnostic tests, so the accuracy of a diagnostic test is important in patient care. When a new diagnostic test is introduced, it is essential to evaluate its accuracy and quality and to compare it with existing tests or reference standards. In the last decade, an explosive amount of research on disease diagnostic tests, especially cancer biomarkers, has been published. However, only a few of these tests can be used in clinical examination because of limits on cost and time. It is therefore important to select the most reliable and accurate test, based on statistical diagnostic accuracy measures, to help physicians make treatment decisions. The purpose of a diagnostic accuracy measure is to estimate and compare the accuracy of diagnostic tests, such as biomarkers, so that they provide reliable information about a patient's disease status and thereby enable better decisions on the use of a therapy. In general, a diagnostic accuracy measure quantifies the accuracy of a classifier, and it should be able to answer the following questions: Is the diagnostic test useful in clinical practice, or as a replacement for an existing clinical test? Does the diagnostic test improve the classification accuracy of patients? How large is the improvement? At the same time, the measure should be conceptually simple and concise enough for physicians to interpret. Many diagnostic accuracy measures are currently in wide use, including sensitivity and specificity, likelihood ratios, the diagnostic odds ratio, positive predictive value, negative predictive value, and the receiver operating characteristic (ROC) curve.
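As a concrete illustration of the measures listed above, the following minimal sketch computes them from a 2x2 table of test result against true disease status. The function name `accuracy_measures` and the counts are hypothetical and chosen only for illustration; the sketch assumes every cell of the table is nonzero, so no division by zero occurs.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Common diagnostic accuracy measures from a 2x2 table of counts.

    tp: diseased, test positive      fn: diseased, test negative
    fp: non-diseased, test positive  tn: non-diseased, test negative
    Assumes all four counts are nonzero (illustrative sketch only).
    """
    sensitivity = tp / (tp + fn)   # P(test positive | diseased)
    specificity = tn / (tn + fp)   # P(test negative | non-diseased)
    ppv = tp / (tp + fp)           # P(diseased | test positive)
    npv = tn / (tn + fn)           # P(non-diseased | test negative)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = (tp * tn) / (fp * fn)    # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts: 100 diseased and 100 non-diseased patients.
m = accuracy_measures(tp=80, fp=10, fn=20, tn=90)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity, specificity, and the likelihood ratios are computed within the diseased and non-diseased groups separately, whereas PPV and NPV depend on the proportion of diseased patients in the sample.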
The most popular of these is the ROC curve, which is especially favored for binary cases. It has a number of advantages, including a simple and straightforward graphical view, independence from disease prevalence, and invariance to monotonic transformations. Moreover, its summary measure, the area under the curve (AUC), can easily be used to quantify the performance of a classifier. The most common methodology for assessing diagnostic tests assumes a dichotomous true disease status: diseased and non-diseased. In practice, many diseases, especially cancer, can be classified into multiple classes. Because of the complexity of cancer cells, the traditional "same diagnosis, same prescription" strategy is not optimal: a treatment that benefits one group of patients may have the opposite effect on another group. These two groups of patients might have the same results on conventional diagnostic tests, which are based on the morphological appearance of tissue samples. For example, cancer patients with similar histopathological appearance (phenotype) can have very different drug responses. Some of these differences can be explained by the genetic traits of the patients or tumors. In the era of personalized medicine, cancer patients are classified into groups based not only on tumor morphology but also on genomic biomarkers, and the improved classification leads to treatment strategies optimized for specific groups to achieve better efficacy. For example, breast cancer can be divided into four major molecular subtypes, luminal A, luminal B, triple negative/basal-like, and HER2 type, based on the combination of four biomarkers (Table 1). Luminal A tumors have the best prognosis with hormone therapy compared to the other subtypes. Triple negative/basal-like tumors are often aggressive and do not respond to hormone therapies as well as luminal A/B tumors.
The use of high-throughput platforms such as microarrays and next-generation sequencing enables researchers to examine a patient's genomic profile accurately and quickly. As more genomic data accumulate, one type of tumor can be classified into multiple subtypes based on genomic biomarkers, such as gene mutations and altered gene expression. These subtypes could not be defined by traditional methods. Nowadays, most major cancer phenotypes have been divided into more than two subtypes. This more advanced classification introduces challenges in choosing diagnostic accuracy measures. Diagnostic accuracy measures for multiple classes are more difficult to construct than those for binary classes because of the increased number of separation boundaries or relationships. With binary classes, a decision boundary needs to be made for only one of the classes; the other class is the complement. With multiple classes, the decision boundaries are more dynamic and depend on the study design. In clinical trials of personalized medicine, multiple groups of patients are often regrouped into a response (R) group versus a non-response (NR) group (abbreviated RVNR) based on the treatment. That is, multiple classes are often considered in a binary classification fashion for the prediction of cancer treatment outcome. RVNR could take one of the following forms: 1) one specific response class versus all the other non-response classes; 2) one non-response class versus all response classes; 3) several response classes versus the other non-response classes. In scenarios 1) and 2), the aim is to identify biomarkers that differentiate one class from the other classes without requiring a specific ordering among the others. In scenario 3), the biomarker could follow a mixture distribution with latent classes in either the R or the NR group. No standardized guidelines exist on which statistical diagnostic accuracy measures are appropriate for the aforementioned multi-class problem.
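The three RVNR scenarios can be illustrated with a small sketch that regroups multi-class labels into R versus NR and then computes the ordinary empirical AUC on the resulting binary problem. This is only the naive binary reduction described above, not the methodology proposed in this thesis; the class labels A to D, the marker values, and the function names are hypothetical, and larger marker values are assumed to indicate response.

```python
def empirical_auc(pos, neg):
    """Empirical AUC: P(X_pos > X_neg) + 0.5 * P(X_pos == X_neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rvnr_auc(marker, classes, response_classes):
    """Regroup multi-class labels into R vs NR, then compute the binary AUC."""
    response = [x for x, c in zip(marker, classes) if c in response_classes]
    non_response = [x for x, c in zip(marker, classes) if c not in response_classes]
    return empirical_auc(response, non_response)


# Hypothetical biomarker values for eight patients in four classes.
marker = [5.1, 2.5, 3.9, 3.0, 2.7, 2.2, 1.5, 1.1]
classes = ["A", "A", "B", "B", "C", "C", "D", "D"]

# Scenario 1: one response class (A) versus all other classes.
auc_s1 = rvnr_auc(marker, classes, {"A"})
# Scenario 2: one non-response class (D) versus all response classes.
auc_s2 = rvnr_auc(marker, classes, {"A", "B", "C"})
# Scenario 3: several response classes (A, B) versus the remaining classes.
auc_s3 = rvnr_auc(marker, classes, {"A", "B"})

print(auc_s1, auc_s2, auc_s3)
```

The same biomarker receives a different AUC under each regrouping, which shows how strongly the apparent accuracy depends on how the classes are combined.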
In this thesis, we present two diagnostic accuracy measures that can tackle the challenges introduced by multiple classes. The main idea of both is to introduce more sophisticated ways of regrouping the multiple classes into combined binary classes. The TROC methodology is proposed to reduce the multi-class problem to a two-class problem under tree ordering (defined in section 1.1.2). The overlapping coefficient (OVL) is extended to data in which the disease contains a mixture of unidentified subtypes. Inference approaches are proposed for both TROC and OVL and compared in simulation studies. The proposed methods are then used to evaluate the diagnostic accuracy of selected biomarkers with published genomic data. Compared to existing diagnostic accuracy measures, our proposed measures are less complex and more straightforward to visualize and interpret. They provide additional information on the underlying relationship among the K classes, and can be used to improve the effectiveness of the diagnostic accuracy measures required in genomic studies.
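For orientation, the overlapping coefficient of two densities f and g is commonly defined as the integral of min(f(x), g(x)) over x, so that 1 means identical distributions and 0 means complete separation. The sketch below evaluates this definition numerically for a normal non-diseased group against a diseased group modeled as a two-component normal mixture. It is a minimal sketch of the standard definition only, not the estimator or inference procedure developed in this thesis; the normality assumption, the mixture weights and means, and the function names are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a normal mixture; components is a list of
    (weight, mean, sd) tuples whose weights are assumed to sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def ovl(pdf_a, pdf_b, lower, upper, n_steps=20000):
    """Trapezoidal approximation of the integral of min(pdf_a, pdf_b)
    over [lower, upper]; the interval should cover both densities."""
    h = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * h
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * min(pdf_a(x), pdf_b(x))
    return total * h


def non_diseased(x):
    # Hypothetical non-diseased group: standard normal.
    return normal_pdf(x, 0.0, 1.0)


def diseased(x):
    # Hypothetical diseased group: two latent subtypes shifted in
    # opposite directions from the non-diseased mean.
    return mixture_pdf(x, [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)])


ovl_mixture = ovl(non_diseased, diseased, -12.0, 12.0)
print(f"OVL (normal vs. mixture): {ovl_mixture:.3f}")
```

In this hypothetical configuration the diseased mixture is symmetric about the non-diseased mean, so the population AUC is 0.5 even though the two distributions clearly differ; the OVL is well below 1 and reflects that difference.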