Weighted logistic regression model for genetic association studies in admixed populations
Abstract
The candidate gene association approach is used to investigate the contribution of polymorphisms in specific genes for which there is evidence of a possible role in disease susceptibility. Although the case-control design is very popular because of its efficiency and the ease of recruiting subjects, it is prone to spurious associations that are often not replicated in subsequent independent studies. Failure to replicate is often blamed on confounding due to population stratification, which arises when the study population contains genetically distinct subgroups that also differ in their baseline risk of the disease under study. In this study, a new analytical method for controlling population stratification was developed that uses each individual's admixture proportions to weight that individual's likelihood contribution in a logistic regression model. The method can be applied in admixed or structured populations. Extensive Monte Carlo simulations were carried out to assess both the Type I error rate and the power of the new method. The new method had a Type I error rate similar to that of the Genomic Control method and more power to detect association when the disease variant confers different risks in different subpopulations. The new method was also applied to population-based African American and European American samples to study hypertension, in which 45 selected loci across 5 candidate genes from the Renin-Angiotensin System were genotyped. Results from both the single nucleotide polymorphism (SNP)-based and haplotype-based analyses indicated that the new method is not susceptible to spurious associations. A major contribution of this study to genetic epidemiology and statistical genetics is the development and demonstration of a more powerful analytical approach for genetic association studies in admixed populations. The method accounts for population stratification and can adjust for other covariates within a single model.
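The abstract does not give the likelihood explicitly. One plausible way to formalize "using individual admixture to weight individual likelihood contributions" is sketched below, assuming $K$ ancestral populations, an admixture proportion $q_{ik}$ for individual $i$ from population $k$ (with $\sum_k q_{ik} = 1$), a binary disease status $y_i$, a covariate vector $x_i$ (genotype plus other covariates), and a separate coefficient vector $\beta_k$ for each ancestral population. This is an illustrative assumption, not necessarily the exact estimator used in the study.

```latex
\ell(\beta_1,\dots,\beta_K) = \sum_{i=1}^{n} \sum_{k=1}^{K} q_{ik}
  \left[ y_i \log \pi_{ik} + (1 - y_i) \log\left(1 - \pi_{ik}\right) \right],
\qquad
\pi_{ik} = \frac{1}{1 + \exp\left(-x_i^{\top} \beta_k\right)}
```

Because each $\beta_k$ appears only in its own inner term, maximizing $\ell$ under this assumption reduces to $K$ separate weighted logistic regressions, one per ancestral population, with weights $q_{ik}$.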
The advantages of the proposed method include: (1) a detailed and specific definition of ancestry based on information derived from genomic data rather than self-reported ethnic origin; (2) because ancestry is clearly defined, a more precise evaluation of the role of genetic variation in the aetiology of disease; (3) the use of ancestry information to estimate ancestral population-specific association effect sizes; (4) because ancestry is identified from ancestry informative markers, the ability to pool data from studies in different admixed populations; (5) the increased statistical power that comes with the larger sample size obtained by pooling data from different studies; (6) covariate adjustment specific to each ancestral population within a single model; and (7) the ease of interpreting its estimates within the context of association studies.
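The following is a minimal sketch of how ancestral population-specific effect sizes with covariate adjustment could be estimated in one model, assuming admixture proportions have already been estimated from ancestry informative markers and assuming each individual's log-likelihood contribution under population $k$'s logistic model is weighted by that individual's proportion $q_{ik}$. The function names `weighted_logistic_fit` and `admixture_weighted_fit` and the simulated data are hypothetical; this is an illustration of admixture-weighted logistic regression under those assumptions, not the study's own implementation.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Maximize sum_i w_i * loglik_i for logistic regression via Newton-Raphson (IRLS).

    X: (n, p) design matrix (include a column of ones for the intercept).
    y: (n,) binary outcomes; w: (n,) non-negative individual weights.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)  # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (w * (y - p))  # weighted score vector
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def admixture_weighted_fit(X, y, Q):
    """One coefficient vector per ancestral population (row k uses weights Q[:, k]).

    Assumes the joint objective sum_i sum_k Q[i, k] * loglik_ik with separate
    coefficients per population, which separates into K weighted fits.
    """
    return np.vstack([weighted_logistic_fit(X, y, Q[:, k]) for k in range(Q.shape[1])])


if __name__ == "__main__":
    # Hypothetical two-ancestry simulation: allele frequency and baseline risk both
    # depend on admixture, but the genotype itself has no effect on disease.
    rng = np.random.default_rng(0)
    n = 5000
    q1 = rng.uniform(0.0, 1.0, n)
    Q = np.column_stack([q1, 1.0 - q1])
    g = rng.binomial(2, 0.6 * q1 + 0.2 * (1.0 - q1))  # genotype coded 0/1/2
    age = rng.normal(0.0, 1.0, n)  # a standardized covariate
    X = np.column_stack([np.ones(n), g, age])
    logit = -1.0 + 1.5 * q1 + 0.3 * age
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    B = admixture_weighted_fit(X, y, Q)
    print(B)  # row k: intercept, genotype, age effects for ancestral population k
```

Separating the joint weighted likelihood into one weighted fit per ancestral population keeps each fit a standard logistic regression, so any additional covariate column in `X` is automatically adjusted for within each ancestry-specific model.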