Biostat 581. Statistical Genetics Seminar. Winter 2017.

Instructor: Liz Blue. em27@uw.edu

Topic: Batch effects and convenience controls.

What we hope to gain from this selection of papers:

· Recognize the consequences of systematic differences between cases and controls outside the trait of interest.

· Evaluate analytical approaches to avoiding or minimizing those differences.

 

1/3: Initial meeting. Welcome, introductions, organization.

 

1/10: (Amarise Little and Tyler Bonnett with advising from Tim Thornton) WTCCC (2007) "Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls." Nature 447(7145):661-78. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2719288/

One of the early large consortium papers, which uses a set of shared controls for multiple phenotypes. It revealed variable batch effects across studies.

 

1/17: (Alice Popejoy with advising from Liz Blue) Yu et al. (2008) "Population substructure and control selection in genome-wide association studies" PLoS One 3(7): e2551. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432498/

Demonstrates how principal components can help control batch effects and population structure.

 

1/24: (Anya Mikhaylova and Andrew Nato with advising from Sharon Browning) Miclaus et al. (2010) "Variability in GWAS analysis: the impact of genotype calling algorithm inconsistencies" The Pharmacogenomics Journal 10:324-335. https://www.ncbi.nlm.nih.gov/pubmed/20676070/

A special consortium of SNPchip analysts shows how the algorithms used to call genotypes influence association testing results.

 

1/31: (Bowen Wang and Cameron Haas with advising from Ellen Wijsman) Lee et al. (2010) "A simple and fast two-locus quality control test to detect false positives due to batch effects in genome-wide association studies." Genet Epidemiol 34(8):854-62. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3674525/

Designed for SNPchip analyses, compares association testing results from individual SNPs to those from pairs of SNPs.

 

2/7: (Aaron Baraff and Aaron Hudson with advising from Liz Blue) Derkach et al. (2014) "Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic." Bioinformatics 30(15):2179-88. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103600/

Designed for next-generation sequencing data, introduces a likelihood-based statistic that uses the expected genotype probability given the observed data.

 

2/14: (Qian Zhang with advising from Liz Blue) Yan et al. (2015) "Likelihood-based complex trait association testing for arbitrary depth sequencing data." Bioinformatics 31(18):2955-62. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668777/

Another likelihood-based statistic that avoids explicitly calling genotypes from next-generation sequencing data.

 

2/21: (Yalan Xing and Rafael Nafikov with advising from Liz Blue) Hu et al. (2016) "Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls". PLoS Genetics 12(5):e1006040. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4859496/

This likelihood-based statistic recognizes that different read depths in different data sets influence the probability of observing the alternate allele.

 

2/28: (Xiaowen Tian and Marsha Wheeler with advising from Liz Blue) Kan et al. (2016) "Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project" Eur J Hum Genet 24(8):1181-7. https://www.ncbi.nlm.nih.gov/pubmed/26757982/

This GWAS combines old exome data with a complex indicator variable to minimize batch effects.

 

3/7: (Kelsey Grinde with advising from Liz Blue) Lee S, Kim S, and Fuchsberger C. (preprint) "Improving power for rare variant tests by integrating external controls". http://biorxiv.org/content/early/2016/10/18/081711

This paper uses odds ratios derived from internal controls and those derived from internal+external controls to control type I error.