Biostat 581. Statistical Genetics Seminar. Winter 2017.
Instructor: Liz Blue. em27@uw.edu
Topic: Batch effects and convenience controls.
What we hope to gain from this selection of papers:
· Recognize the consequences of systematic differences between cases and controls outside the trait of interest.
· Evaluate analytical approaches to avoiding or minimizing those differences.
1/3: Initial meeting. Welcome, introductions, organization.
1/10: (Amarise Little and Tyler Bonnett with advising from Tim Thornton) WTCCC (2007) "Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls." Nature 447(7145):661-78.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2719288/
One of the early large consortium papers; it uses a set of shared controls for multiple phenotypes and revealed variable batch effects across studies.
1/17: (Alice Popejoy with advising from Liz Blue) Yu et al. (2008) "Population substructure and control selection in genome-wide association studies" PLoS One 3(7): e2551.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432498/
Demonstrates how principal components can help control batch effects and population structure.
1/24: (Anya Mikhaylova and Andrew Nato with advising from Sharon Browning) Miclaus et al. (2010) "Variability in GWAS analysis: the impact of genotype calling algorithm inconsistencies" The Pharmacogenomics Journal 10:324-335.
https://www.ncbi.nlm.nih.gov/pubmed/20676070/
A special consortium of SNPchip analysts shows how the algorithms used to call genotypes influence association testing results.
1/31: (Bowen Wang and Cameron Haas with advising from Ellen Wijsman) Lee et al. (2010) "A simple and fast two-locus quality control test to detect false positives due to batch effects in genome-wide association studies." Genet Epidemiol 34(8):854-62.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3674525/
Designed for SNPchip analyses, compares association testing results from individual SNPs to those from pairs of SNPs.
2/7: (Aaron Baraff and Aaron Hudson with advising from Liz Blue) Derkach et al. (2014) "Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic." Bioinformatics 30(15):2179-88.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103600/
Designed for next-generation sequencing data, introduces a likelihood-based statistic that uses the expected genotype probability given the observed data.
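As a rough illustration of what an "expected genotype given the observed data" can look like, the sketch below combines per-genotype likelihoods with a prior to get a posterior mean genotype. It is not taken from the paper: the function name, the Hardy-Weinberg prior, and the input format are all assumptions made for this example.

```python
def expected_genotype(likelihoods, alt_freq):
    """Posterior mean count of alternate alleles at one site for one sample.

    likelihoods: (P(D | G=0), P(D | G=1), P(D | G=2)), e.g. from read data.
    alt_freq:    assumed alternate allele frequency (hypothetical input).
    Assumption: Hardy-Weinberg proportions are used as the genotype prior.
    """
    ref_freq = 1.0 - alt_freq
    prior = (ref_freq ** 2, 2.0 * alt_freq * ref_freq, alt_freq ** 2)

    # Bayes' rule: posterior is proportional to likelihood times prior.
    joint = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("likelihoods and prior give zero total probability")
    posterior = [j / total for j in joint]

    # Expected genotype = sum over g of g * P(G=g | D).
    return sum(g * p for g, p in enumerate(posterior))
```

With uninformative likelihoods such as (1, 1, 1), the result falls back to the prior mean of twice the assumed allele frequency, which is one way to see why low-coverage samples carry less information than a hard genotype call would suggest.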
2/14: (Qian Zhang with advising from Liz Blue) Yan et al. (2015) "Likelihood-based complex trait association testing for arbitrary depth sequencing data." Bioinformatics 31(18):2955-62.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668777/
Another likelihood-based statistic that avoids explicitly calling genotypes from next-generation sequencing data.
2/21: (Yalan Xing and Rafael Nafikov with advising from Liz Blue) Hu et al. (2016) "Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls". PLoS Genetics 12(5):e1006040.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4859496/
This likelihood-based statistic recognizes that different read depths in different data sets influence the probability of observing the alternate allele.
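To see why read depth matters, consider a deliberately simplified model that is an assumption of this example and not the paper's: a truly heterozygous site, no sequencing error, and each read drawn from either allele with probability 1/2. The chance of seeing at least one alternate-allele read is then 1 - (1/2)^depth. The function name is hypothetical.

```python
def prob_alt_read_seen(depth):
    """P(at least one alternate-allele read | heterozygous site, given depth).

    Assumptions (illustrative only): no sequencing error, and each read
    samples the reference or alternate allele with probability 1/2.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # All reads miss the alternate allele with probability (1/2)^depth.
    return 1.0 - 0.5 ** depth


# Compare a low-depth data set with a higher-depth one.
for depth in (1, 4, 30):
    print(depth, prob_alt_read_seen(depth))
```

Under these assumptions a heterozygote is missed half the time at depth 1 but almost never at depth 30, so cases and controls sequenced at different depths can show different apparent allele counts even when the true frequencies match.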
2/28: (Xiaowen Tian and Marsha Wheeler with advising from Liz Blue) Kan et al. (2016) "Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project" Eur J Hum Genet 24(8):1181-7.
https://www.ncbi.nlm.nih.gov/pubmed/26757982/
This GWAS combines old exome data with a complex indicator variable to minimize batch effects.
3/7: (Kelsey Grinde with advising from Liz Blue) Lee S, Kim S, and Fuchsberger C. (preprint) "Improving power for rare variant tests by integrating external controls".
http://biorxiv.org/content/early/2016/10/18/081711
This paper uses odds ratios derived from internal controls and those derived from internal+external controls to control type I error.