Biostat 581. Statistical Genetics Seminar. Fall 2016.

Instructor: Sharon Browning. sguy@uw.edu

Topic: Genetic big data: What can you do with genome-wide SNP data on over 100,000 individuals? UK Biobank as a case study.

About the UK Biobank: The UK Biobank has over 500,000 participants. Participants resided in the UK and were aged 40-69 when recruited in 2006-2010. These individuals have been extensively phenotyped. Genome-wide SNP array genotypes are now available on over 150,000 individuals. Genotypes on the remaining individuals are expected to be available early next year.

About some other very big genetic data sets:

·         The Million Veteran Program has enrolled more than 345,000 veterans to date, and genotyping of the first 200,000 is complete.

·         The GERA (Genetic Epidemiology Research on Adult Health and Aging) study genotyped around 78,000 adults who are members of the Kaiser Permanente Medical Care Plan in the Northern California Region.

What we hope to gain from this selection of papers:

·         Learn about state-of-the-art analyses for large genome-wide SNP array data.

·         Learn about challenges and opportunities in analysis of extremely large genetic data sets.

·         Survey the variety of analyses that can be performed, including, but extending beyond, association tests.

Background paper with additional information about the UK Biobank:

Sudlow et al. 2015. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Medicine. http://dx.doi.org/10.1371/journal.pmed.1001779

 

10/4: Initial meeting. Welcome, introductions, organization.

 

10/11: (Kelsey and Tracy with advising from Andrew) Wain et al. 2015 Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank. Lancet Respiratory Medicine. http://dx.doi.org/10.1016/S2213-2600(15)00283-0

A nested case-control association study. The first UK Biobank genetic study.

 

10/18: (Bowen and Fiona with advising from Tim) Davies et al. 2016 Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112 151). Molecular Psychiatry. http://dx.doi.org/10.1038/mp.2016.45

A genome-wide association study, plus heritability analysis (heritability based on genetic data rather than on familial relationships) for cognitive function traits.

 

10/25: Jenn Kirk presenting to fulfil her statgen certificate requirement.

 

11/1: ASHG/IGES reports. Those who attended the American Society of Human Genetics annual meeting and/or the International Genetic Epidemiology Society annual meeting will report back on what they learned.

 

11/8: (Tyler and Qian with advising from Sharon) Hagenaars et al. 2016. Shared genetic aetiology between cognitive functions and physical and mental health in UK Biobank (N=112 151) and 24 GWAS consortia. Molecular Psychiatry. http://dx.doi.org/10.1038/mp.2015.225

Investigates pleiotropy, i.e. genetic variants that affect more than one trait.

 

11/15: (Madeleine, Anya and Nan with advising from Bruce) Kendall et al. 2016 Cognitive performance among carriers of pathogenic copy number variants: Analysis of 152,000 UK Biobank subjects. Biological Psychiatry. http://dx.doi.org/10.1016/j.biopsych.2016.08.014

Neurodevelopmental copy number variants and cognitive performance.

 

11/22: (Amarise and Xiaowen with advising from Liz) Tyrrell et al. 2016 Height, body mass index, and socioeconomic status: Mendelian randomisation study in UK Biobank. BMJ. http://dx.doi.org/10.1136/bmj.i582

Mendelian randomization. (“Mendelian randomization is a method of using measured variation in genes of known function to examine the causal effect of a modifiable exposure on disease in non-experimental studies” – Wikipedia.)
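
For readers new to the idea, a toy simulation may help. The sketch below is not the analysis in the paper; it is a minimal illustration, under assumed parameters, of the simplest Mendelian randomization estimator (the Wald ratio: the genotype-outcome slope divided by the genotype-exposure slope). The allele frequency, effect sizes, sample size, and the helper names slope and simulate are all made up for illustration. An unobserved confounder biases the ordinary regression of outcome on exposure, while the ratio estimate should land near the simulated causal effect.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, true_effect=0.5, seed=1):
    """Simulate genotype g, exposure x, and outcome y (all values hypothetical)."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Genotype: number of copies (0, 1, or 2) of an allele with frequency 0.3.
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        # Unobserved confounder that affects both exposure and outcome.
        u = rng.gauss(0, 1)
        xi = 1.0 * gi + u + rng.gauss(0, 1)
        yi = true_effect * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()

# Naive regression of outcome on exposure is biased upward by the confounder.
naive = slope(x, y)

# Wald ratio: genotype-outcome slope divided by genotype-exposure slope.
wald = slope(g, y) / slope(g, x)

print(f"naive estimate: {naive:.2f}")
print(f"Wald ratio estimate: {wald:.2f} (simulated causal effect: 0.5)")
```

The ratio is only interpretable as a causal effect under the usual instrumental-variable assumptions, which the simulation satisfies by construction: the genotype affects the exposure, is independent of the confounder, and affects the outcome only through the exposure.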

 

11/29: (Yalan and Aaron with advising from Ellen) Loh et al. 2016 Fast and accurate long-range phasing in a UK Biobank cohort. Nature Genetics. http://dx.doi.org/10.1038/ng.3571

Haplotype phasing is computationally challenging in such a large data set, yet the large sample size also yields more long segments of identity by descent, which can be leveraged for high-accuracy phasing.

 

12/6: (Cameron and Alice with advising from Liz) Galinsky et al. 2016 Population structure of UK Biobank and ancient Eurasians reveals adaptation at genes influencing blood pressure. AJHG. http://dx.doi.org/10.1016/j.ajhg.2016.09.014

Population structure and selection.