Statistical Genetics Seminar, Winter 2020

Instructor: Ellen Wijsman (wijsman@uw.edu). Co-instructors: Liz Blue, Sharon Browning, and Tim Thornton.

This quarter’s topic

The topic for this quarter is haplotypes derived from genotype data. We do not expect to have research talks from students or postdocs this quarter.

Why are haplotypes important, and how do we obtain them? There are many reasons why one might wish to define, determine, or infer haplotypes (and their frequencies). A haplotype consists of the particular allelic states at a series of defined loci on a single chromosome. Haplotypes provide the material needed to determine IBD between individuals (the Fall 2019 Statgen topic). They provide "context" for disease mutations or variants, each of which arises on a particular chromosome. That chromosome is then shuffled by recombination over the generations, but the alleles at nearby markers remain correlated with the mutation for a long time. Haplotypes defined by multiple loci create "super-alleles" that increase the information available for linkage or association analysis. Finally, the diversity and particular patterns of haplotypes provide exquisite detail about ancestry, from which inferences about past population migrations and origins can be drawn. While haplotypes can be determined over short genomic distances by laboratory methods, haplotyping largely involves inference, either from genotype data combined with family structure information, or from samples of (putatively) unrelated subjects, or both.

Why haplotype inference and use now? Both marker density and the number of subjects in genetic data sets are increasing. Where once it was stunning to define half a dozen polymorphic markers in a region of interest, now we have whole-genome sequence data. Where once a sample of 100 subjects was large, now we have samples of tens or hundreds of thousands of individuals. Where once we were interested in Mendelian disorders, now we are neck-deep in complex disorders. Where once we had data only on living (or recently deceased) individuals, now we also have data on subjects from archaeological samples. Approaches that would not have worked with the data available 20-30 years ago are now feasible, we can ask new questions, and a much greater depth of resolution is possible today.

An overview of this quarter’s papers: The papers for January 14 & 28 illustrate the importance of haplotype context for answering questions about disease mutations. The papers for Jan. 14 are old and address one of the first recessive human disorders with a known location (but not a known sequence at the time!). They use the kind of DNA markers that were first available, while the more recent papers for Jan. 28 postdate the release of the human genome sequence and use multiallelic microsatellite markers along with current standard SNP markers. The Jan. 14 & 21 papers illustrate the original approach to determining haplotypes: from pedigree data. Then (Feb. 4) we switch to obtaining haplotypes (or haplotype frequencies) from population samples, using one of the simplest methods, with a second paper that makes inferences about populations and their histories by comparing haplotype frequencies. On Feb. 11 we consider the use of haplotypes for association testing, and on Feb. 18 we look at haplotypes in the context of geographic structure and disease mutations. Then we will look at a much more sophisticated approach to population-based haplotype modeling and inference of population structure (Feb. 25 and Mar. 3). Finally, we will finish with one or more examples of the use of this model to make inferences about population history.

Expectations

· You will read the paper prior to the seminar each week, show up to the seminar (I’ll be taking attendance) and contribute to the discussion with questions and comments.

· You (usually with someone else) will present the paper one week during the quarter.

Schedule

Jan 7: Organizational meeting

Jan 14: Presenter: Jacob.
Lidsky et al. 1985. Extensive restriction site polymorphism at the human phenylalanine-hydroxylase locus and application in prenatal-diagnosis of phenylketonuria. American Journal of Human Genetics, 37(4):619-634.
Daiger et al. 1986. Polymorphic DNA haplotypes at the phenylalanine-hydroxylase locus in prenatal-diagnosis of phenylketonuria. Lancet, 1(8475):229-232.
Primarily for reference/context: DiLella et al. 1986. Molecular-structure and polymorphic map of the human phenylalanine-hydroxylase gene. Biochemistry, 25(4):743-749.
Comments: These are old papers. Do not spend time on the molecular methods. RFLPs can be thought of as an older technology for genotyping SNPs than the one we use today: the two possible results (fragment sizes, given as single numbers or sets of numbers) for an RFLP map onto the two alleles of a SNP. Focus on the big picture. The DiLella et al. paper is provided mostly for context and for its figures of the gene region, but its introduction and discussion contain some useful comments.

Jan 21: Presenter: Cameron
Roach et al. 2011. Chromosomal haplotypes by genetic phasing of human families. American Journal of Human Genetics, 89(3):382-397. (Main methods paper.)

Roach et al. 2010. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science, 328(5978):636-639. (Application.)
Comments: These newer papers grapple with the extremely dense data that we now have, first to identify haplotypes and then to identify "causal" variants within those haplotypes. (The details used to propose a causal variant, beyond identifying the haplotypes that must carry it, are not important here.)
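To make the idea of phasing from family data concrete, here is a minimal sketch that phases a child's genotypes at biallelic SNPs from both parents' genotypes using Mendelian transmission rules alone. It is not the method of Roach et al.; it assumes genotypes coded as the count of allele 1 (0, 1, or 2), no genotyping errors, and no de novo mutations, and the function names are hypothetical. Sites where mother, father, and child are all heterozygous cannot be phased this way, which is one reason larger families or population data are needed.

```python
from typing import List, Optional, Set, Tuple


def transmissible_alleles(genotype: int) -> Set[int]:
    """Alleles a parent can pass on, given its count (0, 1, 2) of allele 1."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def phase_child_site(mother: int, father: int, child: int) -> Optional[Tuple[int, int]]:
    """Return (maternal allele, paternal allele) at one site, or None when the
    trio is uninformative (all heterozygous) or Mendelian-inconsistent."""
    options = [
        (m, f)
        for m in transmissible_alleles(mother)
        for f in transmissible_alleles(father)
        if m + f == child
    ]
    # Phase is determined only if exactly one transmission pattern fits.
    return options[0] if len(options) == 1 else None


def phase_child(mother: List[int], father: List[int], child: List[int]):
    """Phase a child's genotypes site by site; unresolved sites are None."""
    maternal: List[Optional[int]] = []
    paternal: List[Optional[int]] = []
    for m, f, c in zip(mother, father, child):
        phased = phase_child_site(m, f, c)
        maternal.append(phased[0] if phased else None)
        paternal.append(phased[1] if phased else None)
    return maternal, paternal
```

For example, `phase_child([1, 0, 1, 2], [0, 1, 1, 1], [1, 1, 1, 2])` gives a maternal haplotype `[1, 0, None, 1]` and a paternal haplotype `[0, 1, None, 1]`; the third site stays unresolved because all three individuals are heterozygous there.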

Jan 28: Presenter: Nandana
Zhang et al. 2009. Evidence for an ancient BRCA1 mutation in breast cancer patients of Yoruban ancestry. Familial Cancer, 8(1):15-22.
Frishberg et al. 2005. Identification of a recurrent mutation in GALNT3 demonstrates that hyperostosis-hyperphosphatemia syndrome and familial tumoral calcinosis are allelic disorders. Journal of Molecular Medicine, 83(1):33-38.
Comments: We return to somewhat older papers that use haplotypes to learn about the histories of the underlying mutations and the relationship between mutations and phenotypes. There are many such examples; these two are short and simple.

Feb 4: Presenter: Charlie.
Excoffier and Slatkin 1995. Maximum-likelihood-estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution, 12(5):921-927.
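Comment: Here we add a simple approach to estimating haplotype frequencies. The EM approach discussed by Excoffier & Slatkin is an early method that can be used with unphased genotype data from unrelated subjects, as long as the haplotypes are not too long: the number of haplotype pairs compatible with an individual's genotype doubles with each additional heterozygous locus.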
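A minimal sketch of the EM idea for biallelic SNPs, assuming Hardy-Weinberg equilibrium and genotypes coded as the count of allele 1 (0, 1, or 2) at each site. The function names, the uniform starting frequencies, and the stopping rule are illustrative choices, not taken from Excoffier and Slatkin's implementation. The E-step splits each individual among its compatible haplotype pairs in proportion to their probability under the current frequencies; the M-step re-estimates frequencies as expected haplotype counts over the 2n chromosomes.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]


def compatible_pairs(genotype: List[int]) -> List[Tuple[Haplotype, Haplotype]]:
    """All unordered haplotype pairs consistent with an unphased genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix allele 0 on haplotype 1 at the first heterozygous site so each
    # unordered pair is listed once: 2 ** (len(het) - 1) pairs in total.
    for rest in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, (0,) + rest):
            h1[site], h2[site] = allele, 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(
    genotypes: List[List[int]], n_iter: int = 200, tol: float = 1e-10
) -> Dict[Haplotype, float]:
    configs = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in configs for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n = len(genotypes)
    for _ in range(n_iter):
        counts: Dict[Haplotype, float] = defaultdict(float)
        for pairs in configs:
            # E-step: HWE probability of each phase configuration.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new_freq = {h: counts[h] / (2 * n) for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq
```

For the genotypes `[[2, 2], [0, 0], [1, 1]]`, the double heterozygote is resolved in favor of the haplotypes 00 and 11 seen in the homozygotes, and the estimates converge toward 0.5 for each of those two haplotypes. Because `compatible_pairs` enumerates every phase configuration explicitly, this sketch shows directly why the approach becomes impractical for long haplotypes.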

Feb. 11: Presenter: Ning
Roses et al. 2014. African-American TOMM40'523-APOE haplotypes are admixture of West African and Caucasian alleles. Alzheimer's & Dementia, 10(6):592-601.
Comment: Haplotype frequencies are used here to infer the origins of haplotypes in an admixed population.

Feb 18: Presenter: Ruoyi
Eiken et al. 1996. Relative frequency, heterogeneity and geographic clustering of PKU mutations in Norway. European Journal of Human Genetics, 4(4):205-213.
Sarantaus et al. 2000. Multiple founder effects and geographical clustering of BRCA1 and BRCA2 families in Finland. European Journal of Human Genetics, 8(10):757-763.
Blay et al. 2013. Mutational analysis of BRCA1 and BRCA2 in hereditary breast and ovarian cancer families from Asturias (Northern Spain). BMC Cancer, 13.
Comment: Early on, the separate PKU mutations were not readily associated with any particular markers. This changed with haplotype construction, which revealed the geographic structure associated with some of the mutations. After PKU (and a few other) examples, haplotyping became common, in part to determine whether an individual might carry an already-known mutation, because sequencing was more expensive than genotyping a few markers.

Feb 25: Presenter: Michael
Falush et al. 2003. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164(4):1567-1587.
Comment: This paper extends an earlier paper by allowing for linked loci, so that correlations among nearby markers along the chromosome (because of markers in LD that define the haplotypes) are taken into account, and by allowing allele frequencies to be correlated among populations. It provides the basis for the next paper (Lawson et al.), estimating ancestral origins along the chromosome from the haplotypes. We are particularly interested in the admixture inference aspect.

Mar 3: Presenter: Seth
Lawson et al. 2012. Inference of population structure using dense haplotype data. PLOS Genetics, 8(1).
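Comment: Here we are introduced to ChromoPainter, which "paints" each individual's haplotypes as a mosaic of segments copied from other individuals' haplotypes, capturing ancestral haplotype relationships that can be used for admixture estimation.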
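ChromoPainter builds on the Li and Stephens copying model, a hidden Markov model in which a recipient haplotype is treated as a mosaic of segments copied from a set of donor haplotypes. The sketch below computes, with a scaled forward-backward pass, the posterior probability that each site is copied from each donor. It is a simplified illustration: the function name is hypothetical, and it assumes a constant per-site switch probability and mismatch probability rather than the genetic-map-based switch rates and estimated parameters used in practice.

```python
import numpy as np


def paint_haplotype(recipient, donors, switch_prob=0.05, mismatch_prob=0.01):
    """Posterior copying probabilities under a simplified Li-Stephens HMM.

    recipient: length-L sequence of 0/1 alleles.
    donors: N x L array of 0/1 donor haplotypes.
    Returns an L x N array; row s gives P(copying donor j at site s | data).
    switch_prob and mismatch_prob are hypothetical constants.
    """
    donors = np.asarray(donors)
    recipient = np.asarray(recipient)
    n_donors, n_sites = donors.shape

    # Emission: recipient allele matches the copied donor allele w.p. 1 - mismatch_prob.
    emit = np.where(donors.T == recipient[:, None], 1.0 - mismatch_prob, mismatch_prob)

    # Forward pass, normalized at every site to avoid numerical underflow.
    fwd = np.zeros((n_sites, n_donors))
    fwd[0] = emit[0] / n_donors
    fwd[0] /= fwd[0].sum()
    for s in range(1, n_sites):
        # Keep the same donor w.p. 1 - switch_prob, else jump to a uniformly
        # chosen donor; this form relies on fwd[s - 1] summing to 1.
        prior = (1.0 - switch_prob) * fwd[s - 1] + switch_prob / n_donors
        fwd[s] = emit[s] * prior
        fwd[s] /= fwd[s].sum()

    # Backward pass, also normalized per site (constants cancel below).
    bwd = np.ones((n_sites, n_donors))
    for s in range(n_sites - 2, -1, -1):
        b = emit[s + 1] * bwd[s + 1]
        bwd[s] = (1.0 - switch_prob) * b + switch_prob * b.sum() / n_donors
        bwd[s] /= bwd[s].sum()

    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)
```

With donors `[0, 0, 0, 0, 0, 0]` and `[1, 1, 1, 1, 1, 1]` and recipient `[0, 0, 0, 1, 1, 1]`, the posterior favors the first donor at the first sites and the second donor at the last sites, i.e., a painting with a single switch.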

Mar 10: Presenter: Alan
Chacon-Duque et al. 2018. Latin Americans show wide-spread Converso ancestry and imprint of local Native ancestry on physical appearance. Nature Communications, 9(1):5388.

Extra reading

Here are a few representative papers for extra reading:

Lin 2004. Haplotype-based association analysis in cohort studies of unrelated individuals. Genetic Epidemiology, 26(4):255-26

Niu et al. 2004. Algorithms for inferring haplotypes. Genetic Epidemiology, 27:334-347.

Other papers that take similar approaches to inferring ancestral origins from genomic haplotypes, but that will not be discussed:
Hellenthal et al. 2014. A genetic atlas of human admixture history. Science, 343(6172):747-751.
Leslie et al. 2015. The fine-scale genetic structure of the British population. Nature, 519(7543):309.
Busby et al. 2015. The role of recent admixture in forming the contemporary west Eurasian genomic landscape. Current Biology, 25(19):2518-2526.
Hofmanova et al. 2016. Early farmers from across Europe directly descended from Neolithic Aegeans. PNAS, 113(25):6886-6891.
Kerminen et al. 2017. Fine-scale genetic structure in Finland. G3-Genes Genomes Genetics, 7(10):3459-3468.

Presentation guidelines

· You and your co-presenter(s) will work together to understand the paper and prepare a presentation. If you need help, you can contact one of the instructors, usually Ellen. It is okay to state during the presentation that you didn’t understand some points.

· You should aim for around six slides. This helps you to focus the discussion on the main points rather than all the details, and to allow time for interaction with the whole group. If the paper contains a lot of material, be selective and don’t present it all.

· Focus on the big picture. For older papers, what was the state of the data or methods at the time? What was understood or still open to question? What was this paper trying to achieve? Why was this problem important? Was the paper successful in addressing the problem? What were the exciting aspects of this work? What caveats or problems are there? Don’t spend much time on the details of the methods or an exhaustive examination of the results.

· Present standing, if possible, looking at the audience and not at your own computer screen. Point to the large screen, especially when there are important details on figures from the paper. Both presenters should be standing (if possible), and both should be engaged and participating throughout the presentation. Interaction between the two of you helps to encourage participation by the whole group.