Statistical Genetics Seminar, Fall 2019

Instructor: Sharon Browning (sguy@uw.edu), with Liz Blue and Ellen Wijsman as co-instructors.

This quarter’s topic

The topic for this quarter is relationship inference from genotype data. In addition, we will have several research talks from students/postdocs.

Why relationship inference? There are many reasons why one might wish to infer relationships between individuals based on their genetic data. Some genetic analyses depend on correctly knowing the family structure (for example, linkage analysis and transmission disequilibrium testing). In an association analysis, related individuals are non-independent, and one should account for their correlation, for example by removing relatives, or by using a mixed model. Direct-to-consumer genetic testing companies help you find your relatives, so you can make contact or build your family tree. The FBI wants to find the identity of the donor of a DNA sample from a crime scene by finding one of their relatives in a large database (see Erlich et al. 2018, below).

Why relationship inference now? Genetic data sets are getting larger: tens or hundreds of thousands of individuals, with SNP array or whole genome sequence data on millions of genetic markers. One new challenge is therefore computational efficiency. Also, such dense genetic data allow new approaches that wouldn’t have worked with the data available 20 years ago. Finally, current genetic data sets are increasingly heterogeneous, so population structure must be considered; early methods for relationship inference assumed that all individuals came from a single, homogeneous population.

An overview of this quarter’s papers: Lynch and Ritland (1999) is a classic paper that treats genetic markers as independent and assumes a homogeneous population. There are many other such papers, some based on likelihood or the method of moments; this one uses regression. Manichaikul et al. (2010) present KING, a method for robust estimation in a heterogeneous population. KING also treats genetic markers as independent. See also Thornton et al. (2012) and Conomos et al. (2016) under “Extra reading”. Huff et al. (2011) introduce ERSA, a method that uses inferred segments of identity by descent (IBD). This type of approach exploits dependencies between genetic markers for improved accuracy, and is quite robust to population heterogeneity. Whereas ERSA uses maximum likelihood on the IBD segment lengths, Ramstetter et al. (2017) point out that simply summing the IBD segment lengths also gives a good estimate of relatedness. Ramstetter et al. also compare methods, including KING, ERSA, REAP (Thornton et al. 2012), PC-RELATE (Conomos et al. 2016), and a variety of IBD segment detection methods. Gusev et al. (2009) is a classic paper presenting an IBD segment detection method called GERMLINE, with applications including relationship inference. GERMLINE scales very well to large data sets. Finally, we end with a forensic application (Erlich et al. 2018).
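
To make the segment-summing idea concrete, here is a minimal Python sketch. It is not the procedure from any of the papers above. It assumes IBD segments have already been detected and are given as hypothetical (start_cM, end_cM, ibd_state) tuples that do not overlap, and it assumes a placeholder genome length in centimorgans. The kinship coefficient is computed as k1/4 + k2/2, where k1 and k2 are the fractions of the genome shared IBD1 and IBD2. The degree cutoffs at powers of two follow the style of the KING inference criteria and are used here only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical segment record: (start_cM, end_cM, ibd_state), ibd_state is 1 or 2.
Segment = Tuple[float, float, int]


def kinship_from_segments(segments: Iterable[Segment], genome_length_cm: float) -> float:
    """Estimate the kinship coefficient by summing IBD segment lengths.

    Assumes the segments are non-overlapping and that genome_length_cm is
    the total genetic length covered by the IBD scan (an input you supply).
    """
    segments = list(segments)
    ibd1 = sum(end - start for start, end, state in segments if state == 1)
    ibd2 = sum(end - start for start, end, state in segments if state == 2)
    k1 = ibd1 / genome_length_cm  # fraction of genome sharing one haplotype IBD
    k2 = ibd2 / genome_length_cm  # fraction of genome sharing both haplotypes IBD
    return 0.25 * k1 + 0.5 * k2


def degree_from_kinship(phi: float) -> str:
    """Bin a kinship estimate into a degree of relatedness (illustrative cutoffs)."""
    cutoffs = [
        (2 ** -1.5, "duplicate or MZ twin"),
        (2 ** -2.5, "first degree"),
        (2 ** -3.5, "second degree"),
        (2 ** -4.5, "third degree"),
    ]
    for lower, label in cutoffs:
        if phi > lower:
            return label
    return "more distant or unrelated"


if __name__ == "__main__":
    GENOME_CM = 3400.0  # placeholder total length, an assumption for this example
    parent_child = [(0.0, 3400.0, 1)]  # IBD1 across the whole genome
    phi = kinship_from_segments(parent_child, GENOME_CM)
    print(phi, degree_from_kinship(phi))
```

Note that a parent-child pair and a pair of full siblings with exactly the expected sharing (half the genome IBD1, a quarter IBD2) both give a kinship of 0.25, so the summed length alone does not separate them; the IBD2 fraction does.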

Expectations

· You will read the paper prior to the seminar each week, show up to the seminar (I’ll be taking attendance) and contribute to the discussion with questions and comments.

· You (usually with someone else) will present one of the papers during the quarter.

Schedule

Oct 1: Organizational meeting

Oct 8: Lynch, M. and Ritland, K., 1999. Estimation of pairwise relatedness with molecular markers. Genetics, 152(4), pp.1753-1766. https://www.genetics.org/content/genetics/152/4/1753.full.pdf

Alan, Danni

Oct 15: (Sharon, Liz, and Ellen will be away at ASHG) Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M. and Chen, W.M., 2010. Robust relationship inference in genome-wide association studies. Bioinformatics, 26(22), pp.2867-2873. https://academic.oup.com/bioinformatics/article/26/22/2867/228512

Jacob, Charlie

Oct 22: (ASHG/IGES highlights) Huff, C.D., Witherspoon, D.J., Simonson, T.S., Xing, J., Watkins, W.S., Zhang, Y., Tuohy, T.M., Neklason, D.W., Burt, R.W., Guthery, S.L. and Woodward, S.R., 2011. Maximum-likelihood estimation of recent shared ancestry (ERSA). Genome research, 21(5), pp.768-774. https://genome.cshlp.org/content/21/5/768.full

Iris, Hanley

Oct 29: Ramstetter, M.D., Dyer, T.D., Lehman, D.M., Curran, J.E., Duggirala, R., Blangero, J., Mezey, J.G. and Williams, A.L., 2017. Benchmarking relatedness inference methods with genome-wide data from thousands of relatives. Genetics, 207(1), pp.75-82. https://www.genetics.org/content/genetics/207/1/75.full.pdf

Ruoyi, Yu, Yunbi

Nov 5: Michael Goldberg, practice general exam

Nov 12: Gusev et al., 2009. Whole population, genome-wide mapping of hidden relatedness. Genome research, 19(2), pp.318-326. https://genome.cshlp.org/content/19/2/318.full

Seth, Manisha, Cameron

Nov 19: Erlich, Y., Shor, T., Pe’er, I. and Carmi, S., 2018. Identity inference of genomic data using long-range familial searches. Science, 362(6415), pp.690-694. https://science.sciencemag.org/content/362/6415/690

Nandana, Kamalam

Nov 26: Xiaowen Tian

Dec 3: Edward Zhao and Ying Zhou

Extra reading

The list above intentionally omits many important relevant papers authored by members of the UW statgen community. You’ll see a number of them referenced by the papers we read throughout the quarter. Here are a few representative papers for extra reading:

Thompson, E. A., 1975 The estimation of pairwise relationships. Ann. Hum. Genet. 39: 173–188 (unfortunately UW doesn’t have online access for this one)

Browning, S., 1998. Relationship information contained in gamete identity by descent data. Journal of Computational Biology, 5(2), pp.323-334. https://www.liebertpub.com/doi/abs/10.1089/cmb.1998.5.323

Sieberts, S.K., Wijsman, E.M. and Thompson, E.A., 2002. Relationship inference from trios of individuals, in the presence of typing error. The American Journal of Human Genetics, 70(1), pp.170-180. https://www.sciencedirect.com/science/article/pii/S0002929707612919

Browning, B.L. and Browning, S.R., 2011. A fast, powerful method for detecting identity by descent. The American Journal of Human Genetics, 88(2), pp.173-182. https://www.sciencedirect.com/science/article/pii/S0002929711000115

Thornton, T., Tang, H., Hoffmann, T.J., Ochs-Balcom, H.M., Caan, B.J. and Risch, N., 2012. Estimating kinship in admixed populations. The American Journal of Human Genetics, 91(1), pp.122-138. https://www.sciencedirect.com/science/article/pii/S0002929712003096

Conomos, M.P., Reiner, A.P., Weir, B.S. and Thornton, T.A., 2016. Model-free estimation of recent genetic relatedness. The American Journal of Human Genetics, 98(1), pp.127-148. https://www.sciencedirect.com/science/article/pii/S0002929715004930

Presentation guidelines

· You and your co-presenter(s) will work together to understand the paper and prepare a presentation. If you need help, you can contact one of the instructors, usually Sharon. It is okay to state during the presentation that you didn’t understand some points.

· You should aim for around six slides. This helps you to focus the discussion on the main points rather than all the details, and to allow time for interaction with the whole group. If the paper contains a lot of material, be selective and don’t present it all.

· Focus on the big picture. What was this paper trying to achieve? Why was this problem important? Was the paper successful in addressing the problem? What were the exciting aspects of this work? What caveats or problems are there? Don’t spend much time on the details of the methods or an exhaustive examination of the results.

· Present standing, if possible, looking at the audience and not at your own computer screen. Point to the large screen, especially when there are important details on figures from the paper. All presenters should be standing (if possible), engaged, and participating throughout the presentation. Interaction between the presenters helps to encourage participation by the whole group.