Ruth Huang Miller, Ph.D.
Year in Program: Completed First Year, June 2012
Donald F. Conrad, Ph.D.
Department of Genetics
Washington University School of Medicine
For the past 10 years, human population geneticists have used a single estimate of the human point mutation rate for their simulations and calculations (2.5 x 10^-8 per bp per generation; Nachman and Crowell 2000). Previous work has suggested that there is likely to be extensive variation in mutation rate among individuals; the properties of this variation await further characterization. Recently, de novo mutation has emerged as an extremely important risk factor for a variety of neuropsychiatric diseases such as autism, schizophrenia, and epilepsy. For instance, a recent study identified de novo copy number variants (CNVs) in 7.9% of children affected with autism but in only 2% of control children. These findings suggest that de novo mutation may also play an important role in other psychiatric diseases such as addiction.
SNP arrays can be used as a tool for discovering de novo CNVs, but few formal statistical methods have been developed for this purpose. This project proposes to adapt our existing statistical framework for identifying de novo point mutations from sequencing data to the task of identifying de novo deletions and duplications from array data, and to assess the contribution of these events to addiction phenotypes. Briefly, we will implement a method that jointly analyzes the raw intensity data from array experiments on a parent-offspring trio to quantify the probability that a given CNV is de novo. The program will work from a candidate list of de novo CNVs generated by applying standard CNV discovery to the offspring of each family. We will use standard Gaussian mixture models to model probe signal intensity as a function of copy number. Further, we will incorporate the parent of origin in our model for de novo mutation, which will provide haplotype phase in many cases. Finally, we will incorporate population genetic data on SNPs (for instance, allele frequencies from the 1000 Genomes Project) to identify and exploit the unusual patterns of SNP genotypes that genotyping error is expected to produce within CNV regions (for instance, runs of homozygosity across regions spanned by a deletion).
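As a rough illustration of the joint trio analysis described above, the following Python sketch computes the posterior probability that a candidate deletion is de novo from per-probe intensities in the child, mother, and father. It is a minimal sketch under stated assumptions, not the method as it will be implemented in denovogear: the emission means and standard deviation, the carrier frequency, the de novo rate, and all function names are hypothetical values chosen for illustration, and each individual is reduced to a carrier or non-carrier state (homozygous deletions and parent of origin are ignored).

```python
import math

# Hypothetical emission parameters (mean, SD) of the probe log intensity
# ratio for each copy number. Real values would be fitted per array.
EMISSION = {1: (-0.45, 0.15), 2: (0.0, 0.15), 3: (0.30, 0.15)}


def log_normal_pdf(x, mean, sd):
    """Log density of a Gaussian with the given mean and SD."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_lik(intensities, copy_number):
    """Log likelihood of the probes in a region, assumed independent."""
    mean, sd = EMISSION[copy_number]
    return sum(log_normal_pdf(x, mean, sd) for x in intensities)


def posterior_de_novo(child, mother, father, variant_cn=1,
                      carrier_freq=1e-3, mu=1e-4):
    """Posterior that the child carries the CNV and neither parent does.

    carrier_freq (population carrier frequency) and mu (de novo rate per
    region) are assumed prior values, not estimates from data.
    """
    log_joint = {}
    for m in (False, True):
        for f in (False, True):
            # Each carrier parent is treated as heterozygous and transmits
            # the variant with probability 1/2.
            p_none = (0.5 if m else 1.0) * (0.5 if f else 1.0)
            p_inherit = 1.0 - p_none
            p_child = p_inherit + (1.0 - p_inherit) * mu
            for c in (False, True):
                prior = ((carrier_freq if m else 1.0 - carrier_freq)
                         * (carrier_freq if f else 1.0 - carrier_freq)
                         * (p_child if c else 1.0 - p_child))
                log_joint[(c, m, f)] = (
                    math.log(prior)
                    + log_lik(child, variant_cn if c else 2)
                    + log_lik(mother, variant_cn if m else 2)
                    + log_lik(father, variant_cn if f else 2)
                )
    # Normalize over all eight trio configurations with log-sum-exp.
    peak = max(log_joint.values())
    total = sum(math.exp(v - peak) for v in log_joint.values())
    return math.exp(log_joint[(True, False, False)] - peak) / total
```

Summing over all trio configurations, rather than calling each individual separately, lets weak evidence for the variant in a parent lower the de novo posterior even when that parent would not pass a single-sample CNV call.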
Ultimately we would like our package to address the following analysis goals:
1. Test for differences in mutation rate among individuals, between sexes, and across available covariates (e.g., age). Test for differences in mutation rate across classes of CNV (duplication/deletion, genomic context, size, etc.).
2. Describe the distribution of CNV mutation rate across families.
3. Test for evidence of selective constraint on the number of large de novo CNVs per genome that are compatible with live birth.
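One simple way the rate comparisons in goal 1 could be approached is an exact conditional test for two Poisson rates: given the total number of de novo CNVs observed in two groups, the count in the first group is binomial with success probability equal to that group's share of the trios. The sketch below is an illustration only, assuming event counts are Poisson and that each trio contributes equal exposure; the function name and the choice of test are assumptions, not a description of the planned package.

```python
from math import comb


def poisson_rate_test(k1, n1, k2, n2):
    """Two-sided exact test that two groups share one de novo CNV rate.

    k1, k2: de novo CNV counts in each group.
    n1, n2: number of trios (exposure) in each group.
    Conditional on k1 + k2, k1 is Binomial(k1 + k2, n1 / (n1 + n2)) under
    the null hypothesis of equal per-trio rates.
    """
    total = k1 + k2
    p = n1 / (n1 + n2)
    pmf = [comb(total, k) * p ** k * (1 - p) ** (total - k)
           for k in range(total + 1)]
    observed = pmf[k1]
    # Sum the probabilities of all outcomes no more likely than the one
    # observed; the small tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

For example, `poisson_rate_test(k_male, n_male, k_female, n_female)` would compare per-trio rates between male and female offspring. Covariates such as parental age would need a regression framework (for instance, Poisson regression) rather than this two-group test.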
The software developed here will be of broad use to researchers interested in the biology of human mutation, and could also be used to reveal new genetic features underlying addiction phenotypes. Methods developed here will be implemented in an existing software package for identifying de novo mutations from sequencing and array data, denovogear (http://sourceforge.net/p/denovogear).
Conrad et al. 2011, "Variation in genome-wide mutation rates within and between human families"