Michael Brudno
High-throughput sequencing (HTS) technologies have enabled the inexpensive sequencing of human genomes, and the discovery of some genomic variants from the resulting short-read datasets is well underway. In this talk I will present algorithms for the discovery of two types of variants from HTS data: smaller indels (<50 bp) and copy number variants (CNVs). First, I will describe MoDIL (Mixture of Distributions Indel Locator), a novel method for finding insertion/deletion polymorphisms from paired short reads. We explicitly model each genomic locus as a mixture of two haplotypes, and our method takes advantage of high clone coverage to identify both homozygous and heterozygous variation, even when individual clone sizes are unreliable. Analysis of a recently sequenced genome demonstrates that MoDIL accurately identifies indels of 20 nucleotides or longer. I will then describe a method to predict CNVs from paired short reads. The method combines paired-read information, which identifies variable regions, with depth of coverage, which predicts the true copy count in the donor genome. Together, the two sources of information help overcome both the sequencing biases of HTS platforms and spurious read mappings. The method also allows for the detection of CNVs within segmental duplications. We apply it to the same dataset and make a total of ~5000 calls that show high concordance with previously known CNVs in this individual.
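The two signals mentioned in the abstract can be illustrated with a minimal Python sketch. It is not MoDIL or the CNV method from the talk. The function names, the single-mean simplification (which ignores the two-haplotype mixture), and the diploid baseline are all assumptions made for illustration only.

```python
from statistics import mean


def estimate_indel_shift(mapped_distances, library_mean):
    """Average shift of clone mapped distances from the expected library mean.

    Assumption: all clones at the locus come from one haplotype, so a single
    mean is enough. A positive shift suggests a deletion in the donor relative
    to the reference; a negative shift suggests an insertion. Averaging over
    many clones is what makes the estimate usable even when each individual
    clone size is noisy.
    """
    if not mapped_distances:
        raise ValueError("need at least one mapped distance")
    return mean(mapped_distances) - library_mean


def estimate_copy_number(region_depth, genome_mean_depth, ploidy=2):
    """Depth-of-coverage copy count, assuming depth scales linearly with copies.

    Assumption: genome_mean_depth is the depth seen at a normal region that
    has `ploidy` copies, with no correction for sequencing bias.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    return round(ploidy * region_depth / genome_mean_depth)
```

For example, `estimate_indel_shift([230, 225, 235], 200)` gives a shift of 30 bases, and `estimate_copy_number(45.0, 30.0)` gives 3 copies. A real caller would need to model heterozygous loci and correct for coverage bias, which this sketch does not attempt.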
Michael Brudno is an Assistant Professor in the Department of Computer Science and the Banting and Best Department of Medical Research at the University of Toronto, and an Adjunct Scientist at the Toronto Hospital for Sick Children. He received a BA in Computer Science and History from UC Berkeley, and an MSc and PhD from the Computer Science Department of Stanford University. He also completed a postdoctoral fellowship at UC Berkeley and was a Visiting Scientist at MIT before starting his position at Toronto. His main research interest is the development of computational methods for the analysis of genomic datasets, especially high-throughput sequencing data, including methods for the discovery of structural and copy-number polymorphisms and other genomic variation. He has also worked on comparative genomics, molecular evolution, and cloud computing. He is the recipient of an Alfred P. Sloan Research Fellowship, an Ontario Early Researcher Award, and a Canada Research Chair in Computational Biology.