Application of next generation sequencing in genetic and genomic studies

88 Citations · 2016
Jingwen Wang
journal unavailable

The whole series of studies demonstrates how NGS can be applied to study the genetic basis of complex disorders and to assist follow-up functional studies in model organisms.

Abstract

Genetic variants spread along the human genome play vital roles in determining our traits, affecting development and potentially causing disorders. Most common disorders have complex underlying mechanisms involving genetic or environmental factors and the interaction between them. Over the past decade, genome-wide association studies (GWAS) have identified thousands of common variants that contribute to complex disorders and partially explain their heritability. However, a large portion of the heritability remains unexplained, and this missing heritability may be caused by several factors, such as rare or low-frequency variants with high effect that are not covered by GWAS and linkage analysis. With the development of next generation sequencing (NGS), it is possible to rapidly detect large numbers of novel rare and low-frequency variants simultaneously at low cost. This technology provides a wealth of information for studying the association between genetic variation and complex disorders.

Once a susceptibility gene is mapped, model organisms such as zebrafish (Danio rerio) are popular for further investigating the possible function of the disease-associated gene in determining the phenotype. However, the genome annotation of zebrafish is not complete, which affects the characterization of gene functions. Accordingly, high-throughput RNA sequencing can be employed to identify new transcripts.

In our studies, pooled DNA samples were used for whole genome sequencing (WGS) and exome sequencing. In Paper I, we evaluated minor allele frequency (MAF) estimates using three variant detection tools with two sets of pooled exome sequencing data and one set of pooled WGS data. The MAFs from the pooled sequencing data demonstrated high concordance (r = 0.88-0.94) with those from the individual genotyping data. In Paper II, exome sequencing implementing a pooling strategy was performed on 100 idiopathic scoliosis (IS) patients for mapping susceptibility genes.
After validating 20 candidate single nucleotide variants (SNVs), we did not find associations between them and IS. However, the previously reported common variant rs11190870 near LBX1 was validated in a large Scandinavian cohort. In Paper III, we analyzed WGS of pooled DNA samples from 19 affected individuals who shared a phenotype-linked haplotype in a dyslexic Finnish family. Two of the individuals were also whole-genome sequenced individually. The screen for causative variants was narrowed down to a rare SNV, which might affect the binding affinity of LHX2, which regulates the dyslexia-associated gene ROBO1. In Paper IV, RNA sequencing (RNA-seq) data were analyzed to identify novel transcripts in zebrafish early development using an in-house pipeline. We discovered 152 novel transcribed regions (NTRs), validated more than 10 NTRs and quantified their expression in early developmental stages.

In our studies, we evaluated and applied a pooling approach for identifying disease susceptibility variants using high-throughput DNA sequencing. Based on RNA sequencing data, we provided new information for genome annotation of the model organism zebrafish, which is valuable for studying the function of disease-causative genes. In summary, the whole series of studies demonstrates how NGS can be applied in studying the genetic basis of complex disorders and assisting in follow-up functional studies in model organisms.

LIST OF SCIENTIFIC PAPERS

I. Wang J, Skoog T, Einarsdottir E, Kaartokallio T, Laivuori H, Grauers A, Gerdhem P, Hytönen M, Lohi H, Kere J, Jiao H. Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples. Manuscript submitted to Sci Rep. and under review.

II. Grauers A, Wang J, Einarsdottir E, Simony A, Danielsson A, Åkesson K, Ohlin A, Halldin K, Grabowski P, Tenne M, Laivuori H, Dahlman I, Andersen M, Christensen SB, Karlsson MK, Jiao H, Kere J, Gerdhem P. Candidate gene analysis and exome sequencing confirm LBX1 as a susceptibility gene for idiopathic scoliosis. Spine J. 2015 Oct 1;15(10):2239-46. doi: 10.1016/j.spinee.2015.05.013.

III. Massinen S, Wang J, Laivuori K, Bieder A, Paez IT, Jiao H, Kere J. Genomic sequencing of a dyslexia susceptibility haplotype encompassing ROBO1. J Neurodev Disord. 2016 Jan 27;8:4. doi: 10.1186/s11689-016-9136-y.

IV. Wang J, Vesterlund L, Kere J, Jiao H. Identification of novel transcribed regions in zebrafish (Danio rerio) using RNA-sequencing. Manuscript submitted to PLoS One and under review.

† Equal contribution to the work

Other publications not involved in this thesis

I. Smialowska A, Djupedal I, Wang J, Kylsten P, Swoboda P, Ekwall K. RNAi mediates post-transcriptional repression of gene expression in fission yeast Schizosaccharomyces pombe. Biochem Biophys Res Commun. 2014 Feb 7;444(2):254-9.

II. Kaartokallio T, Wang J, Heinonen S, Kajantie E, Kivinen K, Pouta A, Gerdhem P, Jiao H, Kere J, Laivuori H. Exome sequencing in pooled DNA samples to identify maternal pre-eclampsia risk variants. Sci Rep. 2016. In press.

† Equal contribution to the work

CONTENTS

1 Background
  1.1 Genetic variations
    1.1.1 Different types of genetic variations
    1.1.2 Effects
  1.2 Disorders
    1.2.1 Monogenic disorders
    1.2.2 Complex disorders
  1.3 Gene mapping in disorders
    1.3.1 Linkage analysis
    1.3.2 Association studies
  1.4 Missing heritability of complex disorders
  1.5 Functional studies following identification of disease-causative genes
    1.5.1 In silico
    1.5.2 In vivo – model organisms
2 Introduction
  2.1 NGS platforms
    2.1.1 Illumina
    2.1.2 SOLiD
    2.1.3 Complete Genomics
  2.2 NGS application
    2.2.1 Whole genome sequencing and exome sequencing
    2.2.2 RNA sequencing
  2.3 Bioinformatic analysis of NGS data
3 Aims
4 Materials and methods
  4.1 Study subjects and materials
    4.1.1 Idiopathic scoliosis (IS) case-control cohorts
    4.1.2 Preeclampsia (PE) case-control cohorts and families
    4.1.3 Affected members of a dyslexia family
    4.1.4 A Bull Terrier tail-chasing case-control cohort
    4.1.5 Zebrafish embryos
  4.2 Next generation sequencing
    4.2.1 Pooling strategy
    4.2.2 Library preparation and sequencing
  4.3 Data analysis
    4.3.1 Alignment of sequencing reads
    4.3.2 Detection of genetic variations
    4.3.3 Evaluation of allele frequency estimates
    4.3.4 Annotation and filtering of variants
    4.3.5 Association analysis
    4.3.6 Identification of NTRs
    4.3.7 Gene expression
  4.4 Experimental validation
    4.4.1 Genotyping
    4.4.2 PCR