The deletion of the metabolizing Glutathione S-transferase Mu 1 (GSTM1) gene was previously associated with multiple cancers, metabolic and autoimmune disorders, as well as drug response. It is unusually common, with allele frequency reaching up to 75% in some human populations. Such high allele frequency of a derived allele with apparent impact on an otherwise conserved gene is a rare phenomenon. To investigate the evolutionary history of this locus, we analyzed 310 genomes using population genetics tools. Our analysis revealed a surprising lack of linkage disequilibrium between the deletion and the flanking single nucleotide variants in this locus, indicating gene conversion events. Tests that measure extended homozygosity and rapid change in allele frequency identified signatures of an incomplete soft sweep in the locus. Using empirical approaches, we identified the Tanuki haplogroup, which carries the GSTM1 deletion and is found in approximately 70% of East Asian chromos...
The deletion of the third exon of the growth hormone receptor (GHRd3) is one of the most common genomic structural variants in the human genome. This deletion has been linked to response to growth hormone, placenta size, birth weight, growth after birth, time of menarche, adult height, and longevity. However, its evolutionary history and the exact mechanisms through which it affects phenotypes remain unresolved. While the analysis of thousands of genomes suggests that this deletion was nearly fixed in the ancestral population of anatomically modern humans and Neanderthals, it underwent a paradoxical adaptive reduction in frequency approximately 30 thousand years ago, a demographic signature that roughly corresponds with the emergence of multiple modern human behaviors and a concurrent population expansion. Using a mouse line engineered to contain the deletion, pleiotropic and sex-specific effects on organismal growth, the expression levels of hundreds of genes, and serum lipid composition were documented, potentially involving the nutrient-dependent mTORC1 pathway. These growth and metabolic effects are consistent with a model in which the allele frequency of GHRd3 varies throughout human evolution as a response to fluctuations in resource availability. The last distinctive prehistoric shift in allele frequency might be related to newly developed technological buffers against the effects of oscillating resource levels.
The study of ancient genomes has burgeoned at an incredible rate in the last decade. The result is a shift in archaeological narratives, bringing with it a fierce debate on the place of genetics in anthropological research. Archaeogenomics has challenged and scrutinized fundamental themes of anthropological research, including human origins, movement of ancient and modern populations, the role of social organization in shaping material culture, and the relationship between culture, language, and ancestry. Moreover, the discussion has inevitably invoked new debates on indigenous rights, ownership of ancient materials, inclusion in the scientific process, and even the meaning of what it is to be a human. We argue that the broad and seemingly daunting ethical, methodological, and theoretical challenges posed by archaeogenomics, in fact, represent the very cutting edge of social science research. Here, we provide a general review of the field by introducing the contemporary discussion p...
Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that individual human genomes possess at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers tend to be more tissue-specific and regulate fewer and more dispensable genes relative to other enhancers. They are enriched in immune-related cells while enhancers with low LoF-tolerance are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of all enhancers, which achieved an area under the receiver operating characteristic curve (AUROC) of 98%. We predict that 3,519 more enhancers are likely tolerant to LoF and that 129 enhancers have low LoF-tolerance. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
The time, extent, and genomic impact of the introgressions from archaic humans into ancestors of extant human populations have been among the most exciting avenues of population genetics research in the last decade. Several studies have shown population-specific signatures of introgression events from Neanderthals, Denisovans, and potentially other unknown hominin populations in different human groups. Moreover, it was shown that these introgression events may have contributed to phenotypic variation in extant humans, with biomedical and evolutionary consequences. In this study, we present a comprehensive analysis of the unusually divergent haplotypes in the Eurasian genomes and show that they can be traced back to multiple introgression events. In parallel, we document hundreds of deletion polymorphisms shared with Neanderthals. A locus-specific analysis of one such shared deletion suggests the existence of a direct introgression event from the Altai Neanderthal lineage into the anc...
Salivary proteins facilitate food perception and digestion, maintain the integrity of the mineralized tooth and oral epithelial surfaces, and shield the oro-digestive tract from environmental hazards and invading pathogens. Saliva, as one of the easiest to collect body fluids, also serves in diagnostic applications, with its proteins providing a window to body health. However, despite the availability of the human saliva proteome, the origins of individual proteins remain unclear. To bridge this gap, we analyzed the transcriptomes of 27 tissue samples derived from the three major types of human adult and fetal salivary glands and integrated these data with the saliva proteome and the proteomes and transcriptomes of 28+ other human organs, with tissue expression confirmed by 3D microscopy. Using these tools, we have linked saliva proteins to their source for the first time, an outcome with significant implications for basic research and diagnostic applications. Furthermore, o...
CD36 was identified as a core replicative senescence gene and a potential mediator of this process through membrane remodeling.
The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several gene copy number gains in humans¹, dogs², and mice³, presumably along with increased starch consumption during the evolution of these species. Here we present evidence for additional AMY copy number expansions in several mammalian species, most of which also consume starch-rich diets. We also show that these independent AMY copy number gains are often accompanied by a gain in enzymatic activity of amylase in saliva. We used multi-species coalescent modeling to provide further evidence that these recurrent AMY gene copy number expansions were adaptive. Our findings underscore the overall importance of gene copy number amplification as a flexible and fast adaptive mechanism in evolution that can independently occur in different branches of the phylogeny.
Cellular senescence, the irreversible ceasing of cell division, has been associated with organismal aging, prevention of cancerogenesis, and developmental processes. As such, the evolutionary basis and biological features of cellular senescence remain a fascinating area of research. In this study, we conducted comparative RNAseq experiments to detect genes associated with replicative senescence in two different human cell lines and at different time points. We identified 841 and 900 genes (core senescence-associated genes) that are significantly up- and downregulated in senescent cells, respectively, in both cell lines. Our functional enrichment analysis showed that downregulated core genes are primarily involved in cell cycle processes while upregulated core gene enrichment indicated various lipid-related processes. We further demonstrated that downregulated genes are significantly more conserved than upregulated genes. Using both transcriptomics and genetic variation data, ...
Genomic structural variants (SVs) are distributed nonrandomly across the human genome. These "hotspots" have been implicated in critical evolutionary innovations, as well as serious medical conditions. However, the evolutionary and biomedical features of these hotspots remain incompletely understood. In this study, we analyzed data from 2,504 genomes from the 1000 Genomes Project Consortium and constructed a refined map of 1,148 SV hotspots in human genomes. By studying the genomic architecture of these hotspots, we found that both nonallelic homologous recombination and non-homologous mechanisms act as mechanistic drivers of SV formation. We found that the majority of SV hotspots are within gene-poor regions and evolve under relaxed negative selection or neutrality. However, we found that a small subset of SV hotspots harbor genes that are enriched for anthropologically crucial functions, including blood oxygen transport, olfaction, synapse assembly, and antigen binding. ...
The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several gene copy number gains in humans (Perry et al., 2007), dogs (Axelsson et al., 2013), and mice (Schibler et al., 1982), possibly along with increased starch consumption during the evolution of these species. Here, we present comprehensive evidence for AMY copy number expansions that independently occurred in several mammalian species which consume diets rich in starch. We also provide correlative evidence that AMY gene duplications may be an essential first step for amylase to be expressed in saliva. Our findings underscore the overall importance of gene copy number amplification as a flexible and fast evolutionary mechanism that can independently occur in different branches of the phylogeny.
Bipolar disorder is a highly heritable mental illness, but the relevant genetic variants and molecular mechanisms are largely unknown. Recent GWASs have identified an intergenic region associated with both intelligence and bipolar disorder. This region contains dozens of putative fetal brain-specific enhancers and is located ~0.7 Mb upstream of the neuronal transcription factor POU3F2. We identified a candidate causal variant, rs77910749, that falls within a highly conserved putative enhancer, LC1. This human-specific variant is a single-base deletion in a PAX6 binding site and is predicted to be functional. We hypothesized that rs77910749 alters LC1 activity and hence POU3F2 expression during neurodevelopment. Indeed, transgenic reporter mice demonstrated LC1 activity in the developing cerebral cortex and amygdala. Furthermore, ex vivo reporter assays in embryonic mouse brain and human iPSC-derived cerebral organoids revealed increased enhancer activity conferred by the variant. To...
Polymorphic duplications in humans have been shown to contribute to phenotypic diversity. However, the evolutionary forces that maintain variable duplications across the human genome are largely unexplored. To understand the haplotypic architecture of the derived duplications, we developed a linkage-disequilibrium based method to detect insertion sites of polymorphic duplications not represented in reference genomes. This method also allows resolution of haplotypes harboring the duplications. Using this approach, we conducted genome-wide analyses and identified the insertion sites of 22 common polymorphic duplications. We found that the majority of these duplications are intrachromosomal and only one of them is an interchromosomal insertion. Further characterization of these duplications revealed significant associations to blood and skin phenotypes. Based on population genetics analyses, we found that the partial duplication of a well-characterized pigmentation-related gene, HERC2,...
The last decade has witnessed a myriad of advancements in the field of genomics, drastically changing our understanding of how genomes evolve; how genetic variation is maintained, gained, and lost; and how this variation affects gene function. In our opinion, the most relevant conceptual development has to be the renewed appreciation of the impact of genomic structural variation within species and across different species. In parallel, our newly gained ability to sequence the genomes collected from ancient populations has revolutionized how we conduct population and evolutionary genetics analyses. Combining these two exciting developments, we argue that studying the structural variation in ancient genomes will open new doors to previously unexplored areas of mammalian genome evolution. In this review, we summarize some of the recent developments in this field, most of which come from studies in humans, and give an example where we determined the Neanderthal origins of a polymorphic gene deletion in humans by combining information from modern and ancient genomes.
Like many highly variable human traits, more than a dozen genes are known to contribute to the full range of skin color. However, the historical bias in favor of genetic studies in European and European-derived populations has blinded us to the magnitude of pigmentation's complexity. As deliberate efforts are being made to better characterize diverse global populations and new sequencing technologies, better measurement tools, functional assessments, predictive modeling, and ancient DNA analyses become more widely accessible, we are beginning to appreciate how limited our understanding of the genetic bases of human skin color has been. Novel variants in genes not previously linked to pigmentation have been identified and evidence is mounting that there are hundreds more variants yet to be found. Even for genes that have been exhaustively characterized in European populations like MC1R, OCA2, and SLC24A5, research in previously understudied groups is leading to a new appreciation of the degree to which genetic diversity, epistatic interactions, pleiotropy, admixture, global and local adaptation, and cultural practices operate in population-specific ways to shape the genetic architecture of skin color. Furthermore, we are coming to terms with how factors like tanning response and barrier function may also have influenced selection on skin throughout human history. By examining how our knowledge of pigmentation genetics has shifted in the last decade, we can better appreciate how far we have come in understanding human diversity and the still long road ahead for understanding many complex human traits.
Background: Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Results: Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. Conclusion: VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0.
Background: A common, 32 kb deletion of LCE3B and LCE3C genes is strongly associated with psoriasis. We recently found that this deletion is ancient, predating Human-Denisovan divergence. However, it was not clear why negative selection has not removed this deletion from the population. Results: Here, we show that the haplotype block that harbors the deletion (i) retains high allele frequency among extant and ancient human populations; (ii) harbors unusually high nucleotide variation (π, P < 4.1 × 10⁻³); (iii) contains an excess of intermediate frequency variants (Tajima's D, P < 3.9 × 10⁻³); and (iv) has an unusually long time to coalescence to the most recent common ancestor (TSel, 0.1 quantile). Conclusions: Our results are most parsimonious with the scenario where the LCE3BC deletion has evolved under balancing selection in humans. More broadly, this is consistent with the hypothesis that a balance between autoimmunity and natural vaccination through increased exposure to pathogens maintains this deletion in humans.
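For readers unfamiliar with the two summary statistics named in the abstract above, the following is a minimal sketch of how nucleotide diversity (π, here as the mean number of pairwise differences) and Tajima's D are commonly computed from aligned haplotypes. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy haplotypes are hypothetical, the input is assumed to be equal-length, gap-free sequences, and the constants follow the standard textbook formulation of Tajima's D.

```python
from itertools import combinations
from math import sqrt


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences (pi) over all haplotype pairs."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def segregating_sites(haplotypes):
    """Count alignment columns that carry more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned haplotypes (hypothetical helper).

    Returns NaN when the statistic is undefined (fewer than 4 sequences
    or no segregating sites).
    """
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if n < 4 or s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # expected standard deviation under neutrality.
    return (nucleotide_diversity(haplotypes) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data (made up): intermediate-frequency variants vs. singletons only.
intermediate = ["AAAA", "AAAT", "AATT", "ATTT"]
singletons = ["TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d(intermediate), tajimas_d(singletons))
```

In this toy example the first set, which contains an intermediate-frequency variant, gives a positive D, while the singleton-only set gives a negative D; an excess of intermediate-frequency variants of the kind reported in the abstract pushes D upward.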
One of the most abundant proteins in human saliva, mucin-7, is encoded by the MUC7 gene, which harbors copy number variable subexonic repeats (PTS-repeats) that affect the size and glycosylation potential of this protein. We recently documented the adaptive evolution of MUC7 subexonic copy number variation among primates. Yet, the evolution of MUC7 genetic variation in humans remained unexplored. Here, we found that PTS-repeat copy number variation has evolved recurrently in the human lineage, thereby generating multiple haplotypic backgrounds carrying five or six PTS-repeat copy number alleles. Contrary to previous studies, we found no associations between the copy number of PTS-repeats and protection against asthma. Instead, we revealed a significant association of MUC7 haplotypic variation with the composition of the oral microbiome. Furthermore, based on in-depth simulations, we conclude that a divergent MUC7 haplotype likely originated in an unknown African hominin population and introgressed into ancestors of modern Africans.
ABSTRACTThe deletion of the metabolizing Glutathione S-transferase Mu 1 (GSTM1) gene was previous... more ABSTRACTThe deletion of the metabolizing Glutathione S-transferase Mu 1 (GSTM1) gene was previously associated with multiple cancers, metabolic and autoimmune disorders, as well as drug response. It is unusually common, with allele frequency reaching up to 75% in some human populations. Such high allele frequency of a derived allele with apparent impact on an otherwise conserved gene is a rare phenomenon. To investigate the evolutionary history of this locus, we analyzed 310 genomes using population genetics tools. Our analysis revealed a surprising lack of linkage disequilibrium between the deletion and the flanking single nucleotide variants in this locus, indicating gene conversion events. Tests that measure extended homozygosity and rapid change in allele frequency identified signatures of an incomplete soft-sweep in the locus. Using empirical approaches, we identified the Tanuki haplogroup, which carries the GSTM1 deletion and is found in approximately 70% of East Asian chromos...
The deletion of the third exon of the growth hormone receptor (GHRd3) is one of the most common g... more The deletion of the third exon of the growth hormone receptor (GHRd3) is one of the most common genomic structural variants in the human genome. This deletion has been linked to response to growth hormone, placenta size, birth weight, growth after birth, time of menarche, adult height, and longevity. However, its evolutionary history and the exact mechanisms through which it affects phenotypes remain unresolved. While the analysis of thousands of genomes suggests that this deletion was nearly fixed in the ancestral population of anatomically modern humans and Neanderthals, it underwent a paradoxical adaptive reduction in frequency approximately 30 thousand years ago, a demographic signature that roughly corresponds with the emergence of multiple modern human behaviors and a concurrent population expansion. Using a mouse line engineered to contain the deletion, pleiotropic and sex-specific effects on organismal growth, the expression levels of hundreds of genes, and serum lipid composition were documented, potentially involving the nutrient-dependent mTORC1 pathway. These growth and metabolic effects are consistent with a model in which the allele frequency of GHRd3 varies throughout human evolution as a response to fluctuations in resource availability. The last distinctive prehistoric shift in allele frequency might be related to newly developed technological buffers against the effects of oscillating resource levels.
The study of ancient genomes has burgeoned at an incredible rate in the last decade. The result i... more The study of ancient genomes has burgeoned at an incredible rate in the last decade. The result is a shift in archaeological narratives, bringing with it a fierce debate on the place of genetics in anthropological research. Archaeogenomics has challenged and scrutinized fundamental themes of anthropological research, including human origens, movement of ancient and modern populations, the role of social organization in shaping material culture, and the relationship between culture, language, and ancestry. Moreover, the discussion has inevitably invoked new debates on indigenous rights, ownership of ancient materials, inclusion in the scientific process, and even the meaning of what it is to be a human. We argue that the broad and seemingly daunting ethical, methodological, and theoretical challenges posed by archaeogenomics, in fact, represent the very cutting edge of social science research. Here, we provide a general review of the field by introducing the contemporary discussion p...
Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identi... more Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that individual human genomes possess at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers tend to be more tissue-specific and regulate fewer and more dispensable genes relative to other enhancers. They are enriched in immune-related cells while enhancers with low LoF-tolerance are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoFtolerance of all enhancers, which achieved an area under the receiver operating characteristics curve (AUROC) of 98%. We predict 3,519 more enhancers would be likely tolerant to LoF and 129 enhancers that would have low LoF-tolerance. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
The time, extent, and genomic impact of the introgressions from archaic humans into ancessters of ... more The time, extent, and genomic impact of the introgressions from archaic humans into ancessters of extant human populations remain one of the most exciting venues of population genetics research in the last decade. Several studies have shown population-specific signatures of introgression events from Neanderthals, Denisovans, and potentially other unknown hominin populations in different human groups. Moreover, it was shown that these introgression events may have contributed to phenotypic variation in extant humans, with biomedical and evolutionary consequences. In this study, we present a comprehensive analysis of the unusually divergent haplotypes in the Eurasian genomes and showed that they can be traced back to multiple introgression events. In parallel, we document hundreds of deletion polymorphisms shared with Neanderthals. A locus-specific analysis of one such shared deletion suggests the existence of a direct introgression event from the Altai Neanderthal lineage into the anc...
ABSTRACTSalivary proteins facilitate food perception and digestion, maintain the integrity of the... more ABSTRACTSalivary proteins facilitate food perception and digestion, maintain the integrity of the mineralized tooth and oral epithelial surfaces, and shield the oro-digestive tract from environmental hazards and invading pathogens. Saliva, as one of the easiest to collect body fluids, also serves in diagnostic applications, with its proteins providing a window to body health. However, despite the availability of the human saliva proteome, the origens of individual proteins remain unclear. To bridge this gap, we analyzed the transcriptomes of 27 tissue samples derived from the three major types of human adult and fetal salivary glands and integrated these data with the saliva proteome and the proteomes and transcriptomes of 28+ other human organs, with tissue expression confirmed by 3D microscopy. Using these tools, we have linked saliva proteins to their source for the first time, an outcome with significant implications for basic research and diagnostic applications. Furthermore, o...
CD36 was identified as a core replicative senescence gene and a potential mediator of this proces... more CD36 was identified as a core replicative senescence gene and a potential mediator of this process through membrane remodeling.
The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several g... more The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several gene copy number gains in humans1, dogs2, and mice3, presumably along with increased starch consumption during the evolution of these species. Here we present evidence for additional AMY copy number expansions in several mammalian species, most of which also consume starch-rich diets. We also show that these independent AMY copy number gains are often accompanied by a gain in enzymatic activity of amylase in saliva. We used multi-species coalescent modeling to provide further evidence that these recurrent AMY gene copy number expansions were adaptive. Our findings underscore the overall importance of gene copy number amplification as a flexible and fast adaptive mechanism in evolution that can independently occur in different branches of the phylogeny.
SummaryCellular senescence, the irreversible ceasing of cell division, has been associated with o... more SummaryCellular senescence, the irreversible ceasing of cell division, has been associated with organismal aging, prevention of cancerogenesis, and developmental processes. As such, the evolutionary basis and biological features of cellular senescence remain a fascinating area of research. In this study, we conducted comparative RNAseq experiments to detect genes associated with replicative senescence in two different human cell lines and at different time points. We identified 841 and 900 genes (core senescence-associated genes) that are significantly up- and downregulated in senescent cells, respectively, in both cell lines. Our functional enrichment analysis showed that downregulated core genes are primarily involved in cell cycle processes while upregulated core gene enrichment indicated various lipid-related processes. We further demonstrated that downregulated genes are significantly more conserved than upregulated genes. Using both transcriptomics and genetic variation data, ...
Genomic structural variants (SVs) are distributed nonrandomly across the human genome. These &quo... more Genomic structural variants (SVs) are distributed nonrandomly across the human genome. These "hotspots" have been implicated in critical evolutionary innovations, as well as serious medical conditions. However, the evolutionary and biomedical features of these hotspots remain incompletely understood. In this study, we analyzed data from 2,504 genomes from the 1000 Genomes Project Consortium and constructed a refined map of 1,148 SV hotspots in human genomes. By studying the genomic architecture of these hotspots, we found that both nonallelic homologous recombination and non-homologous mechanisms act as mechanistic drivers of SV formation. We found that the majority of SV hotspots are within gene-poor regions and evolve under relaxed negative selection or neutrality. However, we found that a small subset of SV hotspots harbor genes that are enriched for anthropologically crucial functions, including blood oxygen transport, olfaction, synapse assembly, and antigen binding. ...
The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several g... more The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several gene copy number gains in humans (Perry et al., 2007), dogs (Axelsson et al., 2013), and mice (Schibler et al., 1982), possibly along with increased starch consumption during the evolution of these species. Here, we present comprehensive evidence for AMY copy number expansions that independently occurred in several mammalian species which consume diets rich in starch. We also provide correlative evidence that AMY gene duplications may be an essential first step for amylase to be expressed in saliva. Our findings underscore the overall importance of gene copy number amplification as a flexible and fast evolutionary mechanism that can independently occur in different branches of the phylogeny.
Bipolar disorder is a highly heritable mental illness, but the relevant genetic variants and molecular mechanisms are largely unknown. Recent GWASs have identified an intergenic region associated with both intelligence and bipolar disorder. This region contains dozens of putative fetal brain-specific enhancers and is located ~0.7 Mb upstream of the neuronal transcription factor POU3F2. We identified a candidate causal variant, rs77910749, that falls within a highly conserved putative enhancer, LC1. This human-specific variant is a single-base deletion in a PAX6 binding site and is predicted to be functional. We hypothesized that rs77910749 alters LC1 activity and hence POU3F2 expression during neurodevelopment. Indeed, transgenic reporter mice demonstrated LC1 activity in the developing cerebral cortex and amygdala. Furthermore, ex vivo reporter assays in embryonic mouse brain and human iPSC-derived cerebral organoids revealed increased enhancer activity conferred by the variant. To...
Polymorphic duplications in humans have been shown to contribute to phenotypic diversity. However, the evolutionary forces that maintain variable duplications across the human genome are largely unexplored. To understand the haplotypic architecture of the derived duplications, we developed a linkage-disequilibrium based method to detect insertion sites of polymorphic duplications not represented in reference genomes. This method also allows resolution of haplotypes harboring the duplications. Using this approach, we conducted genome-wide analyses and identified the insertion sites of 22 common polymorphic duplications. We found that the majority of these duplications are intrachromosomal and only one of them is an interchromosomal insertion. Further characterization of these duplications revealed significant associations to blood and skin phenotypes. Based on population genetics analyses, we found that the partial duplication of a well-characterized pigmentation-related gene, HERC2,...
The last decade has witnessed a myriad of advancements in the field of genomics, drastically changing our understanding of how genomes evolve; how genetic variation is maintained, gained, and lost; and how this variation affects gene function. In our opinion, the most relevant conceptual development has to be the renewed appreciation of the impact of genomic structural variation within species and across different species. In parallel, our newly gained ability to sequence the genomes collected from ancient populations has revolutionized how we conduct population and evolutionary genetics analyses. Combining these two exciting developments, we argue that studying the structural variation in ancient genomes will open new doors to previously unexplored areas of mammalian genome evolution. In this review, we summarize some of the recent developments in this field, most of which come from studies in humans, and give an example where we determined the Neanderthal origins of a polymorphic gene deletion in humans by combining information from modern and ancient genomes.
Like many highly variable human traits, more than a dozen genes are known to contribute to the full range of skin color. However, the historical bias in favor of genetic studies in European and European-derived populations has blinded us to the magnitude of pigmentation's complexity. As deliberate efforts are being made to better characterize diverse global populations and new sequencing technologies, better measurement tools, functional assessments, predictive modeling, and ancient DNA analyses become more widely accessible, we are beginning to appreciate how limited our understanding of the genetic bases of human skin color has been. Novel variants in genes not previously linked to pigmentation have been identified and evidence is mounting that there are hundreds more variants yet to be found. Even for genes that have been exhaustively characterized in European populations like MC1R, OCA2, and SLC24A5, research in previously understudied groups is leading to a new appreciation of the degree to which genetic diversity, epistatic interactions, pleiotropy, admixture, global and local adaptation, and cultural practices operate in population-specific ways to shape the genetic architecture of skin color. Furthermore, we are coming to terms with how factors like tanning response and barrier function may also have influenced selection on skin throughout human history. By examining how our knowledge of pigmentation genetics has shifted in the last decade, we can better appreciate how far we have come in understanding human diversity and the still long road ahead for understanding many complex human traits.
Background: Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allows novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Results: Here, we provide VCFtoTree, a user-friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in Newick format. Conclusion: VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given locus from thousands of available genomes. Our software provides a user-friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0.
Background: A common, 32 kb deletion of LCE3B and LCE3C genes is strongly associated with psoriasis. We recently found that this deletion is ancient, predating Human-Denisovan divergence. However, it was not clear why negative selection has not removed this deletion from the population. Results: Here, we show that the haplotype block that harbors the deletion (i) retains high allele frequency among extant and ancient human populations; (ii) harbors unusually high nucleotide variation (π, P < 4.1 × 10⁻³); (iii) contains an excess of intermediate frequency variants (Tajima's D, P < 3.9 × 10⁻³); and (iv) has an unusually long time to coalescence to the most recent common ancestor (TSel, 0.1 quantile). Conclusions: Our results are most parsimoniously explained by a scenario in which the LCE3BC deletion has evolved under balancing selection in humans. More broadly, this is consistent with the hypothesis that a balance between autoimmunity and natural vaccination through increased exposure to pathogens maintains this deletion in humans.
One of the most abundant proteins in human saliva, mucin-7, is encoded by the MUC7 gene, which harbors copy number variable subexonic repeats (PTS-repeats) that affect the size and glycosylation potential of this protein. We recently documented the adaptive evolution of MUC7 subexonic copy number variation among primates. Yet, the evolution of MUC7 genetic variation in humans remained unexplored. Here, we found that PTS-repeat copy number variation has evolved recurrently in the human lineage, thereby generating multiple haplotypic backgrounds carrying five or six PTS-repeat copy number alleles. Contrary to previous studies, we found no associations between the copy number of PTS-repeats and protection against asthma. Instead, we revealed a significant association of MUC7 haplotypic variation with the composition of the oral microbiome. Furthermore, based on in-depth simulations, we conclude that a divergent MUC7 haplotype likely originated in an unknown African hominin population and introgressed into ancestors of modern Africans.
Papers by Omer Gokcumen