AGR322- genomics

Genomics is the comprehensive study of genomes, focusing on the interactions of genes and their influence on organism biology, while genome structure involves the hierarchical organization of DNA into chromatin and chromosomes. DNA sequencing, particularly next-generation sequencing, has revolutionized the ability to quickly and affordably read entire genomes, enabling the identification of genetic variations. The transcriptome and proteome represent the total RNA and protein expressions, respectively, and their analysis through transcriptomics and proteomics provides insights into gene function and cellular processes.

Uploaded by

jig46658

1. Definition of genomics

Genomics is the study of genomes, that is, of the complete set of genes of an organism,
including the interactions of those genes with each other and with the organism’s environment.
In contrast to genetics, which studies individual genes and how they influence inheritance,
genomics aims at the collective characterization and quantification of all of an organism's genes,
their interrelations, and their influence on the organism's biology.

Genetics studies how genes control the synthesis of RNAs and proteins; proteins make up body
structures such as organs and tissues, control chemical reactions, and carry signals between cells.
Genomics additionally encompasses the sequencing and analysis of genomes, using high-throughput
DNA sequencing and bioinformatics to assemble entire genomes and analyze their structure and
function. Genomics can identify variation within the genome and the effect of one or more genes
on another gene, on a single trait, or on multiple traits. It can also be used to study other
biological phenomena, such as heterosis, epistasis, and pleiotropy, by relating loci and alleles
within the genome to phenotypic expression.

2. Genome structure and organization

The genome has a complex hierarchical structure, formed by folding DNA into chromatin
fibers, chromosome domains, and, ultimately, chromosomes. The spatial organization of
genomes plays a vital role in their function, including gene regulation and control of gene
expression programs. Chromatin loops and topological domains form the basic structural units of
this multiscale organization and are essential to orchestrate complex regulatory networks and
transcription mechanisms. They also form higher-order structures such as chromosomal
compartments and chromosome territories. Each level of this intrinsic architecture is governed
by principles and mechanisms that we are only beginning to understand. Over the past decade,
scientists have endeavored to elucidate the spatial characteristics and functions of plant and
animal genome architecture using technologies based on chromosome conformation capture (3C).
Methods such as high-throughput chromosome conformation capture (Hi-C), chromatin
interaction analysis by paired-end tag sequencing (ChIA-PET), Hi-C combined with chromatin
immunoprecipitation (HiChIP), and in situ Hi-C have been developed to capture long-range
DNA–DNA interactions in mammals.

2.1. Genome organization.

The 3C-based technologies, combined with microscopic imaging methods, have elucidated
hierarchical three-dimensional (3D) chromatin organization on multiple scales, including
chromosome territories (CTs), A/B compartments, topologically associating domains (TADs),
and chromatin loops, as well as enhancer-promoter interactions, promoter networks, gene loops,
and Polycomb interactions.

Hierarchical genome organization. Hi-C heatmaps at different scales, whole genome (a), whole
chromosome (b), megabase (c, d), and hundreds of kilobases (e), and a model of genome folding at
these scales (f–h) are shown. Whole-genome contact maps show that chromosomes occupy
separate chromosomal territories and rarely interact with each other (a, f). Megabase-level
heatmaps with clear square formations along the diagonal indicate topological domains (c, d, g).
A plaid-like pattern corresponding to compartments A and B is also visible (b, c, g). The high-
resolution heatmaps show individual peaks corresponding to chromatin loops (e, h). Heatmaps
were created from the GM12878 in situ Hi-C dataset published by Rao et al. (2014) using
Juicebox (Durand et al. 2016). Stop here for undergraduates.

3. What is DNA sequencing

One of the most exciting developments in genomics has been the development of next-
generation sequencing technology. This technology allows us to ‘read’ an organism's whole
genome relatively quickly and cheaply. Sequencing the first human genome took more than ten
years and cost billions of dollars. The first Arabidopsis genome also took a long time to
sequence, as did the tomato genome and other early genome projects. Today, however, it can
be done in around 24 hours for a few thousand pounds.

DNA sequencing is the process of determining the correct order of the bases in a strand of DNA.
Since bases exist as pairs, and the identity of one base in a pair determines the identity of
the other, researchers do not have to report both bases of the pair.
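
The point that one strand fixes the other can be illustrated with a short Python sketch. The function name and the example sequence are hypothetical and chosen only for illustration; the pairing rules (A with T, C with G) are the standard ones.

```python
# Watson-Crick pairing rules: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3' like the input."""
    return "".join(PAIRING[base] for base in reversed(strand.upper()))

reported = "ATGCCGTT"                    # hypothetical reported strand
partner = reverse_complement(reported)   # fully determined by `reported`
print(reported, partner)
```

Applying the function twice returns the original strand, which is why reporting one strand loses no information.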

In the most common type of sequencing used today, called sequencing by synthesis, DNA
polymerase (the enzyme in cells that synthesizes DNA) is used to generate a new strand of DNA
from a strand of interest. In the sequencing reaction, the enzyme incorporates individual
nucleotides, each chemically tagged with a fluorescent label, into the new DNA strand.
As this happens, the nucleotide is excited by a light source, and a fluorescent signal is emitted
and detected. The signal is different depending on which of the four nucleotides was
incorporated. This method can generate 'reads' of 125 nucleotides in a row and billions of reads
simultaneously. See attached reference paper. Please read it and other related papers.

To assemble the sequence of all the bases in a large piece of DNA, such as a gene, the
sequences of overlapping segments need to be read. This allows the longer sequence to be
assembled from shorter pieces, like putting together a linear jigsaw puzzle. In this process, each
base has to be read not just once but at least several times in the overlapping segments to ensure
accuracy. See the attached videos.
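
The jigsaw idea can be sketched with a toy greedy assembler in Python. This is a simplified illustration under stated assumptions, not how production assemblers work: the reads are hypothetical and error-free, all come from the same strand, and the function names and the minimum overlap of 3 bases are arbitrary choices.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def assemble(reads, min_len: int = 3) -> str:
    """Repeatedly merge the two reads with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left; the pieces cannot be joined
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return max(contigs, key=len)

# Three hypothetical 8-base reads covering a 16-base region, given out of order.
reads = ["CGTAATCG", "AGCTTGAC", "TGACCGTA"]
print(assemble(reads))
```

In this example the middle bases are covered by two reads each, which is the redundancy the text describes as necessary for accuracy.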

One of the applications of DNA sequencing is that it can help us search for genetic variations
and mutations that may play a role in the development or progression of a disease. The disease-
causing change may be as small as the substitution, deletion, or addition of a single base pair or
as large as the deletion of thousands of bases.

References and further reading

1. Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al. Juicebox
provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst.
2016;3(1):99–101.
2. Szalaj P, Plewczynski D. Three-dimensional organization and dynamics of the genome.
Cell Biol Toxicol. 2018;34:381–404. https://doi.org/10.1007/s10565-018-9428-y
3. Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A
3D map of the human genome at kilobase resolution reveals principles of chromatin
looping. Cell. 2014;159(7):1665–80.
4. Segerman B. The most frequently used sequencing technologies and assembly
methods in different time segments of the bacterial surveillance and RefSeq genome
databases. Front Cell Infect Microbiol. 2020;10:527102. doi: 10.3389/fcimb.2020.527102
5. Slatko BE, Gardner AF, Ausubel FM. Overview of next-generation sequencing
technologies. Curr Protoc Mol Biol. 2018;122:e59. doi: 10.1002/cpmb.59
6. Ouyang W, Xiong D, Li G, Li X. Unraveling the 3D genome architecture in plants:
present and future. Mol Plant. 2020;13(12):1676–1693. doi: 10.1016/j.molp.2020.10.002

1. Transcriptomics
1.1 What is the transcriptome

The transcriptome is the set of all RNA transcripts, coding and non-coding, in an organism or a
population of cells. Depending on the particular experiment being carried out, the term can
refer to all RNAs or to mRNA only. The
study of an organism's total population of RNA is called transcriptomics.

Although the transcriptome, taken here in the narrower sense of the mRNA, makes up less than
4% of the total cell RNA, it is the most
significant component because it contains the coding RNAs that express the genome. It is
important to note that the transcriptome is not synthesized de novo. Every cell receives part of its
parent’s transcriptome when it is first brought into existence by cell division and maintains a
transcriptome throughout its lifetime. Even quiescent cells in bacterial spores or the seeds of
plants have a transcriptome, although the translation of that transcriptome into protein may be
completely switched off. Transcription of individual protein-coding genes does not, therefore,
result in the synthesis of the transcriptome but instead maintains the transcriptome by replacing
mRNAs that have been degraded and bringing about changes to the composition of the
transcriptome via the switching on and off of different sets of genes.

Even in the simplest organisms, such as bacteria and yeast, many genes are active at any time.
Transcriptomes are, therefore, complex, containing copies of hundreds, if not thousands, of
different mRNAs. Usually, each mRNA makes up only a small fraction of the whole, with the
most common type rarely contributing more than 1% of the total mRNA. Exceptions are cells
with highly specialized biochemistries, reflected by transcriptomes in which one or a few
mRNAs predominate. Developing wheat seeds are an example: these synthesize large amounts
of the gliadin proteins, which accumulate in the dormant grain and provide a source of amino
acids for the germinating seedling. Within the developing seeds, the gliadin mRNAs can make
up as much as 30% of the transcriptomes of specific cells.
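
The notion of each mRNA's share of the transcriptome is simple arithmetic, sketched below in Python. The transcript counts are invented for illustration (only the 30% gliadin share echoes the text), and the function name is hypothetical.

```python
def transcript_fractions(counts):
    """Convert per-transcript mRNA counts into fractions of the total mRNA."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Hypothetical counts for one cell type of a developing wheat seed.
counts = {"gliadin": 300, "mRNA_1": 10, "mRNA_2": 8, "all_other_mRNAs": 682}
fractions = transcript_fractions(counts)

for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

In a typical, unspecialized cell no single entry would exceed about 1%; here the gliadin entry dominates, as the wheat example describes.
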

One use of transcriptomics is that it allows the identification of genes and pathways that
respond to biotic and abiotic environmental stresses. The non-targeted nature of transcriptomics
allows for identifying novel transcriptional networks in complex organisms, cells, tissues, or
organs.

RNA sequencing technologies

Students should read about the RNA sequencing techniques that the scientific community has used,
is using, and is currently developing. Students must understand the history and evolution of early
technology and contemporary techniques for capturing genome expression at the transcript level.

I have included some links to help students navigate this topic.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5436640/

https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/transcriptomics

https://link.springer.com/chapter/10.1007/10_2017_52

https://www.mdpi.com/2223-7747/11/5/675

Discussion of single-cell RNA-seq technology will be for post-graduates, with separate lecture
material comprising selected papers and slide illustrations.

2. Proteomics
2.1. Proteins and quantitative proteome.

The second product of genome expression is the proteome, the cell’s repertoire of proteins,
which specifies the nature of the biochemical reactions the cell can carry out. The proteins that
make up the proteome are synthesized by translating the individual RNA molecules in the
transcriptome.

Students are expected to define proteomics and explain their understanding of it.

Quantitative proteomics is a rapidly evolving research field, driven both by advances in mass
spectrometers and by improved sample processing, including new labeling strategies and protein
isolation procedures such as filter-aided sample preparation. Quantitative proteomics has been
used to investigate the proteome in disease states of plants and animals, during growth and
development, and for traits of interest in different backgrounds. Human studies have become a
major area in which proteomics is applied to biomarker and therapeutic target
identification. Quantitative plant proteomics, in particular, poses many additional challenges, but
because of the nature of plants, it also offers some potential advantages. In general, the analysis
of plants has been less prominent in proteomics. Low protein concentration, difficulties in
protein extraction, genome polyploidy, high Rubisco abundance in green tissue, and an absence
of well-annotated and completed genome sequences are some of the main challenges in plant
proteomics. However, the last of these is now changing, with several genomes emerging for model
plants and crops such as potato, tomato, soybean, rice, maize, and barley. In recent decades,
technical advances in mass spectrometry have increased the capacity for protein identification and
quantification.

Moreover, posttranslational modification (PTM) analysis, especially of phosphorylation, has
allowed large-scale identification of biological mechanisms. Even so, increasing evidence
indicates that global protein quantification is often insufficient for explaining biology and has
posed challenges in identifying new and robust biomarkers. Consequently, to improve the
accuracy of the discoveries made using proteomics, it is necessary to combine (i) robust and
reproducible methods for sample preparation that allow statistical comparison, (ii) PTM analyses
in addition to global proteomics for additional levels of knowledge, and (iii) bioinformatics
for interpreting the resulting protein lists.

Genome sequencing and annotation, databases, databanks, and genome browsers

Genome sequencing

The final objective of a genome project is the complete DNA sequence for the organism being
studied, ideally integrated with the genome's genetic and physical maps so that genes and other
interesting features can be located within the DNA sequence. Techniques for DNA sequencing
are clearly of central importance in this context, and we will pay particular attention to
examining DNA sequencing methodologies from the first generation to the current state of the
art.

The Methodology for DNA Sequencing

There are several procedures for DNA sequencing. Among first-generation methods, by far the
most popular is the chain termination method, first devised by Fred Sanger and colleagues in
the mid-1970s. Chain termination sequencing gained preeminence for several reasons, not least
being the relative ease with which the technique can be automated. A genome
project involves many individual sequencing experiments, and it would take many years to
perform all these by hand. Automated sequencing techniques are essential if a project is to be
completed in a reasonable time.

Chain termination DNA sequencing

Chain termination DNA sequencing is based on the principle that single-stranded DNA
molecules that differ in length by just a single nucleotide can be separated by polyacrylamide gel
electrophoresis. This means that it is possible to resolve a family of molecules, representing all
lengths from 10 to 1500 nucleotides, into a series of bands in a slab or capillary gel.

The starting material for a chain termination sequencing experiment is a preparation of identical,
single-stranded DNA molecules. The first step is to anneal a short oligonucleotide to the same
position on each molecule; this oligonucleotide subsequently acts as the primer for synthesizing
a new DNA strand complementary to the template. The strand synthesis reaction, catalyzed by a
DNA polymerase enzyme (see below), requires the four deoxyribonucleotide triphosphates
(dNTPs: dATP, dCTP, dGTP, and dTTP) as substrates and would normally continue until several
thousand nucleotides had been polymerized. This does not occur in a chain termination
sequencing experiment because, as well as the four deoxynucleotides, a small amount of each of
the four dideoxynucleotide triphosphates (ddNTPs: ddATP, ddCTP, ddGTP, and ddTTP) is
added to the reaction. Each of these dideoxynucleotides is labeled with a different fluorescent
marker.

Chain termination DNA sequencing. (A) Chain termination sequencing involves the synthesis of new
strands of DNA that are complementary to a single-stranded template. (B) Strand synthesis does not
proceed indefinitely because the reaction mixture contains small amounts of each of the four
dideoxynucleotides, which block further elongation because they have a hydrogen atom rather than a
hydroxyl group attached to the 3’-carbon. (C) Incorporation of ddATP results in chains that are
terminated opposite Ts in the template. This generates the “A” family of terminated molecules.
Incorporation of the other dideoxynucleotides generates the “C,” “G,” and “T” families.

The polymerase enzyme does not discriminate between deoxy- and dideoxynucleotides, but once
incorporated, a dideoxynucleotide blocks further strand elongation because it lacks the
3’-hydroxyl group needed to form a connection with the next nucleotide. Because the normal
deoxynucleotides are present in larger amounts than the dideoxynucleotides, strand
synthesis does not always terminate close to the primer; several hundred nucleotides may be
polymerized before a dideoxynucleotide is eventually incorporated. The result is a set of new
molecules, all of different lengths, each ending in a dideoxynucleotide whose identity indicates
the nucleotide (A, C, G, or T) present at the equivalent position in the template DNA.
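
The logic of the method, a family of terminated strands whose lengths and end labels together spell out the sequence, can be sketched in Python. This is a deliberately idealized model: the template is hypothetical and written 3' to 5', the primer is ignored, and exactly one terminated molecule of every length is assumed. The function names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def terminated_fragments(template_3to5: str):
    """All chain-terminated products as (length, terminal ddNTP label) pairs."""
    # The new strand grows 5'->3', complementary to the template base by base.
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]

def read_sequence(fragments):
    """Order fragments by length (the gel) and read each end label (the detector)."""
    return "".join(label for _, label in sorted(fragments))

template = "TACGGTCA"                            # hypothetical template, 3'->5'
mixture = terminated_fragments(template)[::-1]   # order in the tube is irrelevant
print(read_sequence(mixture))
```

The fragments ending in ddA are exactly those terminated opposite a T in the template, which corresponds to the "A" family described in the figure legend.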

To determine the DNA sequence, all that we have to do is identify the dideoxynucleotide at the
end of each chain-terminated molecule. This is where the polyacrylamide gel comes into play.
The DNA mixture is loaded into a well of a polyacrylamide slab gel or a tube of a capillary gel
system, and electrophoresis is carried out to separate the molecules according to their lengths.
After separation, the molecules are run past a fluorescence detector capable of discriminating the
labels attached to the dideoxynucleotides. The detector, therefore, determines if each molecule
ends in an A, C, G, or T. The sequence can be printed out for examination by the operator or
entered directly into a storage device for future analysis. Automated sequencers with multiple
capillaries working in parallel can read up to 96 different sequences in a two-hour period, which
means that with an average of 750 bp per individual experiment, 864 kb of information is
generated per machine per day.
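
The 864 kb figure follows from simple arithmetic, shown here as a short Python check. It assumes the machine runs continuously, that is, twelve two-hour runs per 24-hour day; the variable names are illustrative.

```python
RUN_HOURS = 2             # duration of one run
SEQUENCES_PER_RUN = 96    # capillaries read in parallel
BP_PER_SEQUENCE = 750     # average read length per experiment

runs_per_day = 24 // RUN_HOURS                      # assumes round-the-clock use
bp_per_day = runs_per_day * SEQUENCES_PER_RUN * BP_PER_SEQUENCE
print(f"{runs_per_day} runs/day, {bp_per_day / 1000:.0f} kb/day")
```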
Reading the sequence generated by a chain termination experiment. (A) Each dideoxynucleotide is
labeled with a different fluorophore. During electrophoresis, the labeled molecules move past a
fluorescence detector, which identifies which dideoxynucleotide is present in each band. The information
is passed to the imaging system. (B) A DNA sequencing printout. The sequence is represented by a series
of peaks, one for each nucleotide position. In this example, a green peak is an “A,” blue is “C,” brown is
“G,” and red is “T.”

DNA polymerases for chain termination sequencing

Any template-dependent DNA polymerase is capable of extending a primer that has been
annealed to a single-stranded DNA molecule, but not all polymerases do this in a way that is
useful for DNA sequencing. A sequencing enzyme must fulfill three criteria in particular:

• High processivity. This refers to the length of the polynucleotide that is synthesized
before the polymerase terminates through natural causes. A sequencing polymerase must
have high processivity so that it does not dissociate from the template before
incorporating a dideoxynucleotide.
• Negligible or zero 5’→3’ exonuclease activity. Most DNA polymerases also have
exonuclease activities, meaning they can degrade as well as synthesize DNA polynucleotides.
This is a disadvantage in DNA sequencing because removing nucleotides from the 5’
ends of the newly synthesized strands alters the lengths of these strands, making it
impossible to determine the correct sequence.
• Negligible or zero 3’→5’ exonuclease activity. This is also desirable so that the
polymerase does not remove the dideoxynucleotide at the end of a completed strand. If
this happens, then the strand might be further extended. The net result will be that there
are few short strands in the reaction mixture, and the sequence close to the primer will be
unreadable.

No naturally occurring DNA polymerase entirely meets these stringent requirements.
Instead, artificially modified enzymes are generally used. The first of these to be developed was
the Klenow polymerase, a version of Escherichia coli DNA polymerase I from which the
standard enzyme's 5’→3’ exonuclease activity has been removed, either by cleaving away the
relevant part of the protein or by genetic engineering. The Klenow polymerase has relatively low
processivity, limiting the length of sequence obtained from a single experiment to about 250 bp
and giving nonspecific product strands that have terminated naturally rather than by
incorporating a dideoxynucleotide in the sequencing reaction. The Klenow enzyme has therefore
been superseded by a modified version of the DNA polymerase encoded by bacteriophage T7,
sold under the trade name “Sequenase.” Sequenase has high processivity and no
exonuclease activity and possesses other desirable features, such as a rapid reaction rate.

The primer determines the region of the template DNA that will be sequenced

An oligonucleotide primer is annealed onto the template DNA to begin a chain termination
sequencing experiment. The primer is needed because template-dependent DNA polymerases
cannot initiate DNA synthesis on an entirely single-stranded molecule: there must be a short,
double-stranded region to provide a 3’ end onto which the enzyme can add new nucleotides.

The primer also plays the critical role of determining the region of the template molecule that
will be sequenced. For most sequencing experiments, a “universal” primer is used, one
complementary to the part of the vector DNA immediately adjacent to the point into which new
DNA is ligated. The same universal primer can therefore give the sequence of any DNA ligated
into the vector. Of course, if this inserted DNA is longer than 750 bp or so, then only a part of its
sequence will be obtained, but usually, this is not a problem because the project as a whole
requires that a large number of short sequences are generated and subsequently assembled into
the contiguous master sequence. It is immaterial whether or not the short sequences are the
complete or only partial sequences of the DNA fragments used as templates. If double-stranded
plasmid DNA is used to provide the template, then, if desired, more sequences can be obtained
from the other end of the insert. Alternatively, it is possible to extend the sequence in one
direction by synthesizing a nonuniversal internal primer designed to anneal at a position within
the insert DNA. An experiment with this primer will provide a second short sequence that
overlaps the previous one.

Different types of primer for chain termination sequencing. (A) A universal primer anneals to the vector
DNA adjacent to the position at which new DNA is inserted. A single universal primer can sequence any
DNA insert but only provides the sequence of one end of the insert. (B) One way of obtaining a longer
sequence is to carry out a series of chain termination experiments, each with a different internal primer
that anneals within the DNA insert.

Students should practice designing primers with the NCBI primer design tool (Primer-BLAST).

Thermal cycle sequencing offers an alternative to the traditional methodology

The discovery of thermostable DNA polymerases, which led to the development of PCR, has
also resulted in new methodologies for chain termination sequencing. In particular, the thermal
cycle sequencing innovation has two advantages over traditional chain termination sequencing.
First, it uses double-stranded rather than single-stranded DNA as the starting material. Second,
very little template DNA is needed, so the DNA does not have to be cloned before being
sequenced.

Thermal cycle sequencing is carried out similarly to PCR, but just one primer is used, and the
reaction mixture includes the four dideoxynucleotides. Because there is only one primer, only
one of the strands of the starting molecule is copied, and the product accumulates linearly, not
exponentially, as is the case in a real PCR. The presence of the dideoxynucleotides in the
reaction mixture causes chain termination, as in the standard methodology, and the family of
resulting strands can be analyzed, and the sequence read in the usual way.
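
The difference between linear and exponential accumulation can be made concrete with a small Python comparison. Both functions are idealized sketches that assume every template is copied in every cycle; real reactions are less efficient, and the function names are hypothetical.

```python
def pcr_copies(templates: int, cycles: int) -> int:
    """Two primers: every product becomes a new template, so copies double each cycle."""
    return templates * 2 ** cycles

def cycle_sequencing_products(templates: int, cycles: int) -> int:
    """One primer: only the original templates are copied, once per cycle."""
    return templates * cycles

for n in (1, 10, 30):
    print(n, pcr_copies(1, n), cycle_sequencing_products(1, n))
```

After 30 cycles the idealized PCR yields about a billion copies per starting molecule, whereas thermal cycle sequencing yields 30 terminated strands per template. That is still enough to reduce the amount of starting DNA needed.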

Alternative methods for DNA sequencing


Although most sequencing is carried out by the chain termination method, other techniques
remain essential for specific applications. We will examine two of these alternative techniques:
the chemical degradation method, which, like chain termination sequencing, was devised in the
1970s, and pyrosequencing, which is a more recent invention.

Chemical degradation sequencing

One limitation of chain termination sequencing is that it may not be able to provide an accurate
sequence if the template DNA can form intrastrand base pairs. Intrastrand base pairs can block
the progress of the DNA polymerase, reducing the amount of strand synthesis that occurs. They
can also alter the mobility of the chain-terminated molecules during electrophoresis, meaning
that the order in which the molecules pass the detector is no longer determined solely by their
length. Intrastrand base pairs do not hinder chemical degradation sequencing, so this method can
be used as an alternative when such problems arise.

The chemical degradation method is similar to chain termination sequencing in that the sequence
is determined by examining the lengths of molecules whose terminal nucleotide is known.
However, these molecules are generated differently, by treatment with chemicals that cut
specifically at a particular nucleotide. This means that at least four separate sequencing
reactions must be carried out, one for each nucleotide.

The starting material is double-stranded DNA, first labeled by attaching a radioactive
phosphorus group to the 5’ end of each strand. Dimethyl sulfoxide (DMSO) is then added, and
the DNA is heated to 90˚C. This breaks the base pairing between the strands, enabling them to be
separated from one another by gel electrophoresis; the basis is that one of the strands probably
contains more purine nucleotides than the other and is, therefore, slightly heavier and runs more
slowly during the electrophoresis.

One strand is purified from the gel and divided into four samples, each treated with one of the
cleavage reagents. To illustrate the procedure, we will follow the “G” reaction. First, the
molecules are treated with dimethyl sulfate, which attaches a methyl group to the purine ring of
G nucleotides. Only a limited amount of dimethyl sulfate is added, the objective being to modify,
on average, just one G residue per polynucleotide. At this stage, the DNA strands are still intact,
cleavage not occurring until a second chemical—piperidine—is added. Piperidine removes the
modified purine ring and cuts the DNA molecule at the phosphodiester bond immediately
upstream of the baseless site that is created. The result is a set of cleaved DNA molecules, some
of which are labeled and some of which are not. The labeled molecules all have one end in
common and one end determined by the cut sites, the latter indicating the positions of the G
nucleotides in the cleaved DNA molecules. Similar approaches are used to generate additional
families of cleaved molecules, though these are usually not simply “A,” “T,” and “C” families,
as problems have been encountered in developing chemical treatments to cut precisely at A or T.
The four reactions that are carried out are, therefore, usually “G,” “A + G,” “C,” and “C + T.”
This complicates things but does not affect the accuracy of the sequence that is determined. The
family of molecules generated in each reaction is loaded into a lane of a polyacrylamide slab gel,
and after electrophoresis, the positions of the bands in the gel are visualized by autoradiography.
The band that has moved the furthest represents the smallest piece of DNA. In the example
shown in Figure 4.8C, this band lies in the “A + G” lane. There is no equivalent-sized band in the
“G” lane, so the first nucleotide in the sequence is “A.” The next size position is occupied by two
bands, one in the “C” lane and one in the “C + T” lane: the second nucleotide is, therefore, “C,”
and the sequence so far is “AC.” The sequence reading can be continued up to the region of the
gel where individual bands were not separated.
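
The lane-reading rule described above can be written as a short Python sketch. The gel is represented as a list of band positions ordered from the fastest-moving (smallest) fragment upward, each holding the set of lanes that show a band at that position. The first two positions follow the example in the text; the last two are hypothetical additions, and the function names are illustrative.

```python
def call_base(lanes: set) -> str:
    """Deduce the nucleotide from the lanes showing a band at one size position."""
    if "G" in lanes:        # G is cut in both the "G" and "A + G" reactions
        return "G"
    if "A+G" in lanes:      # band in "A + G" only: must be A
        return "A"
    if "C" in lanes:        # C is cut in both the "C" and "C + T" reactions
        return "C"
    if "C+T" in lanes:      # band in "C + T" only: must be T
        return "T"
    raise ValueError("no band at this position")

def read_gel(positions) -> str:
    """Read the sequence from the smallest fragment to the largest."""
    return "".join(call_base(lanes) for lanes in positions)

gel = [{"A+G"}, {"C", "C+T"}, {"G", "A+G"}, {"C+T"}]
print(read_gel(gel))
```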

Chemical degradation sequencing
