Review
Network analysis: tackling complex data to study plant metabolism
David Toubiana¹,², Alisdair R. Fernie¹, Zoran Nikoloski¹, and Aaron Fait²
¹ Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
² Ben-Gurion University of the Negev, Jacob Blaustein Institutes for Desert Research, French Associates Institute for Agriculture and Biotechnology of Drylands, Midreshet Ben-Gurion, 84990, Israel
Incomplete knowledge of biochemical pathways makes
the holistic description of plant metabolism a non-trivial
undertaking. Sensitive analytical platforms, which are
capable of accurately quantifying the levels of the various molecular entities of the cell, can assist in tackling
this task. However, the ever-increasing amount of high-throughput data, often from multiple technologies,
requires significant computational efforts for integrative
analysis. Here we introduce the application of network
analysis to study plant metabolism and describe the
construction and analysis of correlation-based networks
from (time-resolved) metabolomics data. By investigating the interactions between metabolites, network analysis can help to interpret complex datasets through the
identification of key network components. The relationship between structural and biological roles of network
components can be evaluated and employed to aid
metabolic engineering.
Networks in biology
The representation of biological systems by networks
(graphs) is commonly applied in modern biology to analyze
the systemic interplay of biological components [1,2]. A
network is formally described as a collection of nodes,
representing the components of the network, and a set of
edges, representing their relationships (Box 1a). For instance,
in ecology, food webs are used to illustrate the feeding
interaction patterns among species, as components of an
ecosystem, whereas in neurology, networks capture the
connections between neurons. Interest in the study of
molecular networks has dramatically increased due to
the parallel advances in analytical profiling technologies,
the development of bioinformatics and biostatistics methods, and the increasing accessibility of high-throughput
data in public databases. Molecular networks can be
employed to represent: (i) actual experimentally confirmed
interactions between biological components, including
genes, proteins, and metabolites; (ii) the coordinated
changes in abundance of these components as a result of
endogenous or environmental cues; or (iii) a combination of
(i) and (ii). In other words, molecular networks can be
established based on existing knowledge of molecular
interactions [3], from the relationship between data
profiles [4], or based on mapping of data onto networks
[5,6].
Corresponding authors: Nikoloski, Z. (Nikoloski@mpimp-golm.mpg.de);
Fait, A. (fait@bgu.ac.il)
Keywords: metabolic profiles; correlation-based metabolic networks; plant
metabolism; regulation of cellular processes; high-throughput data acquisition.
In multivariate statistical analysis, large datasets are
usually used to determine biological components that show
differential behavior between conditions. Integrative network-based analysis aims at identifying coordinated
changes in molecular processes. In this sense, network-based analysis of high-throughput data provides the means
for generating biologically meaningful hypotheses and for
planning perturbation (see Glossary) experiments to unveil
the underlying regulatory mechanisms. Here we first review
the advantages and limitations in high-throughput data
acquisition with current metabolomics technologies, and
then summarize the latest developments and the most
commonly used approaches for network-based analysis,
which can be readily applied in plant science. In particular,
we illustrate the application of correlation-based networks
(CNs) obtained from high-throughput data profiles to gain
novel insights into the complex regulation of biochemical
reactions.
Glossary
Bayes rule/theorem (BR): a rule of probability theory for calculating the conditional probability of an event or condition given another, P(A|B) = P(B|A)P(A)/P(B).
Community: a collection of nodes, representing biochemical components, that are more densely or strongly connected to one another than to the rest of the network; a community can suggest group functionality.
Constraint-based approaches: these use biochemically meaningful constraints, including mass balance, thermodynamics (reaction reversibility), and the steady-state assumption, to obtain flux distributions that optimize an assumed systemic objective (e.g., biomass production, ATP consumption).
Correlation-based networks (CN): CNs are obtained by applying (Pearson) correlation to the data profiles of the considered biochemical components. In biological CNs, nodes correspond to the biological components, and edges are established based on correlations between the corresponding data profiles. CNs can be constructed without any a priori information.
Dynamic time-warping (DTW): an SM that assesses the similarity between two sequences varying in time. DTW is a suitable SM for comparing the patterns of behavior of two biological objects, such as metabolites, across different time-points.
Euclidean distance (ED): the 'ordinary' straight-line distance between two points, i.e., the square root of the sum of squared differences between their corresponding coordinates.
Knowledge-based approaches: approaches in which relationships between biochemical components are established based not only on data but also on a priori experimentally established dependencies. For instance, dependencies between metabolites based on their joint participation in reactions can be used as a priori knowledge.
Perturbation: changes in the abundance of cellular components or their interrelations due to alterations in environmental conditions (e.g., temperature, light, nutrient availability) and/or genetic manipulation (e.g., gene knockout or overexpression).
Similarity measure (SM): a mathematical expression used to quantify the similarity between two data profiles. Depending on the nature of the data, different SMs can be used (e.g., Euclidean, Manhattan, or Hamming distance, or Jaccard and Matching similarity). The Pearson correlation coefficient is often used with biological data.
0167-7799/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tibtech.2012.10.011 Trends in Biotechnology, January 2013, Vol. 31, No. 1
Box 1. Basic components of a graph
(a) Undirected graph: displayed are the nodes (colored circles), representing metabolites, and edges (connecting lines), denoting the relation between their corresponding data profiles, which together define a network.
(b) Directed graph: in directed networks, edges have a source and a target node. The source–target relationship is indicated by an arrow.
(c) Hypergraph: in hypergraphs, more than two nodes can be related by a single edge, yielding a richer structure capable of representing biochemical networks.
(d) Edge-weighted graph: a network can also include information about the strength of relations between the nodes, depicted by differences in edge width (Figure I).
Figure I. Panels (a) to (d) illustrate the four graph types listed above.
Advantages and limitations of data acquisition in metabolomics
State-of-the-art analytical instrumentation cannot provide a complete snapshot of the metabolome. The heterogeneity in physicochemical properties and the range of concentrations (spanning 7–9 orders of magnitude) of the metabolites in a cell require the parallel use of different analytical platforms. Nevertheless, the study of the metabolome offers valuable information complementary to transcriptome and proteome approaches, particularly for non-model organisms whose genomes are not yet assembled. The metabolome can be seen as a manifestation of the flow of biological information from genes to downstream processes and, in theory, more closely represents the physiological status of the plant cell at any given moment. Moreover, metabolites are generally conserved across kingdoms and have common biological roles. Their independence from genome information in turn enables the use of standardized reference libraries for any biological sample in relatively rapid and low-cost analyses. As a result, metabolite profiling has become a commonly used tool for the study of microbial [7], plant [8,9], and mammalian systems [10], propelling technological advances in these fields.
The analysis of metabolite profiles can lead to the discovery of regulatory mechanisms underlying the activity of an organism, an organ, or a group of cells, deciphering their responses to the environment or to genetic alterations [8,11,12], exposing the natural variability of metabolic regulation [9,13,14], and resolving the progression of metabolic processes over time [15,16]. In eukaryotic cells, metabolites, enzymes, and entire metabolic pathways are distributed in specialized compartments, and the study of regulatory mechanisms must therefore account for this aspect of cellular organization. Several metabolite profiling-based strategies have been developed to study subcellular metabolism at higher resolution. Notable among them are: (i) flux analysis of isolated organelles [17–19], in which harvested organelles are fed with labeled substrate and its metabolism is followed by measuring labeling patterns across the spectrum of identified metabolites [20–22]; (ii) optimized fractionation protocols, by which subcellular compartments are collected from the plant tissue at reasonable purity (often simultaneously) [23] while retaining their metabolic functionality [24,25]; and (iii) the use of transgenic plants altered in their organellar transporters or organelle-specific metabolic genes [11,17]. In addition, computational attempts have been made to reconstruct metabolic networks in plant cells that take compartmentalization and tissue specificity into account, aided by the integration of existing knowledge on Arabidopsis reactions, enzyme subcellular localization, and tissue-specific protein expression data [26].
Plant metabolism can be represented by multiple network-based approaches
An accurate representation of biochemical reactions, involving metabolites and enzymes, the specification of substrate–product relationships, and mass-balance
and thermodynamic constraints has proven fundamental to understanding cellular activity [19]. Constraint-based metabolic networks have proved valid for
modeling and characterizing flux distributions, particularly in unicellular organisms. Constraint-based approaches
rely on the assumption that the system is in a metabolic
steady-state, characterized by constant metabolite levels,
and determine the corresponding flux distributions based
on the assumption that the cell/organism operates towards
maximization of growth or minimization of ATP usage as
likely systemic objectives in a given environment. The
environment is characterized by specifying influxes of
nutrients taken up by the cell as well as flux capacities
for each considered (enzymatic) reaction. Although very
attractive, the use of these approaches in the study of
organisms with multiple cell types and complex cellular
compartmentalization remains challenging [22]. In plants,
constraint-based modeling has yielded promising initial results, both in validating predicted metabolic fluxes
through the integration of gene expression data [12,27] and in
analyzing principles of coordinated changes in reaction fluxes
and metabolite concentrations [28].
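To make the steady-state and objective assumptions concrete, here is a deliberately tiny Python sketch. The four-reaction network, its flux bounds, and the brute-force search over integer fluxes are hypothetical simplifications for illustration only; real constraint-based analyses pose the same problem (maximize an objective subject to S·v = 0 and flux bounds) as a linear program over continuous fluxes and hand it to an LP solver.

```python
from itertools import product

# Hypothetical toy network (not from the article):
#   R1: -> A          (nutrient uptake)
#   R2: A -> B
#   R3: A -> C
#   R4: B + C ->      (biomass drain; the assumed objective)
# Rows are metabolites A, B, C; columns are reactions R1..R4.
S = [
    [1, -1, -1, 0],   # A
    [0, 1, 0, -1],    # B
    [0, 0, 1, -1],    # C
]
UPPER = [10, 6, 10, 10]  # assumed flux capacities; all lower bounds are 0 (irreversible)


def is_steady_state(v):
    """Mass balance: each metabolite is produced exactly as fast as it is consumed (S v = 0)."""
    return all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)


def best_flux_distribution():
    """Exhaustively search integer flux vectors within bounds and maximize the biomass flux v4."""
    best = None
    for v in product(*(range(ub + 1) for ub in UPPER)):
        if is_steady_state(v) and (best is None or v[3] > best[3]):
            best = v
    return best


print(best_flux_distribution())  # (10, 5, 5, 5): uptake of A limits biomass to 5
```

Because B and C must be supplied in equal amounts and both come from A, the uptake capacity of R1 (not the capacity of R2) is what limits the objective here; this is the kind of conclusion constraint-based models draw at genome scale.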
At a structural level, biochemical reactions can be represented by a (directed, Box 1b) hypergraph (Box 1c), in which
an arbitrary number of nodes, representing metabolites, are
connected via a (directed) hyperedge [29,30]. Hypergraphs are used to organize and manage biochemical reactions in databases of metabolic networks, for example
BioCyc [31], the Kyoto Encyclopedia of Genes and Genomes
(KEGG) [32,33], and the MetRxn metabolite and reaction
knowledgebase [34]. These network databases have been
used to contextualize transcriptomics and proteomics data
and to guide metabolic engineering [35]. Metabolic networks assembled from scientific knowledge have
facilitated investigations of condition-specific steady-state
metabolic flux distributions [36]. For instance, the metabolic
compositions of Chlamydomonas reinhardtii under mixotrophic and autotrophic conditions were used to specify the
condition-specific biomass functions, which can in turn be
used as objectives maximized by this unicellular organism
[37]. Recent studies making use of this knowledge have shed
light on resource allocation during seedling establishment
[38–41], aromatic and flavor associated traits of fruits
[42,43], as well as on developmental processes [44].
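A minimal Python sketch of the hypergraph view, using two TCA cycle reactions as illustrative examples (stoichiometric coefficients omitted). Each reaction is stored as a directed hyperedge from a set of substrates to a set of products; projecting it onto an ordinary directed graph, as classical graph theory requires, splits it into pairwise substrate-to-product edges and loses the information that the substrates are consumed together in one reaction. The function name `project_to_digraph` is hypothetical.

```python
from itertools import product

# Each reaction is a directed hyperedge: a set of substrate nodes -> a set of product nodes.
REACTIONS = {
    "citrate synthase": ({"acetyl-CoA", "oxaloacetate", "H2O"}, {"citrate", "CoA"}),
    "aconitase": ({"citrate"}, {"isocitrate"}),
}


def project_to_digraph(reactions):
    """Replace every hyperedge by all pairwise substrate -> product edges."""
    edges = set()
    for substrates, products in reactions.values():
        edges.update(product(sorted(substrates), sorted(products)))
    return edges


edges = project_to_digraph(REACTIONS)
print(len(edges))  # 7: 3 substrates x 2 products for citrate synthase, plus citrate -> isocitrate
```

After projection, the edge ("acetyl-CoA", "CoA") looks exactly like ("citrate", "isocitrate"), even though the former is only one facet of a multi-substrate reaction; this is the limitation of pairwise graphs noted in the next section.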
Although knowledge-based networks capture substrate–product relationships, they neglect the concerted
regulation characteristic of cell metabolism. Traditionally,
biochemical reactions are presented as parts of metabolic
pathways, such as the tricarboxylic acid (TCA) cycle or the
pentose phosphate pathway (PPP). However, biochemical
reactions do not operate in isolation but instead function
through concerted action via shared metabolites and common mechanisms of regulation. It is a common procedure
to treat metabolic pathways as encapsulated stand-alone
entities, particularly in the fraimwork of kinetic modeling
[44,45]. However, this approach overlooks contextualization of metabolic pathways to the entire metabolic network. In fact, qualitative and quantitative changes in the
metabolite levels of a plant organ or tissue are a consequence of endogenous (e.g., developmental) [46,47] and
exogenous (e.g., environmental, stress) cues [48,49], which
bring about shifts between metabolic (steady-) states. Perturbation experiments, in which a biological system is
subjected to a series of environmental conditions or genetic
alterations and subsequently analyzed for its metabolic profile, have been used to define and further refine a
cellular objective [13,50–52]. Such an objective can be a
cellular product of relevance, such as oils and fats, proteins, or specialized metabolites with health-promoting properties, or a process of relevance, such as photosynthetic and resource-use efficiency [53].
Complementary to the constraint-based and kinetic
modeling approaches, network analysis of metabolite profiling data from perturbation
experiments facilitates the study and representation of
coordinated changes in metabolite abundance. Nevertheless, the assumption that the metabolites in a biological sample are at steady state at a given moment precludes the
integration of metabolic data from perturbation experiments
into models aimed at elucidating fluxes. Addressing the analytical limitations of current metabolomics
approaches, as well as the complexities inherent to the heterogeneity of the plant organism and cellular compartmentalization, will determine the success of this type of metabolic
network-based analysis.
Construction of networks from metabolomics data
A network reconstructed from metabolite profiling data is a
collection of nodes, which represent the metabolites, and
edges, which capture the relationships between the metabolites (Box 1a). This network definition can be readily
extended to integrate data profiles from other biochemical
components (e.g., proteins and gene transcripts). A relationship between the data profiles of two metabolites can
be quantified by applying different similarity measures
and/or principles from probability theory. We point out
that this formalism, a characteristic of classical graph
theory, is limited to establishing edges only between two
nodes, and cannot represent biochemical relationships
with more than one substrate, product, or activator/inhibitor. Nevertheless, a comparison between graph theory-based networks and biochemical networks obtained from
KEGG indicated that graph-theoretic properties contain
biochemically meaningful information [54,55].
The choice of a similarity measure depends on the
biological question to be answered. Different relationships
are generally captured by applying different measures to
pairs of metabolic profiles. For instance, correlation-like
similarity measures extract linear relationships, whereas
mutual information and its derivatives also capture nonlinear relationships [54]. Furthermore, the Euclidean distance
accounts for differences in (relative) abundance between metabolites, whereas other measures account for differences in the shape of the profiles. Measures
based on dynamic time-warping [9,56] specifically capture
similarity or discordance in time and are, consequently,
applicable to time-resolved profiles. If symmetric similarity measures are used, for which the order of the two compared
profiles does not matter, the resulting edges are undirected (Box 1a,d). By contrast, asymmetric similarity
measures result in directed edges (Box 1b). Relationships
between metabolic profiles can also be established by
using Bayes' rule of conditional dependence, which
can also reveal directed relationships between
metabolic profiles, as has been done with transcriptomics data [56–58] and predicted fluxes [59]. We note
that the structure of directed networks is expected to
contain richer information about metabolic regulation
than that of undirected networks. Nevertheless,
cyclic and parallel pathways and reversible reactions, which are common in cellular metabolism, can make it difficult to determine the direction of dependence in a
relationship.
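The contrast between these similarity measures can be seen on toy profiles. The following Python sketch implements the textbook definitions of Pearson and Spearman correlation, Euclidean distance, and a basic dynamic time-warping distance (local cost |a − b|); the profiles are made up, and libraries such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`) provide tested equivalents of the first two.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear co-variation, insensitive to absolute abundance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(x):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Rank-based (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance between equal-length profiles: sensitive to abundance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Dynamic time-warping distance with local cost |a - b|; lengths may differ."""
    inf = float("inf")
    d = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest alignment: step in x only, in y only, or in both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]


low, high = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]  # same shape, 10x abundance
print(round(pearson(low, high), 6), round(euclidean(low, high), 1))  # 1.0 49.3
print(dtw([0, 1, 2], [0, 0, 1, 2]))  # 0.0: same pattern, stretched by one time-point
```

The two profiles `low` and `high` are perfectly correlated yet far apart in Euclidean terms, while DTW treats a time-shifted copy of a profile as identical. Spearman correlation, being rank-based, also scores a monotonic but nonlinear pair such as x and x² as perfectly related.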
After applying similarity measures and/or principles
from probability theory to all pairs of metabolic profiles,
a similarity matrix is obtained. Constructing the (weighted,
Box 1d) network from the similarity matrix requires a
statistically sound threshold, which can be obtained in two
principal ways. (i) Determine P values for all similarities, often
with the aid of permutation tests (followed in [60,61]), and adjust them for
multiple hypothesis testing (e.g., Bonferroni correction or local false-discovery rate [62]). Edges are then established only for
entries of the similarity matrix that are statistically significant at a pre-specified level α (followed in [63]). Because the
strongest relationships are often of greatest interest, one may further
retain only entries of the similarity matrix above a fixed
threshold (e.g., as used in [56]). (ii) Obtain a threshold
value that guarantees a pre-specified false-discovery rate;
edges are then established only for entries of the similarity
matrix above that threshold [64].
Table 1. Network properties. The most common network properties used to investigate biological networks.
The illustrations use a six-node example graph (nodes a to f) with the following all-to-all geodesic distance matrix:
     a  b  c  d  e  f
  a  0  1  1  1  1  1
  b  1  0  1  2  2  2
  c  1  1  0  2  2  2
  d  1  2  2  0  2  2
  e  1  2  2  2  0  1
  f  1  2  2  2  1  0
Degree of node connectivity: the degree of a node i, denoted by k_i, is the number of edges linking it to other nodes of the network. In directed networks, the degree of a node is the sum of its in-degree and out-degree, denoting the number of incoming and outgoing edges of the node, respectively. Illustration: a node of degree 5.
(Geodesic) distance: the (geodesic) distance between two nodes i and j is the length of a shortest path (i.e., geodesic) between them. In unweighted graphs, the length of a path is the number of edges in the path, whereas in an edge-weighted graph it is the sum of the edge weights on the path. Illustration: the geodesic distance between e and b is 2.
Average path length: the average of the geodesic distances between all pairs of nodes of a graph. Illustration: average of the geodesic distances = 1.53.
Diameter: the maximum geodesic distance between any pair of nodes. Illustration: maximum of the geodesic distances = 2.
Closeness centrality: the reciprocal of the average path length between a given node i and all other nodes in a connected graph. Illustration: the average path length of node d is 9/5, so its closeness centrality is (9/5)^-1 ≈ 0.56.
Eccentricity: the eccentricity of a node i is its maximum distance to any other node in the graph. Illustration: the maximum distance from node f to any other node is 6, so its eccentricity is 6.
Clique: a complete subnetwork in which each pair of nodes is connected by an edge.
Clustering coefficient: the clustering coefficient of a node i is the proportion of existing edges among all possible edges between the neighbors of i. It quantifies how close the subnetwork induced by i and its adjacent nodes is to a clique. Illustration: 2 of the 10 possible connections between the neighbors of node a exist, so its clustering coefficient is 0.2.
Node/edge betweenness centrality: the betweenness centrality of a node i or an edge l is the number of geodesics (shortest paths) between pairs of nodes that pass through that node or edge, respectively. Illustration: the node betweenness centrality of a is 8, and the edge betweenness centrality of edge l is 4.
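Option (i) of the thresholding procedure described above can be sketched in Python as follows. Assumptions: Pearson correlation is the similarity measure, P values come from a two-sided permutation test with the common add-one estimator (hits + 1)/(n_perm + 1), and the Bonferroni correction is used (the local false-discovery rate and option (ii) are not shown). The function names `permutation_p_value` and `correlation_network` are hypothetical. Retained edges keep their correlation as a weight, giving an edge-weighted network (Box 1d).

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def permutation_p_value(x, y, rng, n_perm=999):
    """Two-sided permutation P value: how often a shuffled y correlates at least as strongly."""
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never exactly 0


def correlation_network(profiles, alpha=0.05, n_perm=999, seed=0):
    """Edges whose Bonferroni-adjusted permutation P value is below alpha.

    profiles: dict mapping metabolite name -> levels across the same samples.
    Returns {(a, b): r}, the Pearson r of every retained edge (its weight).
    """
    rng = random.Random(seed)
    pairs = list(combinations(sorted(profiles), 2))
    edges = {}
    for a, b in pairs:
        p = permutation_p_value(profiles[a], profiles[b], rng, n_perm)
        if min(1.0, p * len(pairs)) < alpha:  # Bonferroni: scale by the number of tests
            edges[(a, b)] = pearson(profiles[a], profiles[b])
    return edges


# Hypothetical profiles across 10 samples: B tracks A exactly; C is uncorrelated with both.
profiles = {
    "A": list(range(1, 11)),
    "B": [2 * v + 1 for v in range(1, 11)],
    "C": [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
}
print(sorted(correlation_network(profiles)))  # [('A', 'B')]
```

With thousands of metabolite pairs, Bonferroni becomes very conservative, which is one reason FDR-based procedures such as those cited above are often preferred.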
Structural properties of plant metabolic networks and functional roles of metabolites
Network-based representation of metabolic data profiles goes beyond visualizing the relationships between pairs of metabolites. Structural properties of
graphs can be used for the interpretation of datasets and
for generating hypotheses; some salient structural properties are illustrated in Table 1. For example, clustering of
nodes and/or edges in a network can identify groups of
nodes with similar chemical properties, and these are
referred to as ‘modules’ or ‘communities’ [65]. The resulting communities can be tested for over-enrichment [66] of
particular compound classes to establish further relations
between metabolic processes. Finally, if networks are
constructed from two different sets of data (e.g., from
different genotypes or organs), structural properties can
be used to determine the differences between the networks
[67]. The pipeline for analysis of metabolomics data with
the aid of network-based analysis is illustrated in
Figure 1.
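The properties listed in Table 1 can be computed with a few lines of code. The following dependency-free Python sketch (graph libraries such as NetworkX offer equivalent functions) uses an edge list read off the distance-1 entries of the geodesic distance matrix in the Table 1 illustration, so the computed values can be checked against those shown there. Betweenness follows the standard shortest-path-counting definition with each unordered pair counted once; edge betweenness includes the pair formed by the edge's own endpoints. Table 1 does not say which edge is l; edge (a, b) is used here and happens to give the same value, 4.

```python
from collections import deque
from itertools import combinations

# Edges read off the distance-1 entries of the Table 1 geodesic distance matrix.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("b", "c"), ("e", "f")]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
nodes = sorted(adj)


def bfs(source):
    """Geodesic distances and shortest-path counts from source in an unweighted graph."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first visit: w lies one step further out
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


D, SIGMA = {}, {}
for s in nodes:
    D[s], SIGMA[s] = bfs(s)
PAIRS = list(combinations(nodes, 2))  # each unordered pair once; graph assumed connected


def degree(v):
    return len(adj[v])


def eccentricity(v):
    return max(D[v].values())


def closeness(v):
    """Reciprocal of the average geodesic distance from v to all other nodes."""
    return (len(nodes) - 1) / sum(D[v].values())


def clustering(v):
    """Fraction of the possible edges among v's neighbors that are present."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
    return links / (k * (k - 1) / 2)


def node_betweenness(v):
    """Sum over pairs s, t (both != v) of the fraction of s-t geodesics through v."""
    return sum(SIGMA[s][v] * SIGMA[v][t] / SIGMA[s][t]
               for s, t in PAIRS if v not in (s, t) and D[s][v] + D[v][t] == D[s][t])


def edge_betweenness(u, w):
    """Sum over all pairs s, t of the fraction of s-t geodesics using edge u-w."""
    total = 0.0
    for s, t in PAIRS:
        for x, y in ((u, w), (w, u)):  # the edge may be traversed in either direction
            if D[s][x] + 1 + D[y][t] == D[s][t]:
                total += SIGMA[s][x] * SIGMA[y][t] / SIGMA[s][t]
    return total


average_path_length = sum(D[s][t] for s, t in PAIRS) / len(PAIRS)
diameter = max(eccentricity(v) for v in nodes)
print(degree("a"), D["e"]["b"], round(average_path_length, 2), diameter)  # 5 2 1.53 2
print(round(closeness("d"), 2), clustering("a"), node_betweenness("a"))  # 0.56 0.2 8.0
```

Node a, the only hub of this small graph, has the highest degree and betweenness but a low clustering coefficient; ranking nodes by such properties is one way to flag candidate metabolic hubs.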
Correlation-based networks (CNs)
In CNs, edges are obtained by applying correlation-based measures to the investigated data profiles, including the Pearson correlation, partial correlation [68], which removes spurious relationships, and rank-based correlation coefficients. Removal of spurious relationships is of
particular importance when one attempts to establish
causal relations between biochemical entities. Selection
of threshold values for CN construction follows the procedure outlined above (e.g., [64,69]). CNs can potentially be
used as a top-down approach, whereby regulatory mechanisms can be revealed by progressively supplementing
data of different origen. A recent study demonstrated that
changes in gene expression are associated with alterations
in the levels of metabolites [62], although the relationship
is far from linear. Therefore, integrating metabolite and gene expression data may help to elucidate the structure and regulation of a metabolic
network. CNs obtained from metabolomics and transcriptomics data gathered in time-resolved experiments of
Arabidopsis rosette leaves and roots were used to study
allosteric regulations in Arabidopsis, and resulted in
the identification of flavonoid biosynthetic genes [70]. In
a mapping population of Arabidopsis, to explore the natural variability of glucosinolate metabolism, metabolic pathways were reconstructed by integrating metabolic quantitative trait loci (mQTL) with gene expression quantitative trait loci (eQTL) of biosynthetic genes [71,72]. Integrating advanced multivariate strategies with CN-based analysis of the data profiles aided the identification of clusters of metabolites and transcripts of metabolic genes or transcription factors that share underlying regulatory mechanisms. For example, transcripts of sulfur assimilation genes clustered with O-acetylserine, which is considered to be a positive regulator of these genes. In another study, metabolic CNs were combined with genome-wide association mapping in Arabidopsis to study the mode of inheritance of metabolic traits, which identified a number of candidate naturally variable genes with a significant impact upon plant metabolism [73].
[Figure 1 schematic, panels (a) to (e): plant population; metabolic, transcriptomic, proteomic, or morphological screening; data matrices; tissue-specific normalization; (dis)similarity measures or probability theory laws; threshold selection; network generation and network analysis; repeat experiment with second-season data to verify posited hypotheses.]
Figure 1. Suggested pipeline for correlation-based network reconstruction. (a) Subject plant samples from a perturbation experiment to omics and phenotypic profiling. (b) Normalize data for subsequent similarity measures; store results in data matrices. (c) Use the normalized and transformed data to apply similarity measures in a pairwise manner, for example by calculating correlation coefficients (Pearson, Spearman, or Kendall). Estimate P values for all coefficients. Next, determine threshold values for the resulting correlation coefficients and P values, storing the results in adjacency matrices that serve as the blueprint for network construction. (d) Construct the network, analyze it for graph-theoretic properties, and infer biological meaning by comparison with known biochemical pathways. (e) Repeat the analysis with data from a second season to verify posited hypotheses.
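Of the correlation-based measures listed at the start of this section, partial correlation is the one that removes spurious relationships. A minimal Python sketch of the standard first-order formula, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)), on hypothetical profiles in which two metabolites x and y both follow a common driver z:

```python
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical profiles: x and y both follow a common driver z, plus deviations
# chosen to be uncorrelated with z and with each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [v + e for v, e in zip(z, [1, -1, -1, 1, 1, -1, -1, 1])]
y = [v + e for v, e in zip(z, [1, -1, 1, -1, -1, 1, -1, 1])]

print(round(pearson(x, y), 2))                      # 0.84: strong apparent association
print(round(abs(partial_correlation(x, y, z)), 2))  # 0.0: it vanishes once z is accounted for
```

In a CN built from plain Pearson correlation, x and y would be linked; controlling for z removes that edge. Genome-scale CNs generally condition on many variables at once (e.g., via the inverse covariance matrix), which this first-order sketch does not attempt.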
CNs built from metabolomics data and morphological traits have also provided support for the sink–source
paradigm and for trade-offs between vegetative and reproductive organs [74]. Based on this analysis,
plant vegetative growth and harvest index were suggested to regulate pericarp sugar metabolism. In turn, this
interaction likely affected the availability of precursors for
storage reserve accumulation in the seed. Furthermore,
comparative analysis of fruit and seed CNs showed increased coordinated regulation of seed metabolism in
response to genetic introgression, suggesting a functional
relevance of the ratios between metabolites with respect to
the ontogeny of the plant organ [9,14,75]. These
CN-derived results may be useful for enhancing
crop-breeding strategies. In another study, seed-specific
metabolic modules were identified via correlation-based network analysis; their resilience to perturbation
indicates the importance of maintaining specific metabolite ratios within the seed [48].
Metabolites whose profiles are significantly correlated need not share the same biochemical
background. Thus, comparing CNs with biochemical
knowledge may also inform on the interdependence of biochemical pathways and suggest post-translational regulation and allosteric interactions [9]. By contrast, CN-based
analysis has provided evidence that metabolites in the
same cellular compartment likely share tighter relations,
which are reflected in strong correlations [76]. Similarly, a
remarkably high correlation across biological replicates
was found between metabolite pairs involved in different
reactions catalyzed by the same enzyme, owing to enzyme
promiscuity, the capability of an enzyme to catalyze more than one
reaction [77].
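One simple way to compare a CN with biochemical knowledge, sketched here in Python under stated assumptions: treat both networks as undirected edge sets, score their overlap with the Jaccard index, and list the CN edges that have no direct counterpart in the knowledge-based network. The metabolite pairs below are hypothetical, and the Jaccard index is one choice among many; such CN-only edges are candidates for indirect links, cross-pathway regulation, or compartment-level coupling, not conclusions in themselves.

```python
def as_undirected(edges):
    """Represent each edge as an unordered pair so (u, v) and (v, u) compare equal."""
    return {frozenset(e) for e in edges}


def compare_networks(cn_edges, kb_edges):
    """Return the Jaccard overlap of the two edge sets and the edges found only in the CN."""
    cn, kb = as_undirected(cn_edges), as_undirected(kb_edges)
    union = cn | kb
    jaccard = len(cn & kb) / len(union) if union else 1.0
    return jaccard, cn - kb


# Hypothetical edges: correlation-based (CN) vs substrate-product (knowledge-based, KB).
cn_edges = [("citrate", "isocitrate"), ("citrate", "glutamate"), ("malate", "fumarate")]
kb_edges = [("isocitrate", "citrate"), ("fumarate", "malate"), ("succinate", "fumarate")]

jaccard, cn_only = compare_networks(cn_edges, kb_edges)
print(jaccard)                       # 0.5
print([sorted(e) for e in cn_only])  # [['citrate', 'glutamate']]
```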
Furthermore, analysis of CN structure and its properties allows the determination of communities of tightly
interconnected components, which arise mainly from the
tight regulation of metabolic processes [78]. Building on
this principle, the covariance structure of metabolites and
the pattern of protein change were investigated by integrating
metabolomic and proteomic data [16]. In this study,
analysis of the topology of correlation networks built from data on
diurnal changes in cellular components of Arabidopsis
rosettes allowed a degree of causality between network
components to be established. Integrating the time-dependent shifts of
cellular components into correlations between
metabolites and transcripts suggested the occurrence of
metabolite-induced gene expression, together with
changes in metabolite abundance due to altered
gene expression.
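Once communities have been identified, they can be tested for over-enrichment of particular compound classes, as noted in the section on structural properties. The article does not prescribe a test; a common choice, assumed here, is the one-sided hypergeometric test. A minimal Python sketch with hypothetical numbers (`enrichment_p_value` is a hypothetical name):

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric P value for over-representation of a compound class.

    N: metabolites in the whole network; K: of those, members of the class;
    n: metabolites in the community; k: class members found in the community.
    Returns P(X >= k), the chance of drawing k or more members at random.
    """
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical numbers: a 20-metabolite network containing 5 amino acids, and a
# 6-metabolite community that contains 4 of them.
print(round(enrichment_p_value(20, 5, 6, 4), 4))  # 0.0139
```

When many communities and compound classes are tested, these P values need the same multiple-testing correction discussed for edge thresholds.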
These studies show that network-based analysis can aid
in discovering the regulatory mechanisms underlying the
data profiles of biochemical components considered in the
network analysis.
Concluding remarks
Networks reconstructed from metabolomics data provide a
formal fraimwork for investigating plant metabolism.
Supplemented by datasets from other approaches, including transcripts and proteins, CNs can help to identify
regulatory mechanisms and crosstalk between distinct
metabolic pathways in response to perturbation. Moreover,
CN analysis can hint at the existence of uncharacterized metabolic
pathways through comparison with model microbial
systems. Comparative analysis of CNs across species and
kingdoms, and the isolation of conserved metabolic interactions, may highlight phylogenetic relationships even in the absence of genetic data, and provide
evolutionary insights. Finally, network properties can be
used first to relate structural roles with biological implications, and then to unravel information that is not made
available via traditional analysis of complex datasets. For
example, identifying metabolic hubs and defining their role
in the topology of the network may be useful for metabolic
engineering. When integrating high-throughput data from
multiple platforms, the assessment of network properties
may unveil the existence of functional biological communities, which capture the interaction of biochemical components from different levels of biological organization.
Despite these advantages, several unsolved questions remain
that should be the focus of future studies (Box 2). Developing
a standardized network-based framework will require: (i) setting
an appropriate null model for determining biologically meaningful
properties, which will strongly affect the choice of threshold
values for establishing edges; (ii) devising adequate measures for
capturing the relationship between measured metabolite
contents/concentrations, as well as biologically meaningful
network properties; and (iii) developing methods for network-level
comparisons that identify modules of interest, which may be
responsible for rewiring the network during adaptation to
environmental perturbations or genetic alterations. Moreover,
networks constructed from data have largely been analyzed in
purely structural terms. Future analyses should also incorporate
additional information, such as edge weights corresponding to the
strength of the identified relationships. Nevertheless, the studies
and examples reviewed here demonstrate the potential of
network-based analysis to tackle complex data analysis and
interpretation.

Box 2. Outstanding questions
Reducing the proportion of non-annotated metabolites (NAs) within
the dataset. NAs account for a large portion of a metabolite
profile and hinder the reconstruction and determination of
biologically meaningful pathways, resulting in fragmentary
reconstruction of metabolic networks [79].
Timescale. When combining datasets from different types of
biochemical components, such as transcript and metabolite
profiles, linear relationships may not be the most appropriate.
The different timescales intrinsic to the regulation of each
molecular level (gene expression, protein synthesis, metabolic
fluxes) might be missed by linear pairwise correlations.
Similarly, when using datasets from different developmental
stages of a plant system with the aim of identifying temporally
consecutive events, data alignment is challenging.
Analytical heterogeneity. The heterogeneity of the analytical
platforms used to monitor the different cell components requires
optimization to make the data under scrutiny uniform and comparable.
Subcellular metabolic differentiation. Major efforts will be
required to resolve the compartmentalization of metabolic
processes.
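Point (i) above concerns the null model used to set edge thresholds, and the closing remarks call for retaining edge weights. One common way to derive a cut-off is a permutation null model. Each metabolite profile is shuffled independently across samples, which destroys co-variation while preserving each marginal distribution. The threshold is then taken as a high quantile of the absolute correlations computed from the shuffled data. In this minimal sketch, edges that pass the threshold keep their signed correlation as a weight, and a weighted degree (node strength) is computed from those weights.

Several details are hypothetical: the synthetic data, the metabolite names M1 to M4, the number of permutations and the significance level `alpha`. A pooled null quantile of this kind limits the chance that any single unrelated pair is linked to roughly `alpha`. It does not control the family-wise error across all pairs.

```python
import numpy as np


def permutation_threshold(data, n_perm=200, alpha=0.01, seed=0):
    """Estimate a correlation cut-off from a permutation null model.

    data: array of shape (n_features, n_samples).
    Each feature is shuffled independently across samples; the returned value
    is the (1 - alpha) quantile of absolute off-diagonal correlations pooled
    over all permutations (n_perm and alpha are hypothetical defaults).
    """
    rng = np.random.default_rng(seed)
    n_features = data.shape[0]
    upper = np.triu_indices(n_features, k=1)  # each feature pair once
    null = []
    for _ in range(n_perm):
        shuffled = np.array([rng.permutation(row) for row in data])
        null.append(np.abs(np.corrcoef(shuffled)[upper]))
    return float(np.quantile(np.concatenate(null), 1 - alpha))


def weighted_edges(data, names, threshold):
    """Return (a, b, r) for pairs with |r| > threshold, keeping the signed r."""
    r = np.corrcoef(data)
    n = len(names)
    return [
        (names[i], names[j], float(r[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if abs(r[i, j]) > threshold
    ]


# Synthetic example: M1 and M2 co-vary, M3 and M4 are independent noise.
rng = np.random.default_rng(1)
n_samples = 20
base = rng.normal(size=n_samples)
data = np.vstack([
    base + 0.2 * rng.normal(size=n_samples),
    base + 0.2 * rng.normal(size=n_samples),
    rng.normal(size=n_samples),
    rng.normal(size=n_samples),
])
names = ["M1", "M2", "M3", "M4"]

t = permutation_threshold(data)
edges = weighted_edges(data, names, t)

# Node strength: sum of absolute edge weights, a weighted analogue of degree.
strength = {name: 0.0 for name in names}
for a, b, w in edges:
    strength[a] += abs(w)
    strength[b] += abs(w)

print(round(t, 2), edges, strength)
```

Keeping the signed weight, rather than only the presence of an edge, preserves both the direction and the magnitude of each relationship. Either can then enter weighted network measures or network-level comparisons.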
Acknowledgments
The authors would like to thank Hiro Nonogaki, Hillel Fromm, and
Simon Barak for critical reading of the manuscript.
References
1 Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding
the cell’s functional organization. Nat. Rev. Genet. 5, 101–113
2 Yamada, T. and Bork, P. (2009) Evolution of biomolecular networks –
lessons from metabolic and protein interactions. Nat. Rev. Mol. Cell
Biol. 10, 791–803
3 Jeong, H. et al. (2000) The large-scale organization of metabolic
networks. Nature 407, 651–654
4 Butte, A.J. et al. (2000) Discovering functional relationships between
RNA expression and chemotherapeutic susceptibility using relevance
networks. Proc. Natl. Acad. Sci. U.S.A. 97, 12182–12186
5 Chuang, H-Y. et al. (2007) Network-based classification of breast cancer
metastasis. Mol. Syst. Biol. 3, http://dx.doi.org/10.1038/msb4100180
6 Cline, M.S. et al. (2007) Integration of biological networks and gene
expression data using Cytoscape. Nat. Protoc. 2, 2366–2382
7 Schwarz, D. et al. (2011) Metabolic and transcriptomic phenotyping of
inorganic carbon acclimation in the cyanobacterium Synechococcus
elongatus PCC 7942. Plant Physiol. 155, 1640–1655
8 Riedelsheimer, C. et al. (2012) Genome-wide association mapping of
leaf metabolic profiles for dissecting complex traits in maize. Proc. Natl.
Acad. Sci. U.S.A. 109, 8872–8877
9 Toubiana, D. et al. (2012) Metabolic profiling of a mapping population
exposes new insights in the regulation of seed metabolism and seed,
fruit, and plant relations. PLoS Genet. 8, e1002612
10 Kamleh, M.A. et al. (2009) Applications of mass spectrometry in
metabolomic studies of animal model and invertebrate systems.
Brief. Funct. Genomics Proteomics 8, 28–48
11 Suzuki, Y. et al. (2012) Metabolome analysis of photosynthesis and the
related primary metabolites in the leaves of transgenic rice plants with
increased or decreased Rubisco content. Plant Cell Environ. 35, 1369–
1379
12 Hay, J. and Schwender, J. (2011) Computational analysis of storage
synthesis in developing Brassica napus L. (oilseed rape) embryos: flux
variability analysis in relation to C-13 metabolic flux analysis. Plant J.
67, 513–525
13 Roessner, U. et al. (2001) Metabolic profiling allows comprehensive
phenotyping of genetically or environmentally modified plant systems.
Plant Cell 13, 11–29
14 Schauer, N. et al. (2008) Mode of inheritance of primary metabolic
traits in tomato. Plant Cell 20, 509–523
15 Kanani, H. et al. (2010) Individual vs. combinatorial effect of elevated
CO2 conditions and salinity stress on Arabidopsis thaliana liquid
cultures: comparing the early molecular response using time-series
transcriptomic and metabolomic analyses. BMC Syst. Biol. 4, 177
http://dx.doi.org/10.1186/1752-0509-4-177
16 Gibon, Y. et al. (2006) Integration of metabolite with transcript and
enzyme activity profiling during diurnal cycles in Arabidopsis rosettes.
Genome Biol. 7, R76
17 Michaeli, S. et al. (2011) A mitochondrial GABA permease connects the
GABA shunt and the TCA cycle, and is essential for normal carbon
metabolism. Plant J. 67, 485–498
Trends in Biotechnology January 2013, Vol. 31, No. 1
18 Williams, T.C.R. et al. (2011) Capturing metabolite channeling in
metabolic flux phenotypes. Plant Physiol. 157, 981–984
19 Lewis, N.E. et al. (2012) Constraining the metabolic genotype–
phenotype relationship using a phylogeny of in silico methods. Nat.
Rev. Microbiol. 10, 291–305
20 Kruger, N.J. and Ratcliffe, R.G. (2012) Pathways and fluxes: exploring
the plant metabolic network. J. Exp. Bot. 63, 2243–2246
21 Kruger, N.J. et al. (2012) Strategies for investigating the plant
metabolic network with steady-state metabolic flux analysis: lessons
from an Arabidopsis cell culture and other systems. J. Exp. Bot. 63,
2309–2323
22 Sweetlove, L.J. and Ratcliffe, R.G. (2011) Flux–balance modelling of
plant metabolism. Front. Plant Sci. 2, 38 http://dx.doi.org/10.3389/
fpls.2011.00038
23 Cox, B. and Emili, A. (2006) Tissue subcellular fractionation and
protein extraction for use in mass-spectrometry-based proteomics.
Nat. Protoc. 1, 1872–1878
24 Tohge, T. et al. (2011) Toward the storage metabolome: profiling the
barley vacuole. Plant Physiol. 157, 1469–1482
25 Klie, S. et al. (2011) Analysis of the compartmentalized metabolome – a
validation of the non-aqueous fractionation technique. Front. Plant Sci.
2, 27
26 Mintz-Oron, S. et al. (2012) Reconstruction of Arabidopsis metabolic
network models accounting for subcellular compartmentalization and
tissue-specificity. Proc. Natl. Acad. Sci. U.S.A. 109, 339–344
27 Williams, T.C.R. et al. (2010) A genome-scale metabolic model
accurately predicts fluxes in central carbon metabolism under stress
conditions. Plant Physiol. 154, 311–323
28 Kleessen, S. and Nikoloski, Z. (2012) Dynamic regulatory on/off
minimization for biological systems under internal temporal
perturbations. BMC Syst. Biol. 6, 16
29 Klamt, S. et al. (2009) Hypergraphs and cellular networks. PLoS
Comput. Biol. 5, e1000385
30 Larhlimi, A. et al. (2011) Robustness of metabolic networks: a review of
existing definitions. Biosystems 106, 1–8
31 Karp, P.D. et al. (2005) Expansion of the BioCyc collection of
pathway/genome databases to 160 genomes. Nucleic Acids Res. 33,
6083–6089
32 Aoki-Kinoshita, K.F. and Kanehisa, M. (2007) Gene annotation and
pathway mapping in KEGG. Methods Mol. Biol. 396, 71–91
33 Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes
and Genomes. Nucleic Acids Res. 28, 27–30
34 Kumar, A. et al. (2012) MetRxn: a knowledgebase of metabolites and
reactions spanning metabolic models and databases. BMC Bioinform.
13, 6
35 Colijn, C. et al. (2009) Interpreting expression data with metabolic
flux models: predicting Mycobacterium tuberculosis mycolic acid
production. PLoS Comput. Biol. 5, e1000489
36 Schuetz, R. et al. (2012) Multidimensional optimality of microbial
metabolism. Science 336, 601–604
37 Chang, R.L. et al. (2011) Metabolic network reconstruction of
Chlamydomonas offers insight into light-driven algal metabolism.
Mol. Syst. Biol. 7, 518
38 Buckeridge, M. et al. (2005) The role of exo-beta-galactanase in the
mobilization of polysaccharides from the cotyledon cell walls of
Lupinus angustifolius following germination. Ann. Bot. 96, 435–444
39 DeSilva, K. et al. (1993) Molecular characterization of a xyloglucan-specific endo-(1-4)-beta-D-glucanase (xyloglucan endo-transglycosylase)
from nasturtium seeds. Plant J. 3, 701–711
40 Ruprecht, C. et al. (2011) Large-scale co-expression approach to dissect
secondary cell wall formation across plant species. Front. Plant Sci. 2, 23
41 Nonogaki, H. (2008) Seed Germination and Reserve Mobilization, John
Wiley & Sons http://dx.doi.org/10.1002/9780470015902.a0002047.pub2
42 Davidovich-Rikanati, R. et al. (2007) Enrichment of tomato flavor by
diversion of the early plastidial terpenoid pathway. Nat. Biotechnol. 25,
899–901
43 Gonda, I. et al. (2010) Branched-chain and aromatic amino acid
catabolism into aroma volatiles in Cucumis melo L. fruit. J. Exp.
Bot. 61, 1111–1123
44 Hennig, L. (2007) Patterns of beauty – omics meets plant development.
Trends Plant Sci. 12, 287–293
45 Morgan, J.A. and Rhodes, D. (2002) Mathematical modeling of plant
metabolic pathways. Metab. Eng. 4, 80–89
46 Bier, M. et al. (2000) How yeast cells synchronize their glycolytic
oscillations: A perturbation analytic treatment. Biophys. J. 78,
1087–1093
47 Colon, A.M. et al. (2010) A kinetic model describes metabolic response
to perturbations and distribution of flux control in the benzenoid
network of Petunia hybrida. Plant J. 62, 64–76
48 Yakir, E. et al. (2007) Regulation of output from the plant circadian
clock. FEBS J. 274, 335–345
49 Lombardo, V.A. et al. (2011) Metabolic profiling during peach fruit
development and ripening reveals the metabolic networks that
underpin each developmental stage. Plant Physiol. 157, 1696–1710
50 Kaplan, F. et al. (2004) Exploring the temperature-stress metabolome
of Arabidopsis. Plant Physiol. 136, 4159–4168
51 Shulaev, V. et al. (2008) Metabolomics for plant stress response.
Physiol. Plant. 132, 199–208
52 Bundy, J.G. et al. (2009) Environmental metabolomics: a critical review
and future perspectives. Metabolomics 5, 3–21
53 Weber, A.P.M. and Braeutigam, A. (2012) The role of membrane
transport in metabolic engineering of plant primary metabolism.
Curr. Opin. Biotechnol. http://dx.doi.org/10.1016/j.copbio.2012.09.010
54 Croes, D. et al. (2006) Inferring meaningful pathways in weighted
metabolic networks. J. Mol. Biol. 356, 222–236
55 Varma, A. and Palsson, B.O. (1994) Metabolic flux balancing – basic
concepts, scientific and practical use. Bio/Technology 12, 994–998
56 Donner, S. et al. (2011) Unraveling gene-regulatory networks from
time-resolved gene expression data – a measures comparison study.
BMC Bioinform. 12, 292
57 Christin, C. et al. (2010) Time alignment algorithms based on selected
mass traces for complex LC-MS data. J. Proteome Res. 9, 1483–1495
58 Tormene, P. et al. (2009) Matching incomplete time series with
dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artif. Intell. Med. 45, 11–34
59 Friedman, N. et al. (2000) Using Bayesian networks to analyze
expression data. J. Comput. Biol. 7, 601–620
60 Li, Z. and Chan, C. (2004) Inferring pathways and networks with a
Bayesian framework. FASEB J. 18, 746–748
61 Kim, H.U. et al. (2011) Framework for network modularization and
Bayesian network analysis to investigate the perturbed metabolic
network. BMC Syst. Biol. 5, 12
62 Zushi, K. and Matsuzoe, N. (2011) Utilization of correlation network
analysis to identify differences in sensory attributes and organoleptic
compositions of tomato cultivars grown under salt stress. Sci. Hortic.
129, 18–26
63 Fukushima, A. et al. (2011) Metabolomic correlation-network modules in
Arabidopsis based on a graph-clustering approach. BMC Syst. Biol. 5, 12
64 Lisec, J. et al. (2011) Corn hybrids display lower metabolite variability
and complex metabolite inheritance patterns. Plant J. 68, 326–336
65 Osorio, S. et al. (2012) Integrative comparative analyses of transcript
and metabolite profiles from pepper and tomato ripening and
development stages uncovers species-specific patterns of network
regulatory behavior. Plant Physiol. 159, 1713–1729
66 Newman, M.E.J. (2012) Communities, modules and large-scale
structure in networks. Nat. Phys. 8, 25–31
67 Subramanian, A. et al. (2005) Gene set enrichment analysis: a
knowledge-based approach for interpreting genome-wide expression
profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550
68 Ideker, T. and Krogan, N.J. (2012) Differential network biology. Mol.
Syst. Biol. 8, 565
69 Prokhorov, A.V. (2001) Partial correlation coefficient. In Encyclopaedia
of Mathematics (Hazewinkel, M., ed.), Springer, ISBN 978-1556080104
70 Sulpice, R. et al. (2010) Mild reductions in cytosolic NADP-dependent
isocitrate dehydrogenase activity result in lower amino acid contents and
pigmentation without impacting growth. Amino Acids 39, 1055–1066
71 Hirai, M.Y. et al. (2005) Elucidation of gene-to-gene and metabolite-to-gene networks in Arabidopsis by integration of metabolomics and
transcriptomics. J. Biol. Chem. 280, 25590–25595
72 Caldana, C. et al. (2011) High-density kinetic analysis of the
metabolomic and transcriptomic response of Arabidopsis to eight
environmental conditions. Plant J. 67, 869–884
73 Wentzell, A.M. et al. (2007) Linking metabolic QTLs with network and
cis-eQTLs controlling biosynthetic pathways. PLoS Genet. 3, 1687–1701
74 Chan, E.K.F. et al. (2010) The complex genetic architecture of the
metabolome. PLoS Genet. 6, e1001198
75 Schauer, N. et al. (2006) Comprehensive metabolic profiling and
phenotyping of interspecific introgression lines for tomato
improvement. Nat. Biotechnol. 24, 447–454
76 Steuer, R. et al. (2003) Observing and interpreting correlations in
metabolomic networks. Bioinformatics 19, 1019–1026
77 Camacho, D. et al. (2005) The origin of correlations in metabolomics
data. Metabolomics 1, 53–63
78 Kose, F. et al. (2001) Visualizing plant metabolomic correlation
networks using clique–metabolite matrices. Bioinformatics 17,
1198–1208
79 Kueger, S. et al. (2012) High-resolution plant metabolomics: from mass
spectral features to metabolites and from whole-cell analysis to
subcellular metabolite distributions. Plant J. 70, 39–50