2021 Null
ARTICLE
Abstract: In this manuscript we discuss the purpose and scope of biometry and
its interactions, complementation, and even overlap with several other areas
of genetics. We emphasize that biometry is an area of genetics that enables
researchers to analyze, process, and interpret biological phenomena from data,
usually obtained from experimental tests, in an improved way to guide strategies
and decision making for optimization of resources. We also highlight the
importance of the biometry professional in the context of breeding and the need
for continual training, given the new demands and challenges that arise from
including different types of information in new processing and analysis
paradigms and from the need to interpret these paradigms better.
Keywords: Biometry, genetics, breeding, data analysis, quantitative genetics.
INTRODUCTION
Challenges have arisen from an increase in the demand for food because of
population growth and difficulties in increasing production through the costs
and adversities of climate change. These challenges are quite evident and have
required the use of extensive technical, human, and financial resources in the
attempt to overcome them. It is necessary to produce greater amounts of food
with quality, at low cost, and under increasingly diverse environmental conditions.
Breeding has a long history and was for many years practiced by ancestors
who had the ability and art of identifying individuals of superior performance,
and they passed traits on to new crops through the descendants of these
individuals. The success of this strategy certainly depended on the ability of
those dedicated and competent farmers who often adopted subjective and not
very accurate criteria in their choices, based on what was pleasing to the eye
and had good flavor. As time went on, breeding became more intensive and
included the need for always overcoming barriers and achieving new levels of
production (Allard 1971). This purpose required an intense effort of educating
professionals able to exercise the activity of breeders and act in public and
private institutions exclusively dedicated to the activity of breeding. Choosing
good genotypes was no longer an amateur activity but became a professional
one; it was no longer exclusively an art, but came to be a science formed within
the principles of genetics and experimentation. Observation was subjected to
the critique of accuracy and suitability for making decisions.
*Corresponding author: E-mail: cdcruz@ufv.br; ORCID: 0000-0003-3513-3391
Received: 26 April 2021; Accepted: 10 May 2021; Published: 06 July 2021
1 Universidade Federal de Viçosa (UFV), Departamento de Biologia Geral, Campus UFV, 36.570-900, Viçosa, MG, Brazil
From Mendel’s laws, we discovered how the factors responsible for inheritance
of the simplest traits were transmitted; however, many questions still remained
regarding the mechanism of inheritance of more complex characteristics, such
as those involved in production of meat, milk, grain, or fruit, among other products. A specific area of genetics therefore
arose to deal with this matter: quantitative genetics, the study of the inheritance and variation
of quantitative traits, which are strongly affected by the environment (Hallauer and Miranda Filho 1988, Vencovsky 1987,
Vencovsky and Barriga 1992, Falconer and Mackay 1996). This area was the precursor of biometry.
From the postulates of quantitative genetics, we began to understand how genetic and environmental factors interact
in control of these complex traits, and we came to understand the enormous difficulty faced by breeders in the process
of choosing and discarding genetic material. These choices are based on prediction of the true genetic values of the individuals
in competition, and this prediction is hindered by the action of the environment, by the genotype × environment interaction, and even
by some genetic effects, such as dominance and epistasis. These factors make it difficult to
establish a good relationship between the good performance of an individual (plant or animal) and the good quality of
its gametes, which are the biological links passed from one generation to another.
Quantitative genetics has made an enormous contribution, generating valuable information placed at the service of
breeders to obtain superior plants and animals in a shorter time and with fewer requirements of physical and financial
resources. All this information was obtained from costly experimental trials, conducted in a careful manner, which always
adopted and observed the scientific criteria of experimentation (Ramalho et al. 1993, Souza Júnior 2001, Oliveira et
al. 2005). The large amount of data and the need for more appropriate analyses of this information came to require a
new area of science, which meant the need for a new knowledge concentration area and a new contingent of qualified
professionals. This new area is called “biometry”, and the professionals acting in this area are called “biostatisticians”.
Within this perspective, many groups of researchers perceived that they could provide a more robust format to the
area of biometry, effectively contributing to dissemination of knowledge and training of professionals in the area, leading
to scientific advances from the proposal and refinement of quantitative methods for data analysis and interpretation
of genetic parameters.
We now understand biometry as the area of genetics that allows analysis, processing, and interpretation of biological
phenomena from data, generally obtained from experimental trials in a more refined manner, to direct strategies and
decision making for optimization of resources.
In the process of data analysis, the biometric area presents and refines models (in the case of statistical procedures)
and computational architecture (in the case of computational intelligence approaches) for better data processing. By
the action of data processing, biometry deepens studies on effective computational algorithms that make the use of
quantitative methods viable and generate information useful for interpretation of results. Because of its interpretive
activity, biometry requires knowledge of all biological areas, especially quantitative genetics, allowing understanding of
biotic and abiotic factors that affect the phenomenon studied and directing the choice of better strategies and decisions
aiming at optimal use of physical, financial, and human resources.
because of technical, personnel, and financial problems, and it may only provide good results if the genetic material
available is improved in a way to be able to express its full potential within the adequate environmental conditions. In
general, the researcher should be concerned with the combination of environmental and genetic improvement.
Professionals need to have a holistic view regarding breeding and environmental balance. Competent and qualified
actions depend on aggregating scientifically based knowledge with practical experience and a global vision that will lead
to satisfactory results. An understanding of the trait under selection and the instruments of generating information that
will guide decision making is fundamental. In this context, training professionals in genetic areas, especially quantitative
genetics and biometry, is indispensable.
Quantitative genetics is the part of genetics that studies quantitative traits, emphasizing their inheritance and the
components that determine their variation. Quantitative traits are generally controlled by many genes and are strongly
affected by the environment, thus exhibiting continuous (and sometimes discontinuous) variation. Qualitative traits,
in contrast, have simpler inheritance (conditioned by one or a few genes) and are affected little or not at all by the environment
(Falconer and Mackay 1996). Biometry is complementary, because it pervades areas such as modeling, experimentation,
and computational processing to generate information that, illuminated by the theories of quantitative genetics, allows
interpretation of phenomena and decision making.
The study of inheritance and of variation in qualitative traits is based on analysis of generations, separating individuals
in classes and evaluating their proportions in the results of determined crosses. Nevertheless, information on the individual
is not of great value in the study of inheritance of quantitative traits, due to the random effect of the environment. If
the effect of the environment can both increase and diminish the phenotypic manifestation of a trait, the mean value
of a set of individuals will be a more reliable measurement, because the effects of the environment tend to cancel each
other out. Thus, quantitative traits are studied at the population level. In addition to the mean, another measure used
to define a population is the variance. Therefore, in studying quantitative traits, we evaluate which fractions of the
mean and of the variance are inheritable.
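The claim that environmental effects tend to cancel in a mean can be checked with a small simulation. This is only a minimal sketch under assumed values: the shared genotypic value `g`, the normally distributed environmental deviations with standard deviation `sigma_e`, and the helper name `mean_phenotype_sd` are hypothetical choices and do not come from the text.

```python
import numpy as np

def mean_phenotype_sd(n_individuals, g=10.0, sigma_e=2.0, n_repeats=5000, seed=0):
    """Empirical standard deviation of the mean phenotype of n individuals
    that share the genotypic value g and differ only by random environment."""
    rng = np.random.default_rng(seed)
    # phenotype = genotypic value + random environmental deviation (assumed normal)
    phenotypes = g + rng.normal(0.0, sigma_e, size=(n_repeats, n_individuals))
    return phenotypes.mean(axis=1).std(ddof=1)

sd_single = mean_phenotype_sd(1)    # a single individual
sd_mean25 = mean_phenotype_sd(25)   # mean of 25 individuals
print(sd_single, sd_mean25)
```

Under these assumptions the spread of the mean shrinks roughly with the square root of the number of individuals, which is why the population mean is a more reliable measurement than any single phenotype.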
Measurements of central position and of dispersion are routinely dealt with in statistical analyses; however,
understanding regarding the mean and genotypic variance is a matter of great importance in the context of both
quantitative genetics and biometry. In quantitative genetics, we seek to understand and interpret the meaning of values
from this information for the breeder, whereas in biometry, we seek to establish the models that allow better estimation
of these values based on experimentation and observance of the laws of genetics.
Put simply, the mean is the sum of all observations divided by their total number. In
some areas, this is sufficient. In quantitative genetics, the concern is not necessarily knowing how to estimate the mean,
but understanding its value when it is obtained in any population or in a population in Hardy-Weinberg equilibrium.
Breeders have clear expectations of the consequences regarding a mean derived from self-fertilization of a population
or regarding a cross between populations. Thus, important concepts emerge regarding inbreeding depression and
regarding heterosis that transcend the process of obtaining such estimates.
Understanding of the genotypic variation of a population calls attention to the existence of two basic models of
genetics. In one model, the genotypic value of an individual is predicted by the number of copies (2, 1, or 0) of a favorable
allele found in the individual. In the other model, this genotypic value is the consequence of the union of maternal
and paternal gametes, with different genetic information, that combine in fertilization. The decomposition of this
genotypic variation into additive variation and variation due to dominance in a monogenic model, plus an epistatic variation
component when considering more than one locus, is fundamental for directing breeding programs based on sexual
recombination of the selected individuals and predicting the heterotic potential of hybrid combinations between superior
and divergent parents. From the perspective of biometry, the question of estimation is also fundamental, because correct
estimation and adequate interpretation are attributes of this area. Nevertheless, biometry also aggregates statistical
information, such as concepts of fixed and random effects, and information regarding statistical and genetic designs and
breeding strategies, especially those referring to the definition of testing and recombination units. From this, a more
appropriate value of genotypic variation can be obtained, which, elucidated by quantitative genetics, will be interpreted
and made available for breeders to use in breeding programs. Thus, as an illustration, genotypic variance in the context
of quantitative genetics can be expressed through the following equation:
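A commonly used textbook form of the decomposition described in this paragraph is sketched below. The component symbols $\sigma^2_A$ (additive), $\sigma^2_D$ (dominance), and $\sigma^2_I$ (epistatic) are assumed notation, offered as an illustration and not necessarily the exact expression intended here.

```latex
\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I
```

In a monogenic model the epistatic term vanishes, leaving only the additive and dominance components.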
to seek better understanding regarding heterosis, presenting and discussing some hypotheses of dominance and/or
overdominance that would explain the appearance of heterosis. Furthermore, biometry takes the route of modeling
and, as in the studies of diallel analysis proposed by Gardner and Eberhart, presents us with concepts and estimators
of mean, varietal, and specific heterosis that are very useful in a program that aims to recommend hybrid combinations
for commercial growing.
Finally, we emphasize one of the most important items of information in breeding, which is heritability. Quantitative
genetics invites us to an understanding of this phenomenon from different equally important angles. We can understand
heritability as being the proportion of phenotypic variation ($\sigma^2_F$) that has a genetic nature and indicates the degree
of difficulty in obtaining gains from selection in accordance with the existence of genuinely genetic variability in the
population of interest.
$$H^2 = \frac{\sigma^2_G}{\sigma^2_F} \qquad (9)$$
Yet, heritability is also a measure of the accuracy of the selection process, indicating the degree of accord between
the phenotypic value manifested by the individual and its true genetic value. Thus, heritability can be quantified by the
square of the correlation between observable phenotypic values and true genotypic values, that is
$$H^2 = r^2_{FG} \qquad (10)$$
where F = G + M, in which the phenotypic value (F) is given by the genotypic value (G) under the effect of the environment
(M). In expression 10, it is assumed that $\mathrm{cov}(F, G) = \sigma^2_G$, since replication, experimental randomization, and the random
action of the environment ensure that the association between genotype and environment is null.
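The agreement between expressions 9 and 10 follows in one step from the stated assumption that the covariance between F and G equals the genotypic variance. A short derivation, using only the symbols already defined, is:

```latex
r_{FG} = \frac{\mathrm{cov}(F, G)}{\sigma_F \, \sigma_G}
       = \frac{\sigma^2_G}{\sigma_F \, \sigma_G}
       = \frac{\sigma_G}{\sigma_F}
\quad\Longrightarrow\quad
r^2_{FG} = \frac{\sigma^2_G}{\sigma^2_F} = H^2
```

If genotype and environment were associated, the covariance would contain an extra term and the two definitions would no longer coincide.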
A third equally important concept refers to the fact that heritability measures how much the variations of the parents
are reflected in the variations passed on by descendants. In this context, a linear relation is established between the
genetic values predicted and manifested in the progeny (F) and the phenotypic values manifested in its parents (P), that is
$$F = \beta_0 + \beta_{PF} P + \varepsilon$$
Thus,
$$H^2 = \beta_{PF} \qquad (11)$$
Finally, a widely used concept is that which relates the genotypic performance of the individuals of an improved
population (Y) resulting from the recombination of superior individuals, identified in comparative tests between
individuals of an original population under selection, manifesting phenotypic values (X). In this case, the following
predictive model is used:
$$(Y - \bar{Y}) = \hat{\beta}(X - \bar{X})$$
where $(Y - \bar{Y})$ is the gain from selection in the selected progeny and $(X - \bar{X})$ is the differential of selection practiced in
the population under breeding. Also,
$$H^2 = \hat{\beta} = \beta_{U_t,U_m} \qquad (12)$$
where $\beta_{U_t,U_m}$ is the regression coefficient established by the relation between the phenotypic values of the test unit ($U_t$)
and the genotypic values of the improved unit ($U_m$).
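The different views of heritability can be compared numerically on simulated data. The sketch below assumes normally distributed, independent genotypic and environmental effects with hypothetical variances of 4 and 6, so the expected heritability is 0.4. The regression of genotypic on phenotypic values and the ratio of gain to selection differential are simplified stand-ins for the regression in expression 12, computed on the same individuals rather than on separate test and improved units.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
sigma2_g, sigma2_m = 4.0, 6.0   # assumed variances, so H2 = 4 / (4 + 6) = 0.4

G = rng.normal(0.0, np.sqrt(sigma2_g), n)   # genotypic values
M = rng.normal(0.0, np.sqrt(sigma2_m), n)   # environmental deviations, independent of G
F = G + M                                   # phenotypic values

# Ratio of genotypic to phenotypic variance (expression 9)
h2_ratio = G.var(ddof=1) / F.var(ddof=1)

# Squared correlation between phenotypic and genotypic values (expression 10)
h2_corr = np.corrcoef(F, G)[0, 1] ** 2

# Slope of the regression of genotypic values on phenotypic values
h2_slope = np.cov(F, G, ddof=1)[0, 1] / F.var(ddof=1)

# Gain relative to the selection differential when the top 10% by phenotype is selected
selected = F >= np.quantile(F, 0.9)
gain = G[selected].mean() - G.mean()
differential = F[selected].mean() - F.mean()
h2_realized = gain / differential

print(h2_ratio, h2_corr, h2_slope, h2_realized)
```

In real experiments the genotypic values are not observed, which is why the estimation procedures discussed next are needed.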
We see that the definitions presented in expressions 9 to 12 are genetically based and are closely related to the
questions of breeding. However, obtaining the value of this heritability requires a great deal of other information,
such as type of family, genetic and statistical design, replications, breeding strategies, and others. Thus, supposing the
evaluation of a set of g families in randomized block experiments with b blocks, in which n plants were evaluated per
plot, the biometric approaches that take into consideration all the genetic aspects, as well as the information of the
breeding strategy adopted and experimental particularities, would allow estimation of different heritability coefficients
from the same experiment. These coefficients are not only interpreted under the standards of quantitative genetics, but
also meet the different requirements of the breeder in decision making, such as how to adopt selection between and
within a family, or stratified selection, combined selection, or simply family selection, among others.
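As an illustration of how more than one heritability coefficient can come from the same experiment, the sketch below analyzes simulated plot means from g families in b randomized blocks. It is a simplified example under stated assumptions: it works on plot means only (the within-plot variation among the n plants is ignored), the among-family variance is not converted into additive variance (that conversion depends on the type of family), and the function name `family_heritabilities` and all simulated values are hypothetical.

```python
import numpy as np

def family_heritabilities(y):
    """y: plot means with shape (g families, b blocks) from a randomized block design."""
    g, b = y.shape
    grand = y.mean()
    fam_means = y.mean(axis=1)
    blk_means = y.mean(axis=0)
    # Mean squares for families and for error (family x block residual)
    ms_fam = b * np.sum((fam_means - grand) ** 2) / (g - 1)
    resid = y - fam_means[:, None] - blk_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((g - 1) * (b - 1))
    # Method-of-moments estimate: E[MS_fam] = var_err + b * var_fam
    var_fam = max((ms_fam - ms_err) / b, 0.0)
    h2_plot = var_fam / (var_fam + ms_err)       # single-plot basis
    h2_mean = var_fam / (var_fam + ms_err / b)   # family-mean basis
    return var_fam, ms_err, h2_plot, h2_mean

rng = np.random.default_rng(1)
g, b = 300, 4
fam = rng.normal(0.0, 1.0, size=(g, 1))   # family effects, assumed variance 1
blk = rng.normal(0.0, 0.5, size=(1, b))   # block effects
err = rng.normal(0.0, 2.0, size=(g, b))   # plot error, assumed variance 4
y = 50.0 + fam + blk + err

var_fam, var_err, h2_plot, h2_mean = family_heritabilities(y)
print(var_fam, var_err, h2_plot, h2_mean)
```

The family-mean coefficient exceeds the plot-level one because averaging over blocks reduces the error contribution, which is one reason the choice of selection unit matters to the breeder.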
2021), and unlike the conventional stochastic modeling used up to then, they are based on the principles of machine
learning and computational intelligence (Silva 2014).
By incorporating the use of methodologies and the application of new paradigms to breeding, such as computational
intelligence and machine learning, biometry has provided new alternatives of analyses to assist in cultivar selection.
Artificial intelligence methods are rapidly becoming essential for data analysis, especially as a support for decision-
making processes (Carneiro et al. 2017).
THE BIOMETRIC AREA ADAPTED TO NEW PARADIGMS OF ANALYSIS AND DATA INTERPRETATION
Biometry applied to breeding is based on genetic principles and the purpose of meeting demands for interpretation
of biological phenomena and providing information that can guide strategies and optimize resources. Thus, data processing,
data analysis, and interpretation of results are the activities inherent to the work of biostatisticians. The accumulation of
information and the advance of new technologies, especially in the area of computation, have made the biometric area
attentive to new analysis techniques with diverse objectives, especially for analyses of prediction, classification, and
recognition of patterns (Resende 2002, Resende et al. 2014).
Biometry, like data science, has earned prominence worldwide. Both are interdisciplinary areas directed to the study
and analysis of data aiming at detection of patterns and/or obtaining information to assist in decision making. Biometry
differs by the type of data it uses, by the phenomena emphasized, and, essentially, by being practiced in total observance
of the principles of genetics and the purposes of breeding. In 2019, data scientists appeared in first place in the ranking
made by the American recruiting site Glassdoor, which lists the best jobs in the United States. Biostatisticians are part
of this select group of researchers.
Many statistical procedures summarize or allow interpretation of phenomena through measures of central tendency,
generally represented by the mean (arithmetic, weighted arithmetic, geometric, and harmonic), median, and mode.
Their objective is to represent an entire set by a single value, and many hypotheses concern the
equality of these means across different sources of variation. Another measurement often requested in statistical studies
refers to dispersion, represented by information regarding range, variance, and standard deviation. Based on these
basic measurements, we construct estimates of error of a study and analyze the dispersion of its data.
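The measures named above can be computed directly with the Python standard library. The values in `yields` are hypothetical plot yields invented for this illustration.

```python
import statistics as st

yields = [4.2, 5.1, 4.8, 5.1, 6.0, 4.5]  # hypothetical plot yields (t/ha)

summary = {
    # measures of central tendency
    "arithmetic_mean": st.mean(yields),
    "geometric_mean": st.geometric_mean(yields),
    "harmonic_mean": st.harmonic_mean(yields),
    "median": st.median(yields),
    "mode": st.mode(yields),
    # measures of dispersion
    "range": max(yields) - min(yields),
    "variance": st.variance(yields),   # sample variance (n - 1 denominator)
    "std_dev": st.stdev(yields),       # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the choice among them affects the single value used to represent the set.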
Statistical models attempt to explain a response variable, for the purpose of fitting or classification, through
a set of independent variables or effects plus an experimental error. Assumptions regarding these errors
are indispensable for establishing estimators that lead to values that will serve as a basis
for guiding strategies and adopting procedures for optimization of resources. Biometric procedures based on
principles and statistical models are widely applicable in breeding. Nevertheless, other currents of thought have
been adopted, leading to solutions from different perspectives and giving researchers the opportunity to adopt
the solution of greatest interest.
Differentiated solutions for problems in the biometric area can be achieved within the area of artificial intelligence
(AI), or computational intelligence, which deals with automation of intelligent behavior and is divided into different
paradigms, notably the symbolic, connectionist, and evolutionary paradigms. The symbolic paradigm is based on symbolic
transformations (numbers, letters, words, and symbols) to establish a logical route until discovering a determined
solution. The most successful form of symbolic AI is expert systems, which use a network of production rules. The
connectionist paradigm includes the procedures of neural networks and fuzzy logic inspired by the operation of the
human brain. The term was introduced by Donald Hebb in the 1940s. The essence of these procedures is the learning
algorithm that allows modification of the weights of connections and extraction of linear and non-linear information
from the problem under study; it is therefore of great interest to breeding. Finally, there is the evolutionary paradigm,
which is composed of a series of algorithms inspired by natural evolution, called genetic algorithms.
The neural network is an area of computational intelligence that meets the need for generating solutions related to
numerous problems, including those of classification and prediction, which are routine activities in breeding. In contrast
with the statistical approach, information is not summarized, but each example, or piece of information, is relevant in
a learning process in which each input (which corresponds to the independent variables in statistical vocabulary) has
weights, called adjustable synaptic weights. Among the types of neural networks important in solving classification
problems, the multilayer perceptron and radial basis function neural networks are prominent.
In neural networks with the purpose of fitting or classification, the variations in a response variable can be explained
by a set of inputs whose linear and non-linear actions are captured through abstract variables, with information
generated by neurons in hidden or intermediate layers with a variable number of both layers and neurons per layer.
Such potentialities allow analysis of complex traits in a more accurate manner and better understanding of important
phenomena such as dominance and epistasis.
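A minimal sketch of such a network is given below: a multilayer perceptron with one hidden layer, trained by gradient descent on a toy, purely epistatic target. Everything here is an assumption made for illustration: the two loci coded 0/1, the XOR-like genotypic values, the four hidden neurons, the learning rate, and the number of epochs. A purely additive linear model cannot do better on this target than predicting 0.5 everywhere (mean squared error 0.25); whether the network falls below that value depends on initialization and training settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs: two hypothetical loci coded 0/1; target: a purely epistatic (XOR-like) value
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

n_hidden = 4
W1 = rng.normal(0.0, 1.0, size=(2, n_hidden))   # input-to-hidden synaptic weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))   # hidden-to-output synaptic weights
b2 = np.zeros(1)

def forward(inputs):
    hidden = np.tanh(inputs @ W1 + b1)   # non-linear hidden layer
    return hidden, hidden @ W2 + b2      # linear output neuron

losses = []
lr = 0.1
for epoch in range(5000):
    H, out = forward(X)
    err = out - y
    losses.append(float(np.mean(err ** 2)))   # mean squared error
    # Backpropagation of the mean squared error
    grad_out = 2.0 * err / len(X)
    grad_W2 = H.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_H = (grad_out @ W2.T) * (1.0 - H ** 2)   # derivative of tanh
    grad_W1 = X.T @ grad_H
    grad_b1 = grad_H.sum(axis=0)
    # Gradient descent update of the adjustable weights
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1

print(losses[0], losses[-1])
```

The hidden neurons play the role of the abstract variables mentioned above: each combines the inputs non-linearly, which is what allows interactions between loci to be represented.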
Currently, biostatisticians make use of machine-learning approaches, which allow machines to
develop models and make predictions without the need for reprogramming. As the machines are exposed to new data,
they learn more and adapt in an independent manner. Machine-learning procedures can be classified as supervised or
unsupervised. In supervised learning, the algorithms learn which model is most suitable for predicting the variable
of interest, based on independent variables (or inputs), through a set of examples that are submitted to the system. It
is said that this type of learning involves the participation of an “external agent” and is generally used in situations in
which the historic data foresee possible future occurrences. The decision tree procedure and its refinements and neural
networks are prominent in this type of learning. In unsupervised learning, the procedure generates response patterns
from measurements in variables of interest based on some structure and measure of similarity. The main procedures
are reduction of dimensionality (principal component analysis, multidimensional scaling), cluster analysis (K-means,
hierarchical methods), and Kohonen self-organizing maps.
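As an example of unsupervised learning, the sketch below implements a basic K-means clustering with squared Euclidean distance as the similarity measure. The data are simulated: two hypothetical groups of accessions measured for two traits. The farthest-point initialization and the helper names `assign` and `kmeans` are choices made for this illustration only.

```python
import numpy as np

def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    # Farthest-point initialization, chosen here only to make the sketch deterministic
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Recompute each centroid as the mean of its members (keep it if the cluster is empty)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(X, centroids), centroids

rng = np.random.default_rng(3)
group_a = rng.normal([0.0, 0.0], 0.5, size=(30, 2))   # hypothetical group A
group_b = rng.normal([5.0, 5.0], 0.5, size=(30, 2))   # hypothetical group B
X = np.vstack([group_a, group_b])

labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

No group labels are supplied to the algorithm; the grouping emerges only from the similarity structure of the measurements, which is the defining feature of this type of learning.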
CONCLUSION
Biometry is an area of genetics in continuous evolution, which has contributed to generating information and finding
solutions to diverse questions in genetics and breeding. The professional in this area must know the features of the basic
factors of heredity, population dynamics, mating systems, and strategies for conducting populations, among others. In
addition, (s)he must be attentive to all the tools of analysis and of data processing arising from the rapid and continuous
evolution of this area. It is an area that essentially depends on the critical perspective and attitudes of good breeders,
who are both the data-generating agents and the great beneficiaries of the information generated, and who keep an attentive eye on
the methods of data analysis and on the algorithms for data processing that the biometric area supports.
A view of the current scenario leads to the conclusion that as many associated areas advance, an increasing volume
of data will become available, which will require careful analysis and interpretation. These data now also result from
the accumulation of information deposited in historical databases, and from aggregating and prospecting information of
diverse natures, involving data on genetic materials, climate, soil, etc. Also prominent in this scenario are globalization
of information and capturing of new data, especially acquisition and interpretation of image and spectral data.
There is an urgent need for further advances, now supported by biometric genetics, in the context of data processing
and in computational intelligence and machine-learning approaches. This is associated with emerging areas in breeding,
especially phenomics, and further developments in the question of handling big data.
quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12: 87.
Hallauer AR and Miranda Filho JB (1988) Quantitative genetics in maize breeding. Iowa State University Press, Ames, 468p.
Meuwissen THE, Hayes BJ and Goddard ME (2001) Prediction of total genetic value using genome wide dense marker maps. Genetics 157: 1819-1829.
Oliveira AC, Furtado DF and Ramalho MAP (2005) Experimentação em genética e melhoramento de plantas. UFLA, Lavras, 300p.
Paterniani E (1966) Genética e melhoramento do milho. In Krug CA (ed) Cultura e adubação do milho. Instituto Brasileiro de Potassa, São Paulo, p. 109-148.
Ramalho MAP, Santos JB and Zimmermann MJO (1993) Genética quantitativa em plantas autógamas: aplicações ao melhoramento genético do feijoeiro. UFG, Goiânia, 271p.
Resende MDV (2002) Genética biométrica e estatística no melhoramento de plantas perenes. Embrapa, Brasília, 975p.
Resende MDV (2008) Genômica quantitativa e seleção no melhoramento de plantas perenes e animais. Embrapa Florestas, Colombo, 330p.
Resende MDV, Silva FF and Azevedo CF (2014) Estatística matemática, biométrica e computacional. Editora UFV, Viçosa, 881p.
Sant’anna IC and Cruz CD (2019a) Redes neurais artificiais na predição de valores genéticos: aplicação de inteligência computacional e seleção genômica no melhoramento de plantas. Novas Edições Acadêmicas, Mauritius, 128p.
Sant’anna IC, Nascimento M, Silva GN, Cruz CD, Azevedo CF, Gloria LS and Silva FF (2019b) Genome-enabled prediction of genetic values for using radial basis function neural networks. Functional Plant Breeding Journal 1: a1.
Sant’anna IC, Silva GN, Nascimento M and Cruz CD (2021) Subset selection of markers for the genome-enabled prediction of genetic values using radial basis function neural networks. Acta Scientiarum. Agronomy 43: e46307.
Silva GN and Cruz CD (2015) Redes neurais artificiais: Novo paradigma para a predição de valores genéticos. Schaltungsdienst Lnag o.H.G, Berlin, 92p.
Silva GN, Tomaz RS, Sant’anna IC, Nascimento M, Bhering LL and Cruz CD (2014) Neural networks for predicting breeding values and genetic gains. Scientia Agricola 71: 494-498.
Souza Júnior CL (2001) Melhoramento de espécies alógamas. In Nass LL, Valois ACC, Melo IS, Valadares Inglis MC (org) Recursos genéticos e melhoramento de plantas. Fundação MT, Rondonópolis, p. 159-199.
Vencovsky R (1987) Herança quantitativa. In Paterniani E and Viégas GP (ed) Melhoramento e produção do milho. Fundação Cargill, Campinas, p. 137-214.
Vencovsky R and Barriga P (1992) Genética biométrica no fitomelhoramento. Revista Brasileira de Genética, Ribeirão Preto, 486p.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is properly cited.