What Do You Mean by The Additive Property of The T
Example
The analysis of variance can be used as an exploratory tool to explain observations. A
dog show provides an example. A dog show is not a random sampling of the breed: it
is typically limited to dogs that are adult, pure-bred, and exemplary. A histogram of
dog weights from a show might plausibly be rather complex, like the yellow-orange
distribution shown in the illustrations. Suppose we wanted to predict the weight of a
dog based on a certain set of characteristics of each dog. One way to do that is to
explain the distribution of weights by dividing the dog population into groups based
on those characteristics. A successful grouping will split dogs such that (a) each
group has a low variance of dog weights (meaning the group is relatively
homogeneous) and (b) the mean of each group is distinct (if two groups have the
same mean, then it isn't reasonable to conclude that the groups are, in fact, separate
in any meaningful way).

In the illustrations to the right, groups are identified as X1, X2, etc. In the first
illustration, the dogs are divided according to the product (interaction) of two
binary groupings: young vs old, and short-haired vs long-haired (e.g., group 1 is
young, short-haired dogs, group 2 is young, long-haired dogs, etc.). Since the
distributions of dog weight within each of the groups (shown in blue) have a
relatively large variance, and since the means are very similar across groups,
grouping dogs by these characteristics is not an effective way to explain the
variation in dog weights: knowing which group a dog is in doesn't allow us to
predict its weight much better than simply knowing the dog is in a dog show. Thus,
this grouping fails to explain the variation in the overall distribution (yellow-orange).
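
The two criteria above (small spread inside each group, distinct group means) can be stated precisely with the standard one-way decomposition of the total sum of squares. The notation is introduced here for illustration and is not from the original: k groups, n_i dogs in group i, y_{ij} the weight of dog j in group i, \bar{y}_i the mean of group i, and \bar{y} the mean of all dogs.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}\right)^2}_{SS_T}
  = \underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_i-\bar{y}\right)^2}_{SS_B}
  + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{SS_W}
```

The cross term that would otherwise appear vanishes because \sum_{j}(y_{ij}-\bar{y}_i) = 0 within every group. In these terms, the young/old and short/long-haired grouping leaves most of SS_T in the within-group part SS_W, whereas a grouping that explains weight well moves most of it into the between-group part SS_B.
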
An attempt to explain the weight distribution by grouping dogs as pet vs working
breed and less athletic vs more athletic would probably be somewhat more
successful (fair fit). The heaviest show dogs are likely to be big, strong, working
breeds, while breeds kept as pets tend to be smaller and thus lighter. As shown by
the second illustration, the distributions have variances that are considerably smaller
than in the first case, and the means are more distinguishable. However, the
distributions still overlap substantially; for example, X1 and X2 cannot be
distinguished reliably. Grouping dogs according to a coin flip might produce
distributions that look similar.

An attempt to explain weight by breed is likely to produce a very good fit. All
Chihuahuas are light and all St Bernards are heavy. The difference in weight between
Setters and Pointers, on the other hand, does not justify treating them as separate
groups. The analysis of variance provides the formal tools to justify these intuitive
judgments.

A common use of the method is the analysis of experimental data or the development
of models. The method has some advantages over correlation: the data need not all
be numeric, and one result of the method is a judgment of how much confidence can
be placed in an explanatory relationship.
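
As a minimal sketch of those formal tools, the Python code below performs a one-way analysis of variance by hand: it splits the total sum of squares into a between-group part and a within-group part, checks that the two parts add up to the total, and forms the F statistic (between-group mean square divided by within-group mean square). Assumptions: `anova_decomposition` is a hypothetical helper written for this example, not a library API, and the weights are invented numbers, not real dog-show measurements. The same nine dogs are grouped once by breed and once into groups that mix breeds.

```python
from statistics import fmean


def anova_decomposition(groups):
    """Hypothetical one-way ANOVA helper (illustration only, not a library API).

    groups: list of lists of numbers, one inner list per group.
    Returns (ss_total, ss_between, ss_within, f_stat).
    """
    values = [y for g in groups for y in g]
    k, n = len(groups), len(values)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more observations than groups")

    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]

    # Total variation of every observation around the overall mean
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    # Variation of the group means around the overall mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    # Mean squares use k - 1 and n - k degrees of freedom
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # If every group is perfectly homogeneous, F is unbounded
    f_stat = ms_between / ms_within if ms_within > 0 else float("inf")
    return ss_total, ss_between, ss_within, f_stat


# Invented weights in kg, for illustration only (not real dog-show data)
by_breed = [[2.0, 2.5, 3.0], [60.0, 65.0, 70.0], [25.0, 27.0, 29.0]]
# The same nine dogs regrouped so that every group mixes breeds
mixed = [[2.0, 65.0, 27.0], [2.5, 70.0, 25.0], [3.0, 60.0, 29.0]]

for label, groups in [("by breed", by_breed), ("mixed", mixed)]:
    ss_t, ss_b, ss_w, f = anova_decomposition(groups)
    # Additive property: SS_T equals SS_B + SS_W up to rounding error
    assert abs(ss_t - (ss_b + ss_w)) < 1e-9 * max(1.0, ss_t)
    print(f"{label}: SS_T={ss_t:.2f} SS_B={ss_b:.2f} SS_W={ss_w:.2f} F={f:.3f}")
```

With these numbers the breed grouping gives a large F (about 305) and the mixed grouping an F far below 1, even though SS_T is identical in both cases; only its split between SS_B and SS_W changes. The confidence judgment mentioned above is conventionally expressed as a p-value from the F distribution with k - 1 and N - k degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f, k - 1, n - k)` computes it, and `scipy.stats.f_oneway` runs the whole test.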