Bivariate-Data-Report-Writing
Key components of the statistical enquiry cycle for investigating bivariate measurement data:
• posing an appropriate relationship question using a given multivariate data set
• selecting and using appropriate displays
• identifying features in data
• finding an appropriate model
• describing the nature and strength of the relationship and relating this to the context
• using the model to make a prediction
• communicating findings in a conclusion.
Bivariate Data
Bivariate data records two variables that are potentially connected, e.g. ice cream sales and the temperature
on that day.
In this assessment you will be given some raw data that you will be required to analyse by drawing a scatter
plot (using iNZight) and writing a report.
Overview of Bivariate Data Report (use headings 1-6 to organise your report)
I notice / I wonder (do not include these notes in your final report): use the Scatter Plot Matrix in iNZight
1. Introduction / Background
2. Identify features in the data (association, features, trend)
3. Select and justify an appropriate model (linear / non-linear)
4. Make a prediction in context with units and sensible rounding.
5. Examine other Models
6. Conclusion
and Red Blood Cell Count for athletes from the Australian Institute of Sport.
Relationship Question (one only):
This report will investigate the nature of the relationship between Haematocrit levels and Red Blood Cell Count for athletes from the AIS.
or
This report will investigate if an athlete’s Haematocrit levels can be used to predict their Red Blood Cell Count.
Aim / Interest (Why worth investigating? Questions?):
An understanding of the relationship between these variables might be useful to… because…
Source (Data / Survey):
The source of this data is 120 athletes from the Australian Institute of Sport.
Name other variables that might impact on the response variable and suggest how they might impact, e.g. gender, age:
Haematocrit level may well impact on the number of such blood cells.
Other factors that may affect a person’s red blood count are… because…
Trend (linear / non-linear?):
If linear: From the scatter plot it appears that there is a linear relationship between Haematocrit levels and Red Blood Cell Count.
If non-linear: From the scatter plot it appears that there is a non-linear relationship between x and y.
I understand… I need to work on…
[Scatter plot: x-axis Haematocrit level, with a linear trend line]
Reason for linear / non-linear model:
If linear: For the reasons given above a linear regression model has been fitted to the data.
If non-linear: For the reasons given above a non-linear regression model has been fitted to the data.
Description of model (gradient statement if linear):
The linear model shows that Red Blood Cell Counts increase by 0.1 for each increase of 1 in Haematocrit value.
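The gradient in a statement like this comes from the fitted trend line. iNZight fits the line for you; the sketch below only illustrates the same least-squares idea in Python. The function name `fit_line` and the data values are made up for illustration and are not the AIS measurements.

```python
def fit_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Made-up values for illustration only; these are NOT the AIS data.
haematocrit = [38.0, 40.0, 42.0, 44.0, 46.0, 48.0]
rbc_count = [4.1, 4.4, 4.6, 4.8, 5.1, 5.3]

gradient, intercept = fit_line(haematocrit, rbc_count)
print(f"Red Blood Cell Count changes by about {gradient:.2f} "
      f"for each increase of 1 in Haematocrit value.")
```

The printed sentence mirrors the gradient statement: it reports the change in the response variable for each increase of 1 in the explanatory variable.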
Discussion of fit throughout the range of x values:
Look at how well the points align with the trend line for the range of x values.
This model appears to be a good fit of the data throughout the range of Haematocrit levels with all points aligning with the linear trend. The number of points above the trend line is also similar to the number of points below.
However, there are no athletes with Haematocrit levels from 53 to 59 and so we are unable to describe the fit for this data range. This means the model may not be as appropriate for assessing the relationship between these variables when the Haematocrit levels are over 52.
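The two checks described above (points above versus below the trend line, and gaps in the range of x values) can also be done by counting. This is a rough sketch only: the helper names are hypothetical, the data values are made up, and the gradient and intercept are taken from the prediction calculation used later in this template.

```python
def count_above_below(xs, ys, gradient, intercept):
    """Count the points lying above and below the line y = gradient * x + intercept."""
    above = sum(1 for x, y in zip(xs, ys) if y > gradient * x + intercept)
    below = sum(1 for x, y in zip(xs, ys) if y < gradient * x + intercept)
    return above, below


def largest_gap(xs):
    """Return (width, start, end) of the widest gap between neighbouring x values."""
    ordered = sorted(xs)
    gaps = [(b - a, a, b) for a, b in zip(ordered, ordered[1:])]
    return max(gaps)


# Made-up values for illustration only; these are NOT the AIS data.
haematocrit = [40, 44, 48, 52, 60]
rbc_count = [4.5, 4.7, 5.4, 5.7, 6.6]

above, below = count_above_below(haematocrit, rbc_count, 0.11565, -0.26)
width, start, end = largest_gap(haematocrit)
print(f"{above} points above the line, {below} below")
print(f"Widest gap in x: no data between {start} and {end}")
```

A large gap in x is a reason to be cautious about describing the fit, or making predictions, inside that gap.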
Consider number of data points:
If high: This is a relatively high number of data points (120) which enhances the reliability of the model.
If low: The relatively low number of data points means this model may not be particularly reliable.
Correlation / causation:
This relationship is only statistical and does not imply that an increase in Haematocrit level causes an increase in Red Blood Cell Count.
Strength of relationship (strong / moderate-to-strong / moderate / weak-to-moderate / weak):
Look at the amount of scatter about the regression line. If linear, then give r (the correlation coefficient). Comment on variation in scatter: constant / non-constant / fanning out.
This relationship appears to be moderate-to-strong as there is some scatter along the trend line but it is not a large amount.
The correlation coefficient is also relatively high at 0.93, indicating there is evidence of a fairly strong linear relationship between Haematocrit values and Red Blood Cell Counts.
The scatter along the trend line is non-constant, with more scatter after x = … This suggests a stronger relationship for x = … and a potentially weaker relationship for x = …
There is an increase in scatter after x = … This suggests the relationship may not be as strong after this point.
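Both ideas above can be expressed as calculations: r measures how closely the points follow a straight line, and comparing the scatter about the line on either side of a chosen x value shows whether the scatter is constant. The sketch below is an illustration only; the function names, the split point of 45 and the data values are all made up, and the gradient and intercept are those used in the prediction calculation in this template.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residual_spread(xs, ys, gradient, intercept):
    """Root-mean-square vertical distance of the points from the line."""
    residuals = [y - (gradient * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def spread_either_side(xs, ys, gradient, intercept, split):
    """Scatter about the line for x <= split and for x > split.

    Assumes there is at least one point on each side of the split.
    """
    left = [(x, y) for x, y in zip(xs, ys) if x <= split]
    right = [(x, y) for x, y in zip(xs, ys) if x > split]
    spread_left = residual_spread([x for x, _ in left], [y for _, y in left],
                                  gradient, intercept)
    spread_right = residual_spread([x for x, _ in right], [y for _, y in right],
                                   gradient, intercept)
    return spread_left, spread_right


# Made-up values for illustration only; these are NOT the AIS data.
haematocrit = [38, 40, 42, 44, 46, 48, 50, 52]
rbc_count = [4.1, 4.4, 4.6, 4.8, 5.2, 5.0, 5.9, 5.4]

r = correlation(haematocrit, rbc_count)
low, high = spread_either_side(haematocrit, rbc_count, 0.11565, -0.26, split=45)
print(f"r = {r:.2f}")
print(f"Scatter for x <= 45: {low:.2f}; scatter for x > 45: {high:.2f}")
```

If the second scatter value is clearly larger than the first, that supports a statement that the relationship may be weaker for larger x values.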
Unusual (visual description, numerical description, discuss possible reasons for differences):
One unusual value is present with a Haematocrit level of 60 and a Red Blood Cell Count over 6.5.
Two groups are suggested in the scatter plot: the first with x < … and the second with x > … One possible reason for these differences may be…
Prediction:
Round answer sensibly and include units if appropriate. Don’t relate to observed y-values.
From this model I predict that the red blood count of a person with a Haematocrit level of 50 will be 5.5 (0.11565 × 50 − 0.26 = 5.5225, rounded to 5.5).
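The prediction is made by substituting the x value into the equation of the trend line and rounding the answer sensibly. A short sketch of that arithmetic, using the gradient and intercept from the calculation above (the names `GRADIENT`, `INTERCEPT` and `predict_rbc` are chosen here for illustration only):

```python
# Coefficients of the fitted line, as used in the worked prediction above.
GRADIENT = 0.11565
INTERCEPT = -0.26


def predict_rbc(haematocrit):
    """Predicted Red Blood Cell Count for a given Haematocrit level."""
    return GRADIENT * haematocrit + INTERCEPT


prediction = predict_rbc(50)
print(f"Unrounded prediction: {prediction:.4f}")
print(f"Rounded to 1 decimal place: {round(prediction, 1)}")
```

Rounding to 1 decimal place gives 5.5, matching the prediction sentence.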
Justification regarding how accurate the prediction might be, with reference to statistical evidence from the analysis:
If moderate-to-strong: Given the moderate-to-strong relationship found in the data it is likely that this prediction will be quite accurate.
If weak: Given the weak relationship found in the data this prediction is unlikely to be particularly accurate and should only be taken as a rough indication of y at point x.
Reflect on the prediction by discussing its relevance to the wider population:
This prediction is likely to only be accurate for athletes as it is likely that they will have higher general Haematocrit and red blood cell levels than the rest of the population as they exercise more [1].
Justify choice of variables to use by giving reasons for using the selected one rather than others:
Haematocrit is likely to be the best explanatory variable because…
[1] http://www.livestrong.com/article/299082-the-effect-of-athletic-training-on-the-rbc-count/
If unusual values:
The data point at Haematocrit value 59 does not appear to fit in with the rest of the data. This could be a valid
If subsets / groups:
Comment on the effect the different subsets might have on the model.
One factor that may influence the relationship between Haematocrit levels and Red Blood Cell Count is the gender of