Louis of Maths AA SL IA Form B
Form B
Exploration—Initial planning
1 Title:
You should have a title as shown here, and not just a research question. Note that in many cases involving the social sciences and human geography it is not
easy to establish cause and effect. For running, as in this case, we can see how a particular training method affects speed.
5 A detailed plan for the task (Please include a timeline for the task, what data is to be collected and
how, what do you need to learn and how, the use of technology, and a list of relevant resources)
First week: read up on research into improving running speed, and on multiple linear regression and hypothesis testing.
Recruit as many students as possible to help me, using a systematic sampling technique.
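As a minimal sketch of the systematic sampling step, assuming a hypothetical class register of 60 students and a target of 12 volunteers, the Python below picks a random starting point within the first interval and then takes every k-th student. The names `systematic_sample` and `students` are illustrative only.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Return every k-th item of population, starting at a random offset.

    k is the sampling interval len(population) // sample_size.
    """
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and len(population)")
    k = len(population) // sample_size   # sampling interval
    rng = random.Random(seed)            # seeded so the draw can be reproduced in the IA
    start = rng.randrange(k)             # random start within the first interval
    return [population[start + i * k] for i in range(sample_size)]

# Hypothetical register of 60 students; replace with the real list.
students = [f"Student {i:02d}" for i in range(1, 61)]
sample = systematic_sample(students, 12, seed=2021)
print(sample)
```

Recording the seed and the interval in the IA is one way to make the sampling step verifiable.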
Prepare a survey and have it approved by the teacher. The survey asks participants:
a) how often they have run in the last 6 months (Xfreq)
b) how far they have run each week in the last 6 months (Xdist)
c) their average time for a 100 m run (Xmeantime)
d) their gender (Xgender)
e) how often they have done speed training in the last 6 months (Xspdtrain)
These are the independent variables: factors that may determine speed before the training. In this case, I have 5 factors
that may affect running speed. On top of that, I also have three training methods, which give depth to my data. You should avoid having only 2
variables (one dependent and one independent) in your IA.
Second week: (a) use a random sampling method to allocate volunteers to group 1 (elliptical only), group 2 (speed training on
track only) and group 3 (elliptical and speed training on track)
(b) collect survey responses from all volunteers
You must show evidence of using random sampling in your IA. This can be done using Excel's random number functions (e.g. RAND or RANDBETWEEN). Merely
mentioning that random sampling has been done does not count as evidence. You should also discuss why you selected a particular random sampling method
before carrying it out.
(c) Third week: time the 50 m run of each volunteer at least 3 times with a 5-minute rest between each run.
(d) From the third week onwards, ensure all volunteers follow their group's training programme. Elliptical: 2 sets of 10 minutes
at level 10 with a 2-minute rest between sets, at 65 revolutions per minute for the first 3 minutes and 75 or above for the
remaining 17 minutes. Track speed training: 2 sets of 4 × 100 m at close to maximum speed, with a 2-minute rest between runs.
The third group alternates a day of elliptical with a day of speed training (e.g. Monday and Friday).
You need a clear procedure for collecting data and must know what data is to be collected. If your data come from secondary sources, identify
these sources and their URLs. That said, you can show more personal engagement (Criterion C) by collecting first-hand data, as in this example.
(e) Repeat step (c) at the end of each week until the end of the 5th week.
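A minimal sketch of the random allocation in step (a), in Python rather than Excel: it shuffles a hypothetical list of volunteer IDs and deals them in turn into the three training groups, which is equivalent to attaching a RAND() value to each name in Excel and sorting. Fixing the seed lets the allocation be reproduced as evidence in the IA; `allocate_groups` and the IDs are illustrative assumptions.

```python
import random

GROUPS = {
    1: "elliptical only",
    2: "speed training on track only",
    3: "elliptical and speed training on track",
}

def allocate_groups(volunteers, n_groups=3, seed=None):
    """Shuffle volunteers, then deal them round-robin into groups of (near) equal size."""
    rng = random.Random(seed)
    shuffled = list(volunteers)   # copy so the original list is untouched
    rng.shuffle(shuffled)
    allocation = {g: [] for g in range(1, n_groups + 1)}
    for i, person in enumerate(shuffled):
        allocation[i % n_groups + 1].append(person)
    return allocation

volunteers = [f"V{i:02d}" for i in range(1, 13)]   # hypothetical volunteer IDs
groups = allocate_groups(volunteers, seed=7)
for g, members in groups.items():
    print(g, GROUPS[g], members)
```

Dealing round-robin after the shuffle keeps the group sizes balanced, which a plain random group label per person would not guarantee.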
Analysis I: use a t-test to compare each volunteer's times before and after the training to see whether
any of the training methods improves time.
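The paired t statistic for Analysis I is t = mean(d) / (s_d / sqrt(n)) on the differences d = before − after, with n − 1 degrees of freedom. The sketch below computes it with Python's standard library, averaging each volunteer's three trials first; the times are invented for illustration, and 2.571 is the two-tailed 5% critical value for 5 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(before, after)` returns the same statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-sample t statistic on the differences d = before - after.

    A positive t means times went down, i.e. the runners got faster.
    Returns (t, degrees of freedom).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    d = [b - a for b, a in zip(before, after)]
    return mean(d) / (stdev(d) / sqrt(len(d))), len(d) - 1

# Hypothetical 50 m times (s) for 6 volunteers, three trials each
before_trials = [[7.9, 8.1, 8.0], [8.6, 8.4, 8.5], [7.5, 7.7, 7.6],
                 [9.0, 9.2, 9.1], [8.2, 8.0, 8.1], [7.8, 7.8, 7.8]]
after_trials = [[7.7, 7.8, 7.9], [8.3, 8.4, 8.2], [7.5, 7.4, 7.6],
                [8.8, 8.9, 9.0], [8.0, 7.9, 8.1], [7.7, 7.6, 7.8]]

# Average each volunteer's three trials before comparing
before = [mean(trials) for trials in before_trials]
after = [mean(trials) for trials in after_trials]

t, df = paired_t(before, after)
T_CRIT = 2.571  # two-tailed 5% critical value for df = 5, from t tables
print(f"t = {t:.3f}, df = {df}, significant at 5%: {abs(t) > T_CRIT}")
```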
Discuss the weakness of this method, which ignores the fitness and base speed of each runner. You may wonder
whether these factors also affect the effectiveness of a training method. This discussion is credited under
Criterion D (reflection).
Moving on from Analysis I to a further analysis that explores the data more deeply will earn credit for
Criterion C (personal engagement).
Analysis II:
Display the distribution of the times using a histogram and check whether they appear normally distributed.
If they are normally distributed, we can use a paired-sample t-test on the before and after times, each week
and at the end of the month, to see whether a particular mode of training improves running speed.
Always state an observation or conclusion about what you have learned from each method or presentation. In this case, discuss the shape of the
distribution (skewed or close to symmetrical) and identify any extreme values. Discuss the spread of the data from this visual observation.
Verify the visual impression of the mean and spread by calculating the mean and standard deviation, and compare
these numerical results with the visual observation above.
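A minimal sketch of the histogram and summary statistics in Python, using 12 invented 50 m times: the frequency table is printed as a text histogram, and both the sample standard deviation (`stdev`, dividing by n − 1) and the population standard deviation (`pstdev`, dividing by n) are shown, since calculators report both. The bin start of 7.0 s and width of 0.5 s are assumptions to be adjusted to the real data.

```python
from statistics import mean, pstdev, stdev

def frequency_table(data, start, width, n_bins):
    """Count values in [start + i*width, start + (i+1)*width) for i = 0..n_bins-1."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:   # values outside the chosen range are ignored
            counts[i] += 1
    return counts

# Hypothetical end-of-programme 50 m times (s)
times = [7.2, 7.6, 7.8, 7.9, 8.1, 8.2, 8.3, 8.4, 8.6, 8.8, 9.1, 9.4]
START, WIDTH, N_BINS = 7.0, 0.5, 5

counts = frequency_table(times, START, WIDTH, N_BINS)
for i, c in enumerate(counts):
    lo = START + i * WIDTH
    print(f"{lo:.1f}-{lo + WIDTH:.1f} s | {'#' * c}")

print(f"mean = {mean(times):.3f} s")
print(f"sample sd (n - 1) = {stdev(times):.3f} s, population sd (n) = {pstdev(times):.3f} s")
```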
If the distribution above looks approximately normal, discuss how you plan to use a chi-square goodness-of-fit test to check whether the data
are indeed normally distributed with the mean and variance calculated above.
Use GOF to test whether the data are normally distributed. Draw appropriate inferences from the test.
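The chi-square goodness-of-fit test compares observed bin counts O with the counts E expected under a normal distribution with the calculated mean and standard deviation, using the statistic sum of (O − E)² / E. A minimal sketch, assuming invented counts for 80 times in five bins whose outer bins are open-ended: the degrees of freedom are the number of bins minus 1, minus 2 more because the mean and standard deviation were estimated from the data. Bins with expected counts below 5 should be combined before testing. The mean 8.25 s and standard deviation 0.55 s are hypothetical, and 5.991 is the 5% critical value for 2 degrees of freedom.

```python
from statistics import NormalDist

def chi_square_normal_gof(observed, edges, mu, sigma):
    """Chi-square GOF statistic for a normal model N(mu, sigma^2).

    observed: counts for bins (-inf, edges[0]), [edges[0], edges[1]), ..., [edges[-1], inf).
    Returns (chi2, degrees of freedom, expected counts).
    """
    if len(observed) != len(edges) + 1:
        raise ValueError("need exactly one more bin count than inner edges")
    n = sum(observed)
    dist = NormalDist(mu, sigma)
    cdfs = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    expected = [n * (hi - lo) for lo, hi in zip(cdfs, cdfs[1:])]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 2   # minus 2 because mu and sigma were estimated from the data
    return chi2, df, expected

observed = [8, 17, 30, 18, 7]   # hypothetical counts, n = 80
edges = [7.5, 8.0, 8.5, 9.0]    # inner bin boundaries (s)
chi2, df, expected = chi_square_normal_gof(observed, edges, mu=8.25, sigma=0.55)
CHI2_CRIT = 5.991               # 5% critical value for df = 2
print("expected:", [round(e, 2) for e in expected])
print(f"chi2 = {chi2:.3f}, df = {df}, reject normality: {chi2 > CHI2_CRIT}")
```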
The correlation between Y and Y is obviously 1. The correlation between Y and Xfreq is 0.24, and so on. In this example, Xdist and Xmeantime are highly correlated with
each other, so we can leave out one of these variables as redundant. At the same time, Xdist and Xmeantime are highly correlated with Y2; in this case, neither Xdist nor
Xmeantime can be used as an independent variable (because the correlation is very high, they are not independent). The model should therefore only have Xfreq, Xgender and
Xspdtrain as the independent variables for Y.
HL: Test whether the population correlation coefficient differs from zero. If the correlation coefficient is significant, I continue with
linear/multiple regression.
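For the HL test of H0: ρ = 0, the usual statistic is t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to the r = 0.24 between Y and Xfreq quoted above; the sample size n = 30 is a hypothetical value, and 2.048 is the two-tailed 5% critical value for 28 degrees of freedom.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: rho = 0, given sample correlation r from n pairs (df = n - 2)."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2

t, df = correlation_t(0.24, 30)   # r from the Y vs Xfreq example; n = 30 is hypothetical
T_CRIT = 2.048                    # two-tailed 5% critical value for df = 28
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT}")
```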
The independent variables that are least correlated with one another can then be combined in a multiple linear regression, with the
end-of-month time as the dependent variable.
The model is now Y2 = a + b Xfreq + c Xgender + d Xspdtrain + e (Xm) + f (Xn)
Xm and Xn are dummy variables. Xm=0 and Xn=0 means the person uses training method 1. Xm=1 and Xn=0 means the person uses training method 2. Xm=0 and
Xn=1 means the person uses training method 3.
Thus, Y2 = a + b Xfreq + c Xgender + d Xspdtrain is the base model for people using method 1.
Y2 = a + b Xfreq + c Xgender + d Xspdtrain + e is the model for people using method 2.
Y2 = a + b Xfreq + c Xgender + d Xspdtrain + f is the model for people using method 3.
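A minimal sketch of how the dummy variables enter the model: each volunteer becomes a row [1, Xfreq, Xgender, Xspdtrain, Xm, Xn], where the leading 1 multiplies the intercept a. The coefficient values and the 0/1 coding of Xgender are hypothetical. Changing only the training method shifts the prediction by exactly e (method 2) or f (method 3), matching the three equations above.

```python
# (Xm, Xn) for each training method, as defined above
DUMMIES = {1: (0, 0), 2: (1, 0), 3: (0, 1)}

def design_row(xfreq, xgender, xspdtrain, method):
    """One row of the design matrix: [1, Xfreq, Xgender, Xspdtrain, Xm, Xn]."""
    xm, xn = DUMMIES[method]
    return [1, xfreq, xgender, xspdtrain, xm, xn]

def predict(coeffs, row):
    """Y2 = a + b*Xfreq + c*Xgender + d*Xspdtrain + e*Xm + f*Xn."""
    return sum(c * x for c, x in zip(coeffs, row))

# Hypothetical coefficients a, b, c, d, e, f (not fitted values)
coeffs = [9.0, -0.2, 0.5, -0.1, -0.3, -0.4]
for method in (1, 2, 3):
    y2 = predict(coeffs, design_row(xfreq=3, xgender=1, xspdtrain=2, method=method))
    print(f"method {method}: predicted Y2 = {y2:.2f} s")
```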
Use the p-value from a t-test to check the statistical significance of each coefficient in the model (e.g. the coefficients a, b, c,
d, e and f above), especially those related to training (e and f). Note that this should only be done if the underlying
distribution has been confirmed to be normal.
Use the sum of squared residuals to assess the goodness of fit of various models (obtained by removing the least influential
independent variables).
HL: Do this manually with matrices before using technology to confirm the model.
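For the HL step, the least-squares coefficients satisfy the normal equations (XᵀX)β = Xᵀy, where each row of X is [1, Xfreq, Xgender, Xspdtrain, Xm, Xn] and y holds the Y2 values; this is equivalent to β = (XᵀX)⁻¹Xᵀy. The sketch below solves them in plain Python by Gaussian elimination, without forming the inverse, and computes the sum of squared residuals (SSR). It is checked on a tiny invented data set generated from y = 2 + 3x1 − x2, so the true coefficients are known; with real data, `numpy.linalg.lstsq` could serve as the technology check.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("X^T X is singular: check for redundant or constant columns")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):   # back-substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(matmul(Xt, X), Xty)

def ssr(X, y, beta):
    """Sum of squared residuals of the fitted model."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))

# Invented check data: rows are [1, x1, x2], y = 2 + 3*x1 - x2 exactly
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1], [1, 1, 3]]
y = [2, 5, 1, 4, 7, 2]
beta = fit_ols(X, y)
print("full model:", [round(b, 6) for b in beta], "SSR =", ssr(X, y, beta))

# Reduced model without x2, to compare goodness of fit by SSR
X_red = [row[:2] for row in X]
beta_red = fit_ols(X_red, y)
print("reduced model:", [round(b, 6) for b in beta_red], "SSR =", round(ssr(X_red, y, beta_red), 4))
```

When comparing nested models, the SSR can only stay the same or increase as variables are removed, so the question is whether the increase is large enough to matter.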
Once you have a final model, always describe the magnitude of the coefficients (a, b, c, d, e and f in the model
above), whether or not they are significant based on the t-test above (this is a topic outside your syllabus, so
you need to provide in-text citations for this analysis), the implication of these coefficients for running speed,
and any predictions based on the model. You need to discuss how reasonable the results from this model are
compared with reality. What is the implication of this model for your research question, and what can you personally
learn from it? What are the strengths and weaknesses of your mathematical analysis, and how do these affect the
validity of your results? Many students do not include this discussion and score poorly in Criterion D.
6 What Mathematical Communication will be used? (e.g. what key terms to be defined, what
presentation (graphical, tables, etc.) to be used? ) See above highlighted in yellow.
7 How will you demonstrate Personal Engagement in this task? (e.g. Do you collect your own
data? Do you run your own simulation or experiment? Do you build different types of models? Do you
learn new maths or skills? Does your personal voice come through in your document?)
See the plan above on random sampling methods and data collection.
Will use pairwise correlations and p-values to create different models.
Will learn multiple linear regression with matrices (not in the syllabus).
Will learn to use dummy variables in linear regression (not in the syllabus).
8 Mathematical processes likely to be used in the analysis. (This is linked to question 5 above.)
See plan in question 5.
9. Resources in MLA8 format?
I suggest that you use these resources to learn about multiple linear regression and for HL students on how to use matrix algebra for multiple regression.
Bevans, Rebecca. “An introduction to multiple linear regression.” Statistics. 26 Oct 2020,
https://www.scribbr.com/statistics/multiple-linear-regression/. Accessed 12 Aug 2021.
Doering, Suzanne, et al. Mathematics: Applications and interpretation. Higher Level. Oxford, 2019.
Pennsylvania State University, The. “5.4 - A Matrix Formulation of the Multiple Regression Model.” Stat 462:
Applied Regression Analysis. 2018, https://online.stat.psu.edu/stat462/node/132/. Accessed 12 Aug 2021.
Wazir, Ibrahim, et al. Higher level. Mathematics: Applications and interpretation. Pearson, 2019.
I have received feedback from my teacher and I declare that I am fully responsible for whether or not I follow
it. Once the topic is approved, I should not change it. If I change it, I do not expect the teacher to give me any
feedback.
Date: ______________________________________________