Using National Assessment Program - Literacy and Numeracy (NAPLAN) Data in The Longitudinal Study of Australian Children (LSAC)
Growing Up in Australia:
The Longitudinal Study of Australian Children (LSAC)
LSAC Technical Paper No. 8
April 2013
Using NAPLAN data in the Longitudinal Study of Australian Children
Acknowledgements
We would like to thank the state and territory education departments that supported the data
linkage process; without their ongoing support, NAPLAN data could not have been linked to the LSAC
data. For their useful comments, thanks also go to Dr Daryl Higgins and Professor Alan Hayes at
the Australian Institute of Family Studies; the Department of Families, Housing, Community Services
and Indigenous Affairs; the Australian Bureau of Statistics; and reviewers of this paper from the
LSAC Data Expert Reference Group.
Growing Up in Australia: The Longitudinal Study of Australian Children is conducted in partnership
between the Australian Government Department of Families, Housing, Community Services and
Indigenous Affairs, the Australian Institute of Family Studies and the Australian Bureau of Statistics,
with advice provided by a consortium of leading researchers.
Contents
Acknowledgements
1. Introduction
1.1 The Longitudinal Study of Australian Children (LSAC)
1.2 National Assessment Program—Literacy and Numeracy (NAPLAN)
2. Linkage
2.1 Obtaining consent
2.2 Linkage and matching process
2.3 Modelling non-NAPLAN data cases
3. LSAC NAPLAN data file
3.1 Data storage
3.2 Key variables
4. Using NAPLAN data in LSAC
4.1 Data compendium
4.2 Birth cohort vs year level cohort
4.3 Representativeness of year level cohort in LSAC
4.4 Timing of NAPLAN testing in LSAC
4.5 Correspondence between NAPLAN and LSAC data
5. Comparative analysis
5.1 LSAC NAPLAN scores vs national NAPLAN scores
5.2 Association between NAPLAN scores and LSAC learning measures
6. Conclusion
References
Appendixes
List of tables
Table 1: NAPLAN consent forms, K cohort
Table 2: Reasons for NAPLAN non-consents, K cohort
Table 3: Data matching rates, K cohort
Table 4: Estimation results of logistic regression
Table 5: Streams for longitudinal analysis, LSAC NAPLAN Wave 3 data release
Table 6: NAPLAN data provided, by year level, LSAC NAPLAN, K cohort
Table 7: NAPLAN data provided, by year level and calendar year, LSAC NAPLAN, K cohort
Table 8: Year 5 NAPLAN scores, Australia, 2009–11
Table 9: Mean, minimum and maximum time gap between Year 5 NAPLAN data (2009–11) and Wave 3 LSAC data (2008), by calendar year
Table 10: Mean, minimum and maximum time gap between NAPLAN data (2009–10) and Wave 3 LSAC data (2008), by year level
Table 11: Mean, minimum and maximum time gap between Year 3 NAPLAN data (2008–09) and Wave 4 LSAC data (2010), by calendar year
Table 12: Mean, minimum and maximum time gap between NAPLAN data (2008–09) and Wave 4 LSAC data (2010), by year level
Table 13: Year 5 NAPLAN data (2009–10) and Wave 4 LSAC data (2010)
Table 14: Year 3 and Year 5 NAPLAN data (2009–10) and Wave 4 LSAC data (2010)
Table 15: Correspondence between NAPLAN bands and scaled scores
Table 16: National minimum standards for Reading, by year level and test year
Table 17: National minimum standards for Writing, by year level and test year
Table 18: National minimum standards for Spelling, by year level and test year
Table 19: National minimum standards for Numeracy, by year level and test year
Table 20: National minimum standards for Grammar and Punctuation, by year level and test year
Table 21: Correlation coefficients for NAPLAN scores and LSAC learning and cognitive measures, Year 3, 2008–09
Table 22: Correlation coefficients for NAPLAN scores and LSAC learning and cognitive measures, Year 5, 2009–11
List of figures
Figure 1: Predicted probability of NAPLAN data not being linked, by PPVT score
Figure 2: Age distribution of LSAC children who sat the NAPLAN test in Year 5, 2009–10
Figure 3: Distribution of Wave 4 interview dates
Figure 4: Step-by-step process of choosing the correct NAPLAN data and LSAC data
Figure 5: NAPLAN tests by period
Figure 6: LSAC NAPLAN and national NAPLAN scores, by tests and year levels
Figure 7: LSAC NAPLAN and national NAPLAN Reading scores, by gender and year level
Figure 8: LSAC NAPLAN and national NAPLAN Writing scores, by gender and year level
Figure 9: LSAC NAPLAN and national NAPLAN Spelling scores, by gender and year level
Figure 10: LSAC NAPLAN and national NAPLAN Numeracy scores, by gender and year level
Figure 11: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by gender and year level
Figure 12: LSAC NAPLAN and national NAPLAN Reading scores, by LBOTE and year level
Figure 13: LSAC NAPLAN and national NAPLAN Writing scores, by LBOTE and year level
Figure 14: LSAC NAPLAN and national NAPLAN Spelling scores, by LBOTE and year level
Figure 15: LSAC NAPLAN and national NAPLAN Numeracy scores, by LBOTE and year level
Figure 16: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by LBOTE and year level
Figure 17: LSAC NAPLAN and national NAPLAN Reading scores, by parental education and year level
Figure 18: LSAC NAPLAN and national NAPLAN Writing scores, by parental education and year level
Figure 19: LSAC NAPLAN and national NAPLAN Spelling scores, by parental education and year level
Figure 20: LSAC NAPLAN and national NAPLAN Numeracy scores, by parental education and year level
Figure 21: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by parental education and year level
Figure 22: LSAC NAPLAN and national NAPLAN Reading scores, by parental occupation and year level
Figure 23: LSAC NAPLAN and national NAPLAN Writing scores, by parental occupation and year level
Figure 24: LSAC NAPLAN and national NAPLAN Spelling scores, by parental occupation and year level
Figure 25: LSAC NAPLAN and national NAPLAN Numeracy scores, by parental occupation and year level
Figure 26: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by parental occupation and year level
1. Introduction
1.1 The Longitudinal Study of Australian Children (LSAC)
Growing Up in Australia: The Longitudinal Study of Australian Children (LSAC) is a national study
designed to provide an in-depth understanding of children’s development in Australia’s current
social, economic and cultural environment, thereby contributing to the evidence base for future
policy and practice development.
The study is conducted in partnership between the Australian Government Department of Families,
Housing, Community Services and Indigenous Affairs (FaHCSIA), the Australian Institute of Family
Studies (AIFS) and the Australian Bureau of Statistics (ABS), with advice provided by a consortium
of leading researchers from research institutions and universities throughout Australia.
The study commenced in 2004 with the recruitment of two cohorts: one cohort of 5,107 children
aged 0–1 year old (the birth or “B cohort”) and another of 4,983 children aged 4–5 years old (the
kindergarten or “K cohort”) and their families across all states and territories of Australia. Interviews
comprising different instruments are conducted with families every two years.
In this paper, K cohort data are used only as an example. The same approach and analysis may also be directly applied
to B cohort data; however, given that at the time of writing only 20% of B cohort children had sat
NAPLAN tests while nearly all K cohort children had sat at least one NAPLAN test, the K cohort
was chosen for ease of explanation.
Section 2 presents an overview of how consent was obtained, and the matching and linkage
processes. It then examines to what degree the sample of children with linked NAPLAN data is
representative of the LSAC Wave 1 sample. Section 3 describes how NAPLAN data are stored in
the LSAC data file. Section 4 discusses the correspondence between year level cohort and birth
cohort and how to use NAPLAN data in LSAC. Section 5 examines the representativeness of the
NAPLAN results in LSAC at the national level and across different socio-demographic groups. The
fifth section also explores the extent to which NAPLAN data are correlated with the main cognitive
and learning measures used in LSAC. A discussion concludes the paper.
2. Linkage
2.1 Obtaining consent
At the LSAC Wave 3 data collection, parents of the K cohort children were asked to fill in a consent
form allowing access to their study child’s NAPLAN data (see Appendix A). Parents who did
not provide consent at Wave 3 for any reason, or who did not participate at Wave 3, were asked
again at Wave 4 using an updated consent form (see Appendix B).1 If a family did not participate at
either of these waves, parents did not have an opportunity to provide or refuse consent to link
the NAPLAN data; these families were therefore considered not available. For consent to be obtained,
one of the parents or guardians had to tick all relevant boxes in the form and sign the form in the
presence of a witness. If at least one box or one signature was missing, the form was incomplete
(also referred to here as being filled in incorrectly) and, in these cases, it was considered that
consent was not given. Consent was also not obtained if parents refused to sign the form or if the
ABS office did not receive the consent form from parents. Table 1 reports the consent rate for the
total sample (Wave 1 sample) and the available sample (participants of Waves 3 or 4).
Notes: a Available sample refers to families who participated in Waves 3 or 4. b Total sample refers to Wave 1 sample.
Source: LSAC, K cohort
It can be seen from Table 1 that 95% of interviewed families provided their consent to NAPLAN
data linkage and only 5% (204) did not actively consent. Of these 204 families, 48 families refused
to provide consent, 117 did not tick all of the boxes or one or both of the signatures were missing,
and 39 consent forms were not received by the office (see Table 2).
Out of the total sample (Wave 1), consent was not obtained from 15% of families, either because
the family did not give consent for the reasons specified in Table 2 or because the family was not
asked due to non-participation at both Waves 3 and 4.
1 The consent form was simplified, as most of the non-consents in Wave 3 (77% of non-consents) were due to the form being filled in incorrectly.
2.2 Linkage and matching process
NAPLAN data are held by the state and territory governments, so each state/territory government had to agree to match the data. Data matching was only done for
children where consent to link the NAPLAN data to the LSAC sample was obtained.2
The procedure undertaken to link the LSAC and NAPLAN data was as follows:
1. The ABS sent each state/territory government a list of participants who had agreed to the
linkage, with identifying variables—including school and child variables (see below)—and a
dummy LSAC ID identifier. The LSAC ID was different from the HICID, which is the unique ID
for a study child within LSAC.
2. Each state/territory government matched the LSAC child data on the list of variables provided
with the NAPLAN data. They then sent AIFS a list that contained the scaled scores for each
NAPLAN test against the LSAC ID identifier, without the school or child’s data. The ABS was
not sent LSAC NAPLAN data, so it did not have the ability to match the data back to names and
addresses through the LSAC ID.
3. In order to link NAPLAN scaled scores to the LSAC data, AIFS used an ABS-generated
concordance between the LSAC ID identifier and HICID.
This procedure ensured that each jurisdiction did not know the HICID and, therefore, could not
match records in the AIFS output datasets, and at the same time AIFS did not know the school
names, child information or postcodes.
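To make the final step of this procedure concrete, the sketch below shows the concordance merge in pandas. It is illustrative only: the file names and column names (jurisdiction_scores.csv, concordance.csv, LSAC_ID, HICID) are hypothetical stand-ins, as the paper does not specify file formats.

```python
import pandas as pd

# Hypothetical inputs: per-child NAPLAN scaled scores keyed by the dummy
# LSAC ID (returned by jurisdictions), and the ABS-generated concordance
# between the dummy LSAC ID and the study's HICID.
scores = pd.read_csv("jurisdiction_scores.csv")   # columns: LSAC_ID, y3read, y3num, ...
concordance = pd.read_csv("concordance.csv")      # columns: LSAC_ID, HICID

# Step 3 of the procedure: attach HICID to the scores via the concordance,
# then drop the dummy LSAC ID so the working file carries only HICID.
linked = scores.merge(concordance, on="LSAC_ID", how="left").drop(columns="LSAC_ID")

# 'linked' can now be joined to the main LSAC data file on HICID.
```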
The match between NAPLAN student results and LSAC children was based on the following
variables:
■■ child’s first name;
■■ child’s surname;
■■ child’s date of birth;
■■ school name; and
■■ school postcode.
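As a sketch of how these five variables might be combined into an exact-match key, consider the normalisation below. This is an assumption for illustration, not the documented procedure: the paper notes (footnote 2) that near-misses such as “Jenny” vs “Jennifer” were resolved manually, not algorithmically.

```python
import re
import unicodedata

def match_key(first: str, last: str, dob: str, school: str, postcode: str) -> str:
    """Build a normalised exact-match key from the five linkage variables.

    Hypothetical helper: lower-cases, strips accents, hyphens and spaces so
    that trivially different renderings of the same record compare equal.
    """
    def norm(s: str) -> str:
        s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
        return re.sub(r"[^a-z0-9]", "", s.lower())
    return "|".join(norm(x) for x in (first, last, dob, school, postcode))

key = match_key("Jenny-Lee", "O'Brien", "1999-08-14", "Example Primary School", "3000")
```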
Table 3 reports the overall matching rate results using 2008–11 NAPLAN results. For 2% of K cohort
children, the NAPLAN data were not matched for Year 3, Year 5 and Year 7. For the remaining 98%
of children, the NAPLAN data were matched for at least one year level (i.e., Year 3, Year 5 and/
or Year 7). It is worth noting that a match rate within a particular year level might be lower than
98% because some children’s NAPLAN data could be matched for one year level but not another
(see section 2.4 for details). As NAPLAN data were linked to only 4,159 cases, we will refer to this
sample as the LSAC NAPLAN sample from now on.
Notes: a Eligible sample refers to families who gave consent to link NAPLAN data. b Total sample refers to Wave 1 sample. c Cases
were not used in matching if consent was not obtained or families were not asked for their consent.
Source: LSAC, K cohort
While the matching rate for the eligible sample is very high (98%), the matching rate for the total
sample (Wave 1 sample) is only 84% (4,159 out of 4,983). It is important to assess whether there are
systematic differences in parental socio-demographic characteristics and the child’s learning abilities
between children with and without the linked NAPLAN data. Any observed significant differences
2 While the original intention was to match records using an exact matching procedure, there were some problem cases that did not completely match the jurisdictions’ databases. The most common reason for this was a slight discrepancy in names (for example, “Jenny” vs “Jennifer”, or a hyphenated vs unhyphenated name). These cases were sorted through manually to match them to the NAPLAN database.
should be taken into account when interpreting LSAC NAPLAN scores or comparing them against
NAPLAN national statistics. The following section presents the statistical analysis of this matter.
2.3 Modelling non-NAPLAN data cases
To test for such differences, a logistic regression was estimated, with a dependent variable indicating whether a child’s NAPLAN data could not be linked:

\[
y_i = \begin{cases} 1, & \text{NAPLAN data absent} \\ 0, & \text{NAPLAN data present} \end{cases}
\]
The explanatory variables include parental characteristics and children’s characteristics. They
are derived from LSAC Wave 1 data, as Wave 1 is the only wave for which information is available
for all respondents.
Summary statistics for the independent variables are presented in Appendix E.
Parental characteristics
Parental characteristics include educational attainment, mother’s working hours, family composition
and language background.
The level of educational attainment is measured as the highest educational qualification completed
by either of the parents. In the logistic regression, level of educational attainment is treated as a
set of dummy variables (bachelor degree or above; advanced diploma/diploma; certificate I–IV;
Year 12 or equivalent; Year 11 or equivalent or below), where having a bachelor degree or above
is considered as the reference category against which every other dummy variable is compared.
Mother’s working hours is included as a set of dummy variables (employed 37 hours or more,
employed fewer than 37 hours and employed zero hours), where being employed for 37 hours or
more is considered as the reference category.3 Parental occupation is not included as it is highly
correlated with parental education and labour force status.
Family composition is categorised as a single-parent family if the study child has just one parent in
the household in which he/she lives at the time of the study.
We also control for language background by whether the family is from a language background
other than English (LBOTE). A family is classified as LBOTE if the study child or either of the
parents speaks a language other than English at home.
3 “Employed” includes employed full-time, employed part-time, and employed but on maternity leave.
Children’s characteristics
Children’s characteristics include the study child’s gender and measures of cognitive and non-
cognitive abilities, readiness for school, and levels of emotional and behavioural problems.
We measure differences in children’s learning development and cognitive and non-cognitive abilities
using the Peabody Picture Vocabulary Test (PPVT-III, 1997), the Who Am I (WAI) test and the Strengths and
Difficulties Questionnaire (SDQ). The PPVT is used to measure receptive language and vocabulary,
and knowledge of the meaning of spoken words. The WAI test is used to measure children’s ability
to perform pre-literacy/pre-numeracy tasks, such as reading, copying and writing letters, words,
shapes and numbers. The SDQ assesses peer problems, conduct problems, hyperactivity, emotional
problems and prosocial behaviours for children aged 3–12 years. All measures are standardised
direct tests administered by interviewers.
Estimation results
The results of the logistic regression are reported in Table 4 in the form of odds ratios (ORs) and
model fit indices. The odds ratio is a relative measure of risk, which indicates how much more
likely it is that someone with a particular characteristic will not have NAPLAN data linked,
compared to someone without this characteristic. An OR greater than 1 suggests that NAPLAN
data are more likely to be absent for those with the characteristic than for those without it. An
OR less than 1 suggests that NAPLAN data are less likely to be absent for those with the
characteristic than for those without it. An OR of 1 suggests that there is no difference in
whether NAPLAN data are absent between the two groups.
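As a concrete illustration, the sketch below fits a logistic regression of this form and converts the coefficients to odds ratios. All variable and file names (lsac_wave1_linkage.csv, naplan_absent, cert_highest, ppvt_z, and so on) are hypothetical stand-ins for the LSAC variables described above, not actual dataset names.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical Wave 1 analysis file: one row per study child.
df = pd.read_csv("lsac_wave1_linkage.csv")

y = df["naplan_absent"]  # 1 = NAPLAN data absent (not linked), 0 = present

# Dummy-coded predictors (reference categories omitted): parental education
# (reference: bachelor degree or above), mother's working hours (reference:
# 37+ hours), LBOTE, single-parent family, child's sex, and standardised
# PPVT, WAI and SDQ scores.
predictors = ["cert_highest", "diploma_highest", "yr12_highest", "yr11_below",
              "mum_part_time", "mum_not_working", "lbote", "single_parent",
              "girl", "ppvt_z", "wai_z", "sdq_z"]
X = sm.add_constant(df[predictors])

fit = sm.Logit(y, X).fit()
odds_ratios = np.exp(fit.params)  # OR = exp(coefficient), as reported in Table 4
print(odds_ratios.round(2))
```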
To assess the model fit, the logistic model is compared against the intercept-only model (also
called the null model, as it has no predictors), under which every observation is assigned the
same predicted probability. An improvement over this baseline model is examined using the
likelihood ratio test, which indicates a significant improvement over the intercept-only model.
To assess the validity of the model, we assess the fit of the logistic model against actual
outcomes using the Hosmer–Lemeshow (H–L) inferential goodness-of-fit test. The H–L test yielded
a χ2(8) of 6.0 and was not significant (p > .05), suggesting that the model fit the data well.
In other words, the null hypothesis of a good model fit to the data was tenable.
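Statsmodels does not ship a Hosmer–Lemeshow test, so a minimal implementation is sketched below under the usual deciles-of-risk construction with g − 2 degrees of freedom; it assumes the hypothetical fit object from the earlier sketch.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, p_hat, groups=10):
    """Hosmer-Lemeshow goodness-of-fit test for a fitted logistic model.

    y_true: array of 0/1 outcomes; p_hat: predicted probabilities.
    Returns the chi-square statistic and its p-value on (groups - 2) df.
    """
    order = np.argsort(p_hat)
    y, p = np.asarray(y_true)[order], np.asarray(p_hat)[order]
    bins = np.array_split(np.arange(len(p)), groups)  # deciles of risk
    stat = 0.0
    for idx in bins:
        obs1, exp1 = y[idx].sum(), p[idx].sum()        # events
        obs0, exp0 = len(idx) - obs1, len(idx) - exp1  # non-events
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, chi2.sf(stat, groups - 2)

# e.g. stat, p = hosmer_lemeshow(y, fit.predict(X)); p > .05 suggests adequate fit
```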
For ease of interpretation, we focus on the ORs. It can be seen from Table 4 that there is no
statistically significant relationship between NAPLAN data being linked and family type, the child’s
readiness for school (WAI) or the child’s level of emotional and behavioural problems.
The highest parental education appears to be a significant predictor of whether NAPLAN data were linked.
Families with a certificate as their highest educational attainment were 1.34 times more likely not to
have NAPLAN data linked than families with a university degree, holding all other
variables constant. There were no differences for families with other levels of educational attainment.
Mother’s working hours were also estimated to be a significant predictor of NAPLAN data not being
linked. Children whose mothers were not working any hours at Wave 1 were 1.38 times more
likely not to have NAPLAN data linked than children whose mothers were working 37 hours or more
at Wave 1, holding all other variables constant. There were no differences for children whose
mothers were working fewer than 37 hours at Wave 1.
There is also a statistically significant relationship between LBOTE and whether NAPLAN data were
linked. For children from non–English speaking backgrounds, the odds of not having NAPLAN
data linked were 2.05 times greater than for children from English-speaking background families,
holding all other variables constant.
While there were no statistically significant relationships between NAPLAN data being linked and
most of the children’s characteristics, children’s level of receptive language and vocabulary was
significantly associated with not having NAPLAN data linked. For each one standard deviation increase
in the PPVT score, the odds of not having NAPLAN data linked decreased by a factor of 0.8, holding all
other variables constant. Figure 1 shows the predicted probabilities of not having NAPLAN data
linked by PPVT score. As the PPVT score increases, the predicted probability of NAPLAN data
not being linked decreases. Even though the confidence interval is quite wide when PPVT scores
are small, large differences in children’s PPVT scores lead to non-trivial differences in the
probability of not having NAPLAN data linked.
Figure 1: Predicted probability of NAPLAN data not being linked, by PPVT score (predicted probability with 95% confidence interval, plotted against PPVT scores from 10 to 90)
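A predicted-probability curve like Figure 1 can be traced by varying the PPVT score while holding the other predictors at their sample means. A minimal sketch follows, reusing the hypothetical X and fit objects from the earlier regression sketch.

```python
import numpy as np

base = X.mean()                 # all predictors at their sample means (incl. const)
grid = np.linspace(-3, 3, 61)   # PPVT z-scores, spanning roughly 10-90 raw points

probs = []
for z in grid:
    row = base.copy()
    row["ppvt_z"] = z           # vary only the PPVT score
    probs.append(fit.predict(row.values.reshape(1, -1))[0])  # P(data absent)
```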
Therefore, based on the logistic regression, it has been found that those children who have lower
PPVT scores, are from non–English speaking background, have parents with a certificate as the
highest educational attainment at Wave 1, and have mothers who did not work when the child
was 4–5 years old are less likely to have their NAPLAN data linked.
4 It is worth mentioning that after Wave 4 some categories may become redundant; for example, category 3 may become irrelevant as the updated consent form has been simplified and the chance of filling the form in incorrectly will be very small. If a category becomes irrelevant at subsequent waves, it will be removed and the list of categories will be updated. Ideally, it is expected that there will be a dummy variable with “1” corresponding to “consent obtained” and “0” corresponding to “consent refused”.
Table 5: Streams for longitudinal analysis, LSAC NAPLAN Wave 3 data release

Cohort          Stream    Year 3    Year 5    Year 7    Year 9
                          (calendar year of NAPLAN test)
K cohort        1         –         2009      2011      2013
                2         2008      2010      2012      2014
                3         2009      2011      2013      2015
B cohort        4         2011      2013      2015      2017
                5         2012      2014      2016      2018
                6         2013      2015      2017      2019
Both cohorts    –9        Stream not applicable
The next ten dummy variables (rprey to ry9) correspond to all year levels, starting from pre-Year
1 and ending with Year 9, where 1 represents “year repeated”.
The variable repeated refers to whether a study child repeated a year level at least once prior to
NAPLAN (category 1 = pre-Year 1, Year 1 or Year 2), during NAPLAN testing (category 2 = Year 3,
Year 4, Year 5, Year 6, Year 7, Year 8 or Year 9), or repeated in both periods (category 3). These
categories are mutually exclusive—if category 1 is chosen, it means that a child repeated a year
level (at least once) only prior to the NAPLAN tests commencing but has not repeated any year
levels since Year 3, while category 3 suggests that a child repeated at least two year levels: one
in the period prior to NAPLAN and one during NAPLAN. The variable can be of great use if an
analyst is interested in selecting out all children who repeated a year level during NAPLAN, or in
controlling for children who repeated a year level prior to NAPLAN.
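For instance, both selections could be made as follows; the data frame and file name (naplan, lsac_naplan.csv) are hypothetical, while the category codes follow the definition above.

```python
import pandas as pd

naplan = pd.read_csv("lsac_naplan.csv")  # hypothetical file name

# 'repeated': 1 = repeated prior to NAPLAN only, 2 = repeated during NAPLAN
# testing (Year 3-9), 3 = repeated in both periods.
analysis = naplan[~naplan["repeated"].isin([2, 3])].copy()  # drop repeats during NAPLAN

# Alternatively, keep everyone and flag pre-NAPLAN repeaters as a control:
naplan["repeated_pre"] = naplan["repeated"].isin([1, 3]).astype(int)
```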
The next nine variables, where # refers to a year level (Year 3, 5, 7 or 9), are recorded for each
year level with respect to the corresponding NAPLAN tests.
The variables y#read, y#write, y#spel, y#gram and y#num refer to Reading, Writing, Spelling,
Grammar and Punctuation, and Numeracy scaled scores, respectively. Scores are reported to one
decimal place unless state or territory authorities provided scores rounded to whole numbers;
Tasmania and Western Australia provided rounded NAPLAN scores for the 2008 and 2009 tests.
Scores range from 0 to 1,000. If a student was absent or was exempt from a test, his/her
score is recorded as not applicable (–9). Students can be absent or exempt from one or all tests;
for example, “students with a language background other than English, who arrived from overseas
less than a year ago, and students with significant intellectual disabilities may be exempted from
testing” (MCEECDYA 2009, p. 3). It is possible for parents to give their consent to access NAPLAN
data, but that state/territory authorities are not able to identify (match) the study child in the
national NAPLAN database; therefore, for these children scores are missing. Scores for children
with no consent are also recorded as missing. If a study child repeats a year level and sits NAPLAN
tests for a second time, the most recent NAPLAN scores are stored in the LSAC NAPLAN data file.
The variable y#age refers to the age of the child at the time of testing.
The next two variables in the LSAC NAPLAN data file are y#test and y#state. The former variable
refers to the calendar year in which the test was undertaken by the study child. The latter refers to
the state/territory of the school attended by the study child. There are instances where a study child
resides in one state/territory but attends school in a different state/territory, which is explained by
some children living on a state/territory border or moving their place of residence between data
collection waves.
The variable y#status has five categories. Category 1 refers to cases where a study child completes
all tests; category 2 refers to cases where a study child is absent for some tests but completes
at least one test; category 3 refers to cases where a study child is absent for all tests; category 4
refers to cases where a study child is exempt from all tests; and category 5 refers to cases where
consent from parents has been obtained but state/territory authorities are unable to identify the
study child within the national NAPLAN database. It is worth noting that if study children are not
matched/identified to the NAPLAN data they are assigned to “no match” across all NAPLAN year
levels unless a match for any year level is found. If consent is not obtained, cases are considered
“not applicable” (–9).
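When reading the file, the sentinel codes described above need to be converted to missing values before analysis. A minimal pandas sketch follows, with hypothetical file and column names; the status codes are as defined in the text.

```python
import numpy as np
import pandas as pd

naplan = pd.read_csv("lsac_naplan.csv")  # hypothetical file name

score_cols = [f"y{lvl}{dom}" for lvl in (3, 5, 7, 9)
              for dom in ("read", "write", "spel", "gram", "num")]

# -9 means "not applicable" (absent/exempt, or no consent); treat as missing.
naplan[score_cols] = naplan[score_cols].replace(-9, np.nan)

# y#status distinguishes the reasons: 1 = completed all tests, 2 = absent for
# some but completed at least one, 3 = absent for all, 4 = exempt from all,
# 5 = consent obtained but no match found in the national NAPLAN database.
completed_y3 = naplan[naplan["y3status"].isin([1, 2])]
```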
The above compendium is presented for NAPLAN testing in years from 2008 to 2011 only. Every
two years, starting from 2013, the LSAC NAPLAN file will be updated by AIFS with new NAPLAN
results and released along with the main wave release. The report below focuses only on Year 3
and Year 5 LSAC NAPLAN data, as data for these year levels are complete (i.e., all children from
the eligible LSAC NAPLAN sample have already sat Year 3 and Year 5 NAPLAN tests).
Table 7: NAPLAN data provided, by year level and calendar year, LSAC NAPLAN, K cohort

                     Year 3                       Year 5                       Year 7
Test year        N       %      Age (years)   N       %      Age (years)   N       %      Age (years)
2008             2,891   69.5   8.6           –       –      –             –       –      –
2009             203     4.9    9.3           936     22.5   9.9           –       –      –
2010             –       –      –             2,842   68.3   10.6          4       0.1    10.9
2011             –       –      –             244     5.9    11.4          944     22.8   11.9
No match         1,065   25.6   –             137     3.3    –             3,211   77.2   –
Eligible sample  4,159   100.0  –             4,159   100.0  –             4,159   100.0  –
It can be seen that LSAC children born in the same year were enrolled in the same year level across
different calendar years. For example, out of all Year 5 children, 23% were enrolled in 2009, 68%
in 2010 and 6% in 2011. Also, children of the same age were enrolled in different year levels in
the same calendar year. For example, in 2009, 5% of children were enrolled in Year 3, 72% were
enrolled in Year 4 (not shown) and 23% were enrolled in Year 5. It is important to remember that
children of the same birth cohort belong to different year level cohorts and children of the same
year level cohort belong to different birth cohorts, due to LSAC data being collected for children
of similar age and NAPLAN data being collected for children of the same year level.
5 The 2012 NAPLAN results were not available at the time of writing.
To sum up:
■■ LSAC measures are collected for children of the same age but with different years of schooling;
and
■■ NAPLAN data are collected for children with the same years of schooling but of different ages.
Therefore:
■■ if a researcher is interested in children’s NAPLAN scores in the same year level, he/she should
take into account the age of the child at the time of NAPLAN testing; and
■■ if a researcher is interested in children’s outcomes measured during LSAC data collection, he/
she should take into account the year levels of schooling.
Figure 2: Age distribution of LSAC children who sat the NAPLAN test in Year 5, 2009–10
It can be seen that the LSAC data for each year level in a given calendar year represent a “censored”
year level cohort, as not all ages are represented for the corresponding school year. For example,
consider the LSAC sample of children who sat the Year 5 NAPLAN test in 2009. These children ranged
in age from 9 years 2 months to 10 years 2 months (with an average age of 9 years 9 months), while
the average age of all Australian children who sat the Year 5 NAPLAN test in 2009 was 10 years
6 months. Thus, the Year 5, 2009 LSAC sample represents children who entered school at a relatively
younger age than their classmates. In contrast, the Year 5, 2011 LSAC sample represents children
who were relatively older than their classmates, as their ages varied from 11 years 2 months to
12 years 2 months (with an average age of 11 years 5 months), while the average age of all Australian
children in Year 5, 2011 was again 10 years 6 months. LSAC children who were in Year 5 in 2010 were,
on average, the same age as Year 5, 2010 children nationwide; however, the LSAC Year 5, 2010 sample
was missing all children who were either relatively younger or relatively older than the majority
of children enrolled in Year 5, 2010.
Assuming that there are no time-varying influences on the educational system across these
consecutive years and no year-level cohort effect, we can assume that LSAC Year 5, 2009 children
are representative of relatively younger Year 5, 2010 children and LSAC Year 5, 2011 children
are representative of relatively older children in Year 5, 2010. As a result, LSAC Year 5 children
could be considered as a representative sample of Year 5 children in the population, regardless
of the calendar year (2009, 2010 or 2011). To assess whether this assumption is plausible or not,
we examined whether the national NAPLAN scores, percentage of children at and above national
minimum standard (NMS) and participation rates were significantly different across 2009–11. Table
8 shows that there were no significant differences in Year 5 NAPLAN results across 2009–11.
Source: Australian Curriculum, Assessment and Reporting Authority (ACARA), 2010 and 2011
To sum up:
■■ Due to the age distribution of children within school year levels in the LSAC NAPLAN sample,
a school year level in a given calendar year cannot be considered representative of the
corresponding year level in the population.
■■ Any analysis of NAPLAN data in LSAC by year level for a given calendar year (e.g., Year 3, 2008)
should therefore be avoided.
4.4 Timing of NAPLAN testing in LSAC
For 71% of Year 5 children, NAPLAN testing and the LSAC Wave 4 interview took place
concurrently. Out of these 71% of children, only 13% were interviewed in LSAC prior to NAPLAN
testing, and the rest were interviewed after NAPLAN testing. As a result, for 80% of Year 5 children,
NAPLAN testing took place before the LSAC Wave 4 data collection. The same pattern would be
consistent across all year levels.
To sum up:
■■ Timing of NAPLAN testing in relation to the LSAC data collection is of crucial importance, as it
determines to which wave of LSAC data the NAPLAN data should be linked.
Figure 3: Distribution of Wave 4 interview dates (number of interviews by date of interview, May 2009 to May 2011)
Figure 4 is a decision tree: the type of analysis (longitudinal vs cross-sectional) and whether NAPLAN serves as the dependent variable (DV) or the independent variable (IV) determine which NAPLAN and LSAC data to combine.
Figure 4: Step-by-step process of choosing the correct NAPLAN data and LSAC data
“Period X” refers to a period of two years—the year prior to the LSAC data collection and the year
of LSAC data collection—with “X” referring to the wave. So, Period 1 covers 2003 and 2004, with
Wave 1 data collection in 2004; Period 2 covers 2005 and 2006, with Wave 2 data collection in 2006;
Period 3 covers 2007 and 2008, with Wave 3 data collection in 2008; and so on. Figure 5 shows the
collection of NAPLAN data by period. It can be seen that the same year level NAPLAN results are
collected at different periods and NAPLAN results for different year levels are represented within the
same period. Given that NAPLAN scores are measured on the same (common) scale and equated
across different year levels, within a particular period an analyst can standardise NAPLAN scores
for each year level separately and model NAPLAN scores regardless of year level.
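Within-period standardisation of the kind just described might look as follows in pandas; the data frame and column names (naplan, year_level, read_score) are hypothetical.

```python
# Standardise scores within each year level so that Year 3 and Year 5 results
# collected in the same period can be pooled and modelled together (NAPLAN
# scores are on a common equated scale across year levels).
def zscore(s):
    return (s - s.mean()) / s.std()

naplan["read_z"] = naplan.groupby("year_level")["read_score"].transform(zscore)
# 'read_z' can now be modelled for all children in a period, regardless of year level.
```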
Below, we outline possible combinations of NAPLAN and LSAC data, depending on the type of
the analysis under consideration.
Table 9: Mean, minimum and maximum time gap between Year 5 NAPLAN data (2009–11) and Wave 3 LSAC data (2008), by calendar year

                                                          Time gap (months) = date(NAPLAN) – date(LSAC interview)
NAPLAN                       Wave                      N         Mean    Minimum   Maximum
Year 5, 2009 (Period 4)      Wave 3, 2008 (Period 3)   919       10.8    1.2       13.3
Year 5, 2010 (Period 4) a    Wave 3, 2008 (Period 3)   2,792     22.0    15.3      25.0
Year 5, 2011 (Period 5) b    Wave 3, 2008 (Period 3)   238       33.4    29.1      36.8
Total                                                  3,949     20.1    1.2       36.8
Note: Not all children with linked NAPLAN data participated at Wave 3. a Fourteen per cent of children sat Year 5 NAPLAN tests
in 2010 after the LSAC Wave 4 interview; however, Wave 4 data cannot be used as the most recent LSAC data, because
the time gap between Wave 4 and Year 5 NAPLAN for these children was on average less than a month. Therefore, Wave 4
data cannot be used in longitudinal designs as the data were measured prior to Year 5, 2010 NAPLAN data. b For children
who sat Year 5 NAPLAN tests in 2011, the most recent LSAC data are Wave 4 not Wave 3. However, the proportion of
these children is relatively low (6%) and use of Wave 4 data for this group does not affect the average time gap between
Year 5 NAPLAN and the most recent LSAC data (18.9 months). Use of Wave 3 and Wave 4 LSAC measures complicates
the modelling, as in such a design, not only NAPLAN data but also LSAC data are measured at different time points.
Table 10 shows the correspondence between NAPLAN and LSAC data when NAPLAN scores are
modelled by period, using NAPLAN data collected during the LSAC Wave 4 period. In this scenario,
Wave 4 period NAPLAN data are modelled as a function of LSAC measures collected at Wave 3. When
modelling NAPLAN by period, it is advisable to standardise NAPLAN scores for Year 3 and Year 5
separately. When modelling NAPLAN scores as a function of LSAC characteristics/outcomes collected
earlier, it is important to control for the child’s age at the time of the NAPLAN test, the time gap
and the year level, where appropriate.
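A minimal statsmodels formula sketch of such a model follows; the data frame and variable names (analysis_df, read_z, w3_ppvt_z, and so on) are hypothetical.

```python
import statsmodels.formula.api as smf

# Model period-4 NAPLAN Reading (standardised within year level) on Wave 3
# LSAC measures, controlling for age at test, time gap and year level.
model = smf.ols(
    "read_z ~ w3_ppvt_z + w3_matrix_z + age_at_test + time_gap_months + C(year_level)",
    data=analysis_df,  # hypothetical merged NAPLAN-LSAC data frame
).fit()
print(model.summary())
```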
Table 10: Mean, minimum and maximum time gap between NAPLAN data (2009–10) and Wave 3 LSAC data (2008), by year level

                                                          Time gap (months) = date(NAPLAN) – date(LSAC interview)
NAPLAN                       Wave                      N         Mean    Minimum   Maximum
Year 3, 2009 (Period 4)      Wave 3, 2008 (Period 3)   191       9.3     5.2       12.9
Year 5, 2009 (Period 4)      Wave 3, 2008 (Period 3)   919       10.8    1.2       13.3
Year 5, 2010 (Period 4) a    Wave 3, 2008 (Period 3)   2,792     22.0    15.3      25.0
Total                                                  3,902     18.8    1.2       25.0
Note: Not all children with linked NAPLAN data participated at Wave 3. a Fourteen per cent of children sat Year 5 NAPLAN tests
in 2010 after the LSAC Wave 4 interview; however, Wave 4 data cannot be used as the most recent LSAC data, because
the time gap between Wave 4 and Year 5 NAPLAN for these children was on average less than a month. Therefore, Wave 4
data cannot be used in longitudinal designs as the data were measured prior to Year 5, 2010 NAPLAN data.
Table 11: Mean, minimum and maximum time gap between Year 3 NAPLAN data (2008–09) and Wave 4 LSAC data (2010), by calendar year

                                                          Time gap (months) = date(LSAC interview) – date(NAPLAN)
NAPLAN                       Wave                      N         Mean    Minimum   Maximum
Year 3, 2008 (Period 3)      Wave 4, 2010 (Period 4)   2,762     26.0    22.3      33.1
Year 3, 2009 (Period 4)      Wave 4, 2010 (Period 4)   190       14.9    10.4      18.8
Total                                                  2,952     25.3    10.4      33.1
Note: Not all children with linked NAPLAN data participated at Wave 4 and not all children who participated in Wave 4 had
Year 3 NAPLAN data linked (as Year 3, 2007 children did not sit NAPLAN).
Table 12 shows the correspondence between NAPLAN and LSAC data when LSAC outcomes are
modelled by period, using NAPLAN data collected during LSAC Wave 3. In this example, LSAC
outcomes measured at Wave 4 are modelled as a function of NAPLAN data collected in the Wave
3 period. When modelling LSAC outcomes by period, it is advisable to standardise NAPLAN scores
for Year 3 and Year 5 separately.
When modelling LSAC outcomes as a function of NAPLAN scores measured earlier, it is important
to control for years of schooling, time gap and year level, where appropriate.
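The time gap itself is straightforward to compute from interview and test dates; for example, with hypothetical column names:

```python
import pandas as pd

df["interview_date"] = pd.to_datetime(df["interview_date"])
df["naplan_date"] = pd.to_datetime(df["naplan_date"])

# Gap in months between the Wave 4 interview and the earlier NAPLAN test,
# matching the sign convention of Tables 11-12 (30.44 = average days/month,
# so this is an approximation).
df["time_gap_months"] = (df["interview_date"] - df["naplan_date"]).dt.days / 30.44
```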
Table 12: Mean, minimum and maximum time gap between NAPLAN data (2008–09) and Wave 4 LSAC data (2010), by year level

                                                          Time gap (months) = date(LSAC interview) – date(NAPLAN)
NAPLAN                       Wave                      N         Mean    Minimum   Maximum
Year 3, 2008 (Period 3)      Wave 4, 2010 (Period 4)   2,762     26.0    22.3      33.1
Year 3, 2009 (Period 4)      Wave 4, 2010 (Period 4)   190       14.9    10.4      18.8
Year 5, 2009 (Period 4)      Wave 4, 2010 (Period 4)   891       13.8    10.3      19.7
Total                                                  3,843     22.6    10.3      33.1
Note: Not all children with linked NAPLAN data participated at Wave 4.
Cross-sectional analysis
In cross-sectional design, it is not important whether children sit NAPLAN tests before or after the
LSAC interview, provided both NAPLAN and LSAC measures are collected during the same period.
For example, consider Wave 4 data collection. Table 13 provides the correspondence between
NAPLAN data and LSAC Wave 4 data when the analysis is intended for a specific year level.
Table 13: Year 5 NAPLAN data (2009–10) and Wave 4 LSAC data (2010)

NAPLAN          Wave            N
Year 5, 2009    Wave 4, 2010    891
Year 5, 2010    Wave 4, 2010    2,733
Total                           3,624
Note: Not all children with linked NAPLAN data participated at Wave 4.
Table 14 describes the NAPLAN data to be used when modelling LSAC outcomes measured at Wave 4
by period. As above, when modelling LSAC outcomes by period, it is advisable to standardise
NAPLAN scores for Year 3 and Year 5 separately and to consider controlling for the child’s age
at the time of NAPLAN testing and for year level.
Table 14: Year 3 and Year 5 NAPLAN data (2009–10) and Wave 4 LSAC data (2010)

NAPLAN          Wave            N
Year 3, 2009    Wave 4, 2010    190
Year 5, 2009    Wave 4, 2010    891
Year 5, 2010    Wave 4, 2010    2,733
Total                           3,814
Note: Not all children with linked NAPLAN data participated at Wave 4.
The options presented above are not exhaustive and have primarily been given to introduce the
complexity of using NAPLAN data in the LSAC birth cohort study and provide a possible solution
as to how to deal with this complexity. The logic may vary depending on the research questions.
It should also be noted that examples of longitudinal analyses presented in section 4.5 are shown
for NAPLAN and LSAC data measured only at one point in time, though not at contemporaneous
time points. If a researcher would like to model NAPLAN or LSAC data measured at multiple times,
the same logic can be applied.
5. Comparative analysis
5.1 LSAC NAPLAN scores vs national NAPLAN scores
This section describes a comparative analysis of national NAPLAN scores and NAPLAN scores in
the LSAC sample overall and across different socio-demographic groups.
Population weights
LSAC estimates are calculated using weighted data. While it is crucial to use population weights
to match the LSAC sample to the composition of the general NAPLAN population of children and
adjust for between-waves attrition, these weights do not account for possible differences in the
distribution of NAPLAN scores. The weighting takes into account the variation between different
socio-demographic groups but does not account for variation within a particular socio-demographic
group. For example, it could well be that, while the weighted socio-demographic distribution
of LSAC families with children born between March 1999 and March 2000 matches the general
population, participating families and children may be different on outcome/performance measures.
In the comparisons below, the difference between two estimated means is considered statistically
significant if the corresponding CIs do not overlap.6
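Weighted means and their confidence intervals of the sort plotted in the figures below can be computed with statsmodels; a minimal sketch, with hypothetical column names (y5read, popweight), follows.

```python
from statsmodels.stats.weightstats import DescrStatsW

scores = df["y5read"].dropna()
weights = df.loc[scores.index, "popweight"]  # hypothetical population weight

stats = DescrStatsW(scores, weights=weights)
mean = stats.mean
lo, hi = stats.tconfint_mean(alpha=0.05)  # 95% CI for the weighted mean

# Compare [lo, hi] with the published national mean: non-overlap is read
# here as a significant difference (see footnote 6 for caveats).
```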
Overall scores
Figure 6 and all subsequent figures represent NAPLAN mean scores for the LSAC sample and
nationwide. The circles indicate the LSAC means, and the line segments represent the 95%
confidence intervals of the estimated means. The triangles represent the population means and
hence have no confidence intervals.
Figure 6: LSAC NAPLAN and national NAPLAN scores, by tests and year levels
It can be seen that Year 3 LSAC children scored significantly higher, on average, across all tests.
The main reason is that the Year 3 LSAC sample is a censored sample rather than a representative
sample of the Year 3 cohort; that is, children who entered school relatively young compared to
their peers (i.e., those in Year 3 in 2007) were not represented in the Year 3 LSAC sample. The
differences might also be due to the age differences between the LSAC sample and the corresponding
nationwide population, and to the availability of NAPLAN data by state. On average, children in the
LSAC Year 3 cohort were 3 months older than the Year 3 NAPLAN national cohort in 2008. Edwards,
Taylor, and Fiorini (2009) found that older children score significantly higher in cognitive tests
than their younger classmates. Moreover, approximately 50% of children for whom Year 3 NAPLAN
data were not available (because the children were in Year 3 in 2007) were from Queensland.
According to national statistics (ACARA, 2008), in 2008, Queensland children scored significantly
lower on all tests compared to all other states and territories except the Northern Territory. In
addition, children in Queensland were, on average, 5 months younger and had spent one year less
in school than Year 3 children in all other states.
A similar pattern is observed across Year 5 NAPLAN results. While there are significant differences
observed in NAPLAN results between LSAC children and the Australian population, the magnitude
of the differences is smaller than in the Year 3 NAPLAN results. One explanation may be that the
LSAC sample is not a representative sample of the year level cohort, even if population weights
are employed. Although the weighting takes into account sample attrition, it does so only on
the basis of the variables that were used to model such attrition (Sipthorp & Misson, 2009).
6 It is worth noting that if CIs overlap, the difference between the two means may still be significant, but a t-test is required to confirm this. Given that the purpose of this report is to examine the representativeness of NAPLAN results in the LSAC sample, the comparison of NAPLAN results across different demographic groups within the LSAC sample is not tested unless the differences are obvious (that is, the CIs of corresponding means do not overlap).
Consequently, unmeasured or unobserved variables are not accounted for. For example, results
of logistic regression show that NAPLAN data are less likely to be linked for children who had
poorer development of receptive language and vocabulary, even after controlling for main socio-
demographic characteristics. Moreover, these differences are persistent; even when comparing the
LSAC NAPLAN with national NAPLAN scores only for metropolitan areas, the LSAC children scored
significantly higher than Australian children. Therefore, an exclusion of remote and extremely
remote areas from the LSAC sample is unlikely to be the reason for the observed disparity.
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 7: LSAC NAPLAN and national NAPLAN Reading scores, by gender and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 8: LSAC NAPLAN and national NAPLAN Writing scores, by gender and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 9: LSAC NAPLAN and national NAPLAN Spelling scores, by gender and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 10: LSAC NAPLAN and national NAPLAN Numeracy scores, by gender and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 11: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by gender and
year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 12: LSAC NAPLAN and national NAPLAN Reading scores, by LBOTE and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 13: LSAC NAPLAN and national NAPLAN Writing scores, by LBOTE and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 14: LSAC NAPLAN and national NAPLAN Spelling scores, by LBOTE and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 15: LSAC NAPLAN and national NAPLAN Numeracy scores, by LBOTE and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 16: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by LBOTE and
year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 17: LSAC NAPLAN and national NAPLAN Reading scores, by parental education and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 18: LSAC NAPLAN and national NAPLAN Writing scores, by parental education and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 19: LSAC NAPLAN and national NAPLAN Spelling scores, by parental education and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 20: LSAC NAPLAN and national NAPLAN Numeracy scores, by parental education and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 21: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by parental
education and year level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 22: LSAC NAPLAN and national NAPLAN Reading scores, by parental occupation and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 23: LSAC NAPLAN and national NAPLAN Writing scores, by parental occupation and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 24: LSAC NAPLAN and national NAPLAN Spelling scores, by parental occupation and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 25: LSAC NAPLAN and national NAPLAN Numeracy scores, by parental occupation and year
level
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
Figure 26: LSAC NAPLAN and national NAPLAN Grammar and Punctuation scores, by parental
occupation and year level
A ten-band continuum “represent[s] the increasing complexity of the skills and understandings
assessed by NAPLAN from Years 3 to 9” (MCEECDYA, 2009, p. 2). At each year level, student
performance is reported within six of these bands:
■■ Year 3: results reported in Band 1 to Band 6
■■ Year 5: results reported in Band 3 to Band 8
■■ Year 7: results reported in Band 4 to Band 9
■■ Year 9: results reported in Band 5 to Band 10.
For each year level, the lowest band represents students who are below the national minimum
standard, the second lowest band represents students who are at the NMS, and the other four bands
represent students who are above the NMS. For example, Year 3 students will be below the NMS if
their scaled scores are within Band 1, while Year 5 students will be below the NMS if their scaled
scores are within Band 3 or below. More information on deriving and reporting NAPLAN scores
can be found in the reports by MCEECDYA (2009) and VCAA (2009).
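As a sketch, the band logic just described can be expressed as a small function. Note that translating a scaled score into a band requires the band/score correspondence in Table 15, which is not reproduced here, so the function below starts from the band rather than the raw score.

```python
# Lowest reported band per year level; results span six bands from here up.
LOWEST_BAND = {3: 1, 5: 3, 7: 4, 9: 5}

def nms_category(band: int, year_level: int) -> str:
    """Classify a NAPLAN band relative to the national minimum standard (NMS):
    lowest reported band = below, second lowest = at, remaining four = above."""
    lowest = LOWEST_BAND[year_level]
    if band <= lowest:
        return "below NMS"
    if band == lowest + 1:
        return "at NMS"
    return "above NMS"

assert nms_category(3, 5) == "below NMS"  # Year 5, Band 3 or below
assert nms_category(4, 5) == "at NMS"
```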
Tables 16 to 20 report NAPLAN scores for LSAC children and the general NAPLAN population
by year level and test year. It can be seen that, depending on the test, between 17% and 20%
of children in Year 5, 2009, between 6% and 13% of children in Year 3, 2008, and between 5%
and 13% of children in Year 3, 2009 had scores at or below the NMS. The proportion of exempt
students in the LSAC sample is very small. Across all year levels, regardless of the year, the
proportion of children within the LSAC sample who scored above the NMS was higher, and the
proportion of children who scored at or below the NMS was lower, than within the general NAPLAN
population of children. Even though the sample size for Year 3, 2009 is too small for any robust
conclusions, the same pattern is observed.
Table 16: National minimum standards for Reading, by year level and test year
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs national NAPLAN Year 5 (2010)
Table 17: National minimum standards for Writing, by year level and test year
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs national NAPLAN Year 5 (2010)
Table 18: National minimum standards for Spelling, by year level and test year
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs national NAPLAN Year 5 (2010)
Table 19: National minimum standards for Numeracy, by year level and test year
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs national NAPLAN Year 5 (2010)
Table 20: National minimum standards for Grammar and Punctuation, by year level

                      Exempt       Below NMS    At NMS       Above NMS
                      % (N)        % (N)        % (N)        % (N)
Year 3   LSAC         0.6 (18)     3.2 (96)     6.0 (179)    90.3 (2,716)
         National     1.7          6.5          10.6         85.2
Year 5   LSAC         0.5 (18)     4.6 (181)    8.8 (348)    86.1 (3,392)
         National     1.9          5.7          9.0          83.4
Source: LSAC NAPLAN Year 3 (2008, 2009) vs national NAPLAN Year 3 (2008); LSAC NAPLAN Year 5 (2009, 2010, 2011) vs
national NAPLAN Year 5 (2010)
To sum up, the Year 3 children of the LSAC sample scored significantly higher on all tests compared
with the corresponding population of Australian children who were tested in Year 3 in 2008. This
trend was consistent across different socio-demographic groups; that is, not only did LSAC children
score higher overall than Australian children, but they also scored higher across different socio-
demographic characteristics. The Year 5 children of the LSAC sample had similar scores on all tests
compared with the corresponding population of Australian children who were tested in Year 5 in
2010. Year 5 results were also consistent across different socio-demographic groups. It is important
to emphasise that the pattern of scores within different socio-demographic groups in the LSAC
sample was similar to the national NAPLAN population.
In particular, regardless of year level, girls scored higher than boys on all tests except Numeracy.
Year 3 children with a language background other than English scored similarly to children with
an English language background on all tests except Reading and Grammar, where children with an
English language background scored higher. Year 5 children with a language background other than
English had similar scores to children with an English language background.
Children in both Years 3 and 5 whose parents had higher educational and/or vocational
qualifications scored higher than children whose parents were less well educated and worked in
less skilled occupations. In addition, the proportion of children exempt or below the national
minimum standard was smaller in the LSAC sample than in the NAPLAN population of Australian
children for Year 3 and Year 5.
The discrepancy in NAPLAN results between the LSAC sample and the general NAPLAN population
is not surprising. First of all, this can be attributed to differences between the population of children
born in 1999–2000 and the population of children in the same school year level, especially for
children in Year 3. School starting ages vary across states and territories and depend on children’s
readiness for school. As a result, the age of children within the same school year level can vary
substantially. The LSAC sample is a birth cohort sample of children born in 1999–2000 who are
enrolled in different school year levels. Secondly, although population weights are used to account
for between-wave attrition, these weights do not adjust NAPLAN scores for those families who
withdrew from the study.
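As a rough illustration of the role of the population weights mentioned above, the sketch below
(with invented scores and weights) shows a weighted mean estimate: weighting rescales the
contribution of retained families to account for between-wave attrition, but it cannot recover
information about families who left the study altogether.

# Minimal sketch with invented values: a weighted mean NAPLAN score.
# Population weights rescale the retained sample towards the original
# cohort profile; they cannot adjust for families who withdrew entirely.
import numpy as np

scores = np.array([480.0, 512.0, 455.0, 530.0])  # NAPLAN scaled scores (invented)
weights = np.array([1.2, 0.8, 1.1, 0.9])         # hypothetical population weights

weighted_mean = np.average(scores, weights=weights)
print(round(weighted_mean, 1))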
For ease of interpretation, the rating scales were reversed so that a larger value indicated
greater progress. We expected that parents' ratings of reading and mathematical achievement
would be at least moderately correlated with the NAPLAN Reading and Numeracy tests, respectively.
As the proposed analysis is cross-sectional and aims to examine the correlation between NAPLAN
scores and LSAC educational outcomes over the same period, LSAC measures for Year 3 are derived
from Wave 3 data and LSAC measures for Year 5 are derived from Wave 4 data.
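A minimal sketch of this pairing, assuming hypothetical file and variable names (the LSAC NAPLAN
data file's actual identifiers may differ), is:

# Join Year 3 NAPLAN records to Wave 3 LSAC measures and Year 5 records
# to Wave 4 measures. All file and column names here are illustrative.
import pandas as pd

naplan = pd.read_csv("naplan_results.csv")  # child_id, year_level, test scores
lsac = pd.read_csv("lsac_measures.csv")     # child_id, wave, learning measures

# Map each NAPLAN year level to the LSAC wave collected in the same period.
wave_for_level = {3: 3, 5: 4}
naplan["wave"] = naplan["year_level"].map(wave_for_level)

# Keep only records that match on child identifier and wave.
paired = naplan.merge(lsac, on=["child_id", "wave"], how="inner")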
The degree of association between NAPLAN results and LSAC learning and cognitive measures was
calculated using correlation analysis. The analysis was performed separately for Year 3 and Year 5,
regardless of the calendar year in which the test was taken. The Pearson correlation was used to test
the association of NAPLAN results with the intelligence measures and teachers’ evaluations (as all
of these variables are continuous), while the polyserial correlation was used to test the association
between NAPLAN scores and parents’ ratings (as parents’ ratings are represented by ordinal
variables). When calculating the correlation matrix, we used pairwise rather than case-wise
deletion of missing cases. Case-wise deletion would considerably reduce the sample size, as
missing data are relatively randomly distributed across cases and variables, and teachers' ratings
are missing for about 20% and 15% of cases for Year 3 and Year 5 respectively.
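The following sketch, with hypothetical column names, illustrates this strategy in code. pandas
computes correlations over pairwise-complete observations, matching the pairwise deletion
described above; because the polyserial correlation is not available in pandas or SciPy,
Spearman's rank correlation is shown as a rough stand-in for the ordinal parents' ratings.

# Correlation matrix with pairwise deletion (pandas' default behaviour).
# File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("lsac_naplan_linked.csv")

continuous = ["naplan_reading", "naplan_numeracy",
              "ppvt", "matrix_reasoning",
              "ars_language", "ars_maths"]
ordinal = ["parent_reading_rating", "parent_maths_rating"]

# Pearson correlations for the continuous measures.
pearson = df[continuous].corr(method="pearson")

# Rank correlations as a stand-in for the polyserial correlations used
# with the ordinal parents' ratings (dedicated software, e.g., the R
# package polycor, would be needed to reproduce the paper's estimates).
spearman = df[["naplan_reading", "naplan_numeracy"] + ordinal].corr(method="spearman")

print(pearson.round(2))
print(spearman.round(2))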
Correlation results
Correlation results for Years 3 and 5 are displayed in Tables 21 and 22 respectively. All correlation
coefficients are positive and statistically significant at the 1% level. We consider the
correlation coefficient r to be small if its absolute value is less than or equal to 0.3, medium if its
absolute value is more than 0.3 but less than or equal to 0.5, and large if it is more than 0.5 in
magnitude (Cohen, 1988).
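Applied in code, this rule of thumb is simply (the helper below is hypothetical, not part of any
LSAC tooling):

# Cohen's (1988) rule of thumb as used in this paper:
# |r| <= .3 small; .3 < |r| <= .5 medium; |r| > .5 large.
def cohen_size(r: float) -> str:
    """Classify a correlation coefficient by magnitude."""
    magnitude = abs(r)
    if magnitude <= 0.3:
        return "small"
    if magnitude <= 0.5:
        return "medium"
    return "large"

assert cohen_size(0.64) == "large"    # e.g., teachers' language and literacy ratings
assert cohen_size(-0.44) == "medium"  # the sign is ignored; only magnitude matters
assert cohen_size(0.29) == "small"    # e.g., PPVT with Writing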
Examination of Table 21 suggests that LSAC teachers’ and parents’ ratings are consistent with
corresponding NAPLAN tests. That is, Year 3 NAPLAN results on Reading, Spelling, and Grammar
and Punctuation tests are strongly correlated with teachers’ language and literacy ratings (r = .64
for all), and strongly correlated with parents’ evaluations of reading progress (r = .54, r = .59,
and r = .45, respectively). Similarly, Year 3 NAPLAN results on Numeracy are strongly correlated
with teachers’ ratings of mathematical thinking (r = .61) and moderately correlated with parental
evaluation of mathematical achievement (r = .49).
Writing skills were not directly assessed by either teachers or parents; nevertheless, as expected,
Year 3 NAPLAN Writing results were at least moderately correlated with teachers' assessments of
language and literacy and with parents' assessments of reading progress (r = .58 and r = .46,
respectively). The large correlations between teachers' ratings and the corresponding NAPLAN tests
support the view that teachers are reliable informants of children's academic performance.
The PPVT is moderately correlated with Year 3 NAPLAN results on all but the Writing and Spelling
tests, with the largest correlation coefficient being with the Reading test (r = .44). These results
are consistent with expectations, given that the PPVT measures receptive language and vocabulary,
that is, the level of understanding of spoken words.
The Matrix Reasoning test is moderately correlated with all NAPLAN tests but the Writing test, with
the largest correlation coefficient being with the Numeracy test (r = .49). That is also consistent with
expectations, as the Matrix Reasoning test measures non-verbal problem-solving ability.
Correlation analysis for the Year 5 data reveals trends similar to those for Year 3, but with
slightly smaller correlation coefficients between NAPLAN results and parents' ratings (see Table 22).
Table 21: Correlation coefficients for NAPLAN scores and LSAC learning and cognitive measures,
Year 3, 2008–09

LSAC measure                  NAPLAN test
                              Reading      Writing      Spelling     Grammar and   Numeracy
                                                                     Punctuation
Receptive vocabulary
   PPVT                       .44          .29          .29          .38           .42
                              N = 2,904    N = 2,910    N = 2,910    N = 2,907     N = 2,905
Non-verbal ability
   Matrix Reasoning           .42          .34          .37          .41           .49
                              N = 2,906    N = 2,912    N = 2,912    N = 2,909     N = 2,907
Academic Rating Scale (teachers' rating)
   Language and literacy      .64          .58          .64          .64           .58
                              N = 2,439    N = 2,443    N = 2,443    N = 2,441     N = 2,440
   Mathematical thinking      .56          .51          .57          .56           .61
                              N = 2,426    N = 2,430    N = 2,430    N = 2,428     N = 2,427
Parents' rating
   Reading progress           .54          .46          .59          .45           .52
                              N = 2,924    N = 2,929    N = 2,929    N = 2,925     N = 2,926
   Math progress              .38          .33          .39          .36           .49
                              N = 2,924    N = 2,929    N = 2,929    N = 2,926     N = 2,925
Table 22: Correlation coefficients for NAPLAN scores and LSAC learning and cognitive measures,
Year 5, 2009–11

LSAC measure                  NAPLAN test
                              Reading      Writing      Spelling     Grammar and   Numeracy
                                                                     Punctuation
Receptive vocabulary
   PPVT                       .55          .30          .32          .44           .43
                              N = 887      N = 885      N = 884      N = 886       N = 886
Non-verbal ability
   Matrix Reasoning           .40          .36          .36          .43           .49
                              N = 3,739    N = 3,735    N = 3,742    N = 3,742     N = 3,739
Academic Rating Scale (teachers' rating)
   Language and literacy      .60          .56          .65          .62           .59
                              N = 3,083    N = 3,083    N = 3,087    N = 3,087     N = 3,084
   Mathematical thinking      .52          .51          .57          .54           .61
                              N = 3,001    N = 3,002    N = 3,006    N = 3,006     N = 3,002
Parents' rating
   Reading progress           .34          .34          .38          .33           .42
                              N = 3,766    N = 3,763    N = 3,770    N = 3,770     N = 3,766
   Math progress              .31          .31          .35          .31           .46
                              N = 3,766    N = 3,763    N = 3,770    N = 3,770     N = 3,766
6. Conclusion
There are a number of benefits to having a longitudinal national assessment program such as
NAPLAN linked to LSAC. The NAPLAN data measure the development of children's achievement
from Year 3 to Year 9 in five domains: Reading, Writing, Spelling, Numeracy, and Grammar
and Punctuation, and these scales therefore allow children's achievement to be assessed and
compared across year levels and over time. The linkage also provides an opportunity to test how
the cognitive and learning measures used in LSAC are associated with NAPLAN test scores, and allows
an examination of the association between children's achievements and different individual and
family characteristics, both cross-sectionally and longitudinally. This, in turn, enhances the value
of LSAC data to policy-makers and academic researchers.
In this report, we have used K cohort LSAC data and NAPLAN results from 2008 to 2011. Out of
4,983 K cohort children, NAPLAN data were linked for 4,159 children. By 2011, Year 3 NAPLAN
results had been linked for 74% of the eligible LSAC NAPLAN sample; 25% of children did not have
Year 3 NAPLAN results because they were enrolled in Year 3 in 2007, before NAPLAN assessment
had been implemented. Year 5 NAPLAN results were linked for 97% of children; for the remaining
3% of Year 5 children, the NAPLAN data could not be matched. Year 7 NAPLAN results were linked
for 23% of children, with 75% of children not yet having Year 7 NAPLAN results because they were
still enrolled in Year 5 or Year 6 in 2011.
Overall, when comparing the Year 3 NAPLAN scores across LSAC children and all Australian
children tested in 2008, the former scored significantly higher. While the LSAC Year 5 children also
had higher scores compared to the general population of Australian Year 5 children tested in 2010,
the magnitude of the difference was not large. Regardless of year level, the proportion of children
exempt or below the national minimum standard was smaller in the LSAC sample than Australia-
wide, especially for Year 3 children. The differences in results were mainly due to the fact that the
LSAC NAPLAN sample cannot be considered as a representative sample of Australian children in
a specific year level, even after accounting for attrition. In addition, differences in Year 3 results
were also due to the censoring of LSAC children enrolled in Year 3 in 2007. Importantly, results
from this study suggest that while there are differences in the mean NAPLAN scores between the
general population and the LSAC sample for both Year 3 and Year 5, the same patterns of variation
in NAPLAN scores by demographic variables can be observed. Finally, due to the small number of
LSAC children enrolled in Year 7 by 2011, Year 7 LSAC NAPLAN results were not compared against
national statistics.
Correlations between NAPLAN and LSAC cognitive and learning measures were moderate to large,
with conceptually similar measures being more highly correlated than dissimilar ones. The NAPLAN measures were
associated in the expected directions with LSAC cognitive and learning measures such as verbal
and non-verbal ability, and teachers’ and parents’ ratings of literacy and numeracy.
While it is of great benefit to use NAPLAN data along with LSAC data, a researcher should always
keep in mind that:
■■ the LSAC NAPLAN data are not representative of national NAPLAN scores, even after controlling
for attrition;
■■ analyses of NAPLAN data in LSAC should be performed by year level or period, not by calendar
year;
■■ LSAC outcome measures are collected from children of the same age but with different years of
schooling, while NAPLAN data are collected from children of different ages but with the same
years of schooling;
■■ Year 3 NAPLAN results for K cohort children are available only for 74% of the eligible LSAC
NAPLAN sample, which should be taken into account when performing longitudinal analyses
and comparison against national statistics; and
■■ great care should be taken when deciding what NAPLAN and LSAC data to use, including:
–– whether the analysis is longitudinal or cross-sectional; and
–– whether the NAPLAN scores are being considered as a dependent or an independent variable.
References
Australian Bureau of Statistics. (1997). ASCO: Australian Standard Classification of Occupations (2nd ed.) (Cat. No.
1220.0). Canberra: ABS.
Australian Curriculum, Assessment and Reporting Authority. (2008). National Assessment Program—Literacy and
Numeracy: Achievement in reading, writing, language conventions and numeracy. Sydney: ACARA.
Australian Curriculum, Assessment and Reporting Authority. (2010). National Assessment Program—Literacy and
Numeracy: Achievement in reading, writing, language conventions and numeracy. National report for 2010. Sydney:
ACARA.
Australian Curriculum, Assessment and Reporting Authority. (2011). National Assessment Program—Literacy and
Numeracy: Achievement in reading, persuasive writing, language conventions and numeracy. National report for
2011. Sydney: ACARA.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Edwards, B., Taylor, M., & Fiorini, M. (2009). Does it matter at what age children start school in Australia? Investigating
school starting age and six-year old children’s outcomes. Paper presented at the 2nd LSAC Research Conference,
Melbourne.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage
Publications.
Ministerial Council for Education, Early Childhood Development and Youth Affairs. (2009). 2009 National Assessment
Program Literacy and Numeracy: Achievement in reading, writing, language conventions and numeracy. Melbourne:
MCEECDYA.
Sipthorp, M., & Misson, B. (2009). Wave 3 weighting and non-response (LSAC Technical Paper No. 6). Melbourne:
Australian Institute of Family Studies.
Victorian Curriculum and Assessment Authority. (2009). National Assessment Program Literacy and Numeracy: NAPLAN
2009 reporting guide. Year 3, Year 5, Year 7 and Year 9. Melbourne: VCAA.
Appendixes
                                              Percentage
Parental characteristics
   Education
      Bachelor degree (ref.)                  38.1
      Advanced diploma                        9.9
      Certificate                             32.2
      Year 12                                 8.8
      Year 11 or below                        10.9
   Mother's working status (hours per week)
      35 or more (ref.)                       20.1
      Less than 35                            36.3
      Not working                             43.6
   LBOTE                                      29.6
Child's characteristics
   Female                                     49.1

Tests                                         Mean (SD)
   WAI (25–100)                               64.0 (8.1)
   SDQ (0–35)                                 9.4 (5.3)
   PPVT (25–85)                               64.2 (6.2)
NAPLAN category                       LSAC category
Bachelor degree and above             Postgraduate diploma
                                      Graduate diploma/graduate certificate
                                      Bachelor degree
Advanced diploma/diploma              Advanced diploma
Certificate I–IV                      Certificate III/IV (including trade certificate)
                                      Certificate I/II
Year 12 or equivalent                 Year 12 or equivalent
Year 11 or equivalent or below        Year 11 or equivalent
                                      Year 10 or equivalent
                                      Year 9 or equivalent
                                      Year 8 or equivalent
                                      Not stated (no cases)
Note: (a) Australian Standard Classification of Occupations (ASCO) codes (ABS, 1997).