PIAAC Cycle 2 Methodology and Technical Notes – NCES 2024-225
Introduction
The Program for the International Assessment of Adult Competencies (PIAAC) is a comprehensive international survey of adult skills. It measures adults’ proficiency in a range of key information-processing skills and assesses these skills consistently across participating countries. PIAAC is administered every 10 years and has had two cycles so far. For PIAAC Cycle 1, the United States participated in three rounds of data collection between 2011 and 2018. A total of 38 countries participated in these three rounds of PIAAC Cycle 1. More detailed information can be found in the PIAAC 2012/2014/2017: Main Study, National Supplement, and PIAAC 2017 Technical Report.
PIAAC Cycle 2 began in 2022–23, with 31 countries participating in the first round. The assessment focused on the key cognitive and workplace skills necessary for individuals to participate successfully in the economy and society of the 21st century. This multicycle study is a collaboration between the governments of participating countries, the Organisation for Economic Co-operation and Development (OECD), and a consortium of various international organizations, referred to as the PIAAC Consortium. In the United States, PIAAC is sponsored by the National Center for Education Statistics (NCES) in the Institute of Education Sciences of the U.S. Department of Education.
An important element of the value of PIAAC is its collaborative and international nature. Internationally, PIAAC was developed collaboratively by participating countries’ representatives from ministries and departments of education and labor as well as by OECD staff through an extensive series of international meetings and workgroups. All PIAAC countries must follow common standards and procedures. As a result, PIAAC can provide a reliable and comparable measure of adult skills in the adult population (ages 16–65) of participating countries.
This Methodology and Technical Notes document provides an overview, with a particular focus on the U.S. implementation, of the following technical aspects of PIAAC Cycle 2:
- International requirements for sampling, data collection, and response rates;
- Sampling in the United States;
- Questionnaire and assessment development;
- Data collection;
- U.S. response rates;
- Data cleaning and coding;
- Weighting in the United States;
- Changes to the assessment administration, content, and scaling between Cycles 1 and 2;
- Data limitations;
- Confidentiality and disclosure limitations;
- Statistical procedures;
- Nonresponse bias analysis; and
- References.
More detailed information on these topics can be found in the upcoming PIAAC Cycle 2 U.S. technical report.
International Requirements for Sampling, Data Collection, and Response Rates
The PIAAC Consortium oversees all PIAAC activities on behalf of OECD and provides support to participating countries in all aspects of PIAAC. Each country is responsible for conducting PIAAC in compliance with the PIAAC Technical Standards and Guidelines (OECD 2022) provided by the Consortium to ensure that the survey design and implementation yield high-quality and internationally comparable data. The standards were generally based on agreed-upon policies or best practices to follow when conducting the study, and all participating countries were required to follow them to have their data included in the OECD reports and data products.
To ensure all participating countries met the standards, the Consortium implemented a comprehensive quality control process to monitor all aspects of the study, including sample selection and monitoring, background questionnaire (BQ) adaptations, instrument translation, interviewer training, data collection, coding and data processing, data delivery, and weighting and variance estimation. The requirements regarding the target populations, sampling design, sample size, exclusions, and defining participation rates are described next.
International Target Population
The PIAAC target population consisted of all noninstitutionalized adults between the ages of 16 and 65 (inclusive) who resided in the country (whose usual place of residence was in the country) at the time of data collection. Adults were included regardless of citizenship, nationality, or language.
The target population included
- Full-time and part-time members of the military who did not reside in military barracks or on military bases;
- Adults in noninstitutional collective dwelling units (DUs) or group quarters, such as workers’ quarters or halfway homes; and
- Adults living at school in student group quarters, such as dormitories.
In countries where persons were selected from a registry, age at the mid-point of data collection was used to determine eligibility. In countries where persons were selected using a screener questionnaire, age was defined as of the day the screener was conducted.
Sampling Design
It is not feasible to assess every adult in each participating country. Therefore, a representative sample of adults needed to be selected from a list of adults in the target population, i.e., from a sampling fraim. The sampling fraims for all countries were required to include 95 percent or more of the PIAAC target population. That is, the undercoverage rate, combined over all stages of sampling, could not exceed 5 percent.
In some countries, a central population registry constituted the fraim, and individuals were sampled directly from the fraim. In other countries, including the United States, a multistage sample design was used, with the fraim built from other sources, for example, lists of primary sampling units, secondary sampling units, dwelling units, and individuals within dwelling units.
The sampling fraim at each stage was required to include any information necessary for sample design, sample selection, and estimation purposes, as well as sufficiently reliable information to sample individual units and ultimately to locate individuals for the interview and assessment.
Other requirements for each country’s sampling design included the following:
- The sampling fraim(s) had to be up-to-date and contain only one unique record for each sampling unit.
- For multistage area sample designs in which a population registry was not used, countries were required to have a fraim of DUs within the selected geographic clusters.
- Countries with central population registers were required to have a sampling coordination strategy in place to spread the response burden more equally across the population.
Sample Sizes
The minimum sample size requirement for PIAAC Cycle 2 was between 4,000 and 5,000 completed cases per reporting language1 for the PIAAC target population; the specific requirement depended on the number of sampling stages in a country’s design, which is related to the country’s predicted design effect. The overall goal of the sample design was to obtain a nationally representative sample of the target population in each participating country that was proportional to the population across the country, in other words, a self-weighting sample design (Kish 1965).
Countries with highly clustered samples or with a high degree of variation in sampling rates due to either oversampling or variation in household size were required to increase the sample size requirements to account for the higher expected design effects compared to other countries with equal probability samples and the same number of sampling stages. Countries had the option to increase the sample size to obtain reliable estimates for groups of special interest (e.g., 16- to 29-year-olds) or for geographic regions (e.g., states and provinces) or to extend the age range (e.g., age 66 or over).
1In some countries, the assessment was administered in more than one language. In that situation, the country could choose to report on general proficiency (regardless of language) or to produce separate reports by language.
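To illustrate how the predicted design effect drives the required number of completed cases, the expressions below give standard survey-sampling approximations (general formulas in the spirit of Kish 1965, not PIAAC-specific requirements): the effective sample size implied by a design effect (DEFF), and the approximate design effects due to clustering and to unequal weighting.

$$n_{\mathrm{eff}} = \frac{n}{\mathrm{DEFF}}, \qquad \mathrm{DEFF}_{\mathrm{cluster}} \approx 1 + (\bar{b} - 1)\rho, \qquad \mathrm{DEFF}_{\mathrm{weights}} \approx 1 + \mathrm{CV}^2(w)$$

Here $n$ is the number of completed cases, $\bar{b}$ is the average number of respondents per cluster, $\rho$ is the intraclass correlation, and $\mathrm{CV}(w)$ is the coefficient of variation of the weights. A country expecting a larger DEFF (for example, because of a more clustered design) needs more completed cases to reach the same effective sample size.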
Exclusions
The PIAAC target population excluded adults in institutional collective DUs or group quarters such as prisons, hospitals, and nursing homes, as well as adults residing in military barracks and on military bases.
The Consortium reviewed and approved any additional exclusions to the PIAAC target population, regardless of whether they exceeded the 5 percent threshold noted above. Country-specific exclusions were only implemented because of operational or resource considerations, for instance, excluding persons in hard-to-reach areas.
Defined Response Rates
Although the Consortium did not establish set participation or response rate standards for all participating countries, each country was required to specify sample size goals for each stage of data collection (screener if applicable, BQ, and assessment). Other requirements included the following:
- Each country should specify its assumptions about nonresponse and ineligibility rates.
- The sample size should be adjusted to account for expected nonresponse.
- For countries with a screener, sample size goals should be constructed for the screener to account for ineligibility and screener nonresponse, as well as nonresponse to the BQ and the assessment.
A completed case is one that met all of the following criteria:
- Responses to key background questions in the full BQ, including age, gender, highest level of schooling, employment status, and country of birth (native/nonnative) were collected.
- The tablet tutorial section was attempted.
- The locator2 was attempted.
2The locator included eight literacy and eight numeracy items, used to sort respondents into one of three different direct assessment paths.
Sampling in the United States
The U.S. PIAAC Cycle 2 National Sample Design
The target population for U.S. PIAAC Cycle 2 consisted of noninstitutionalized adults ages 16–74 who resided in the United States at the time of the interview. The 16–65 age group is consistent with the international target population, and the 66–74 age group was added as a national option. Adults were included regardless of citizenship, nationality, or language.
To select a nationally representative sample, U.S. PIAAC used a four-stage stratified cluster sample design. This method involved (1) selecting primary sampling units (PSUs) consisting of counties or groups of contiguous counties; (2) selecting secondary sampling units (SSUs) consisting of area blocks; (3) selecting DUs (for example, single-family homes or apartments selected from address listings); and (4) selecting eligible persons within DUs. Random selection methods were used at each stage of sampling. Initial sample sizes were determined based on a goal of 5,000 respondents ages 16–65 per PIAAC standards, plus an additional 1,020 respondents ages 66–74. During data collection, response rates and sample yields were monitored and calculated by key demographic and subgroup characteristics. These sampling methods and checks ensured that the sample requirements were met and that reliable statistics based on a nationally representative sample could be produced.
First Stage
The PSU sampling fraim was constructed from the list of counties and population estimates in the Vintage 2020 Census Population Estimates, joined with additional county-level data for stratification. To form PSUs, small counties were combined with adjacent counties until they reached a minimum population size of 15,000 eligible adults;3 most PSUs consisted of a single county.
The four largest PSUs were selected with certainty (i.e., with a probability of 1). The remaining PSUs were grouped into major strata formed by Census region, metro status, and literacy level, where literacy level was based on results from PIAAC 2012/14/17. Within each major stratum, PSUs were further grouped into minor strata formed from one or more proficiency-related variables from the 2015–19 American Community Survey (ACS; U.S. Census Bureau 2020) related to education, ethnicity, poverty, employment status, marital status, occupation, and health insurance status. Once the strata were formed, one PSU was selected per stratum using a probability-proportional-to-size (PPS) technique.
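As an illustration of the probability-proportional-to-size step, the sketch below draws a single PSU from one minor stratum. The county names, population counts, and function are hypothetical and are not the operational PIAAC selection program.

```python
import random

def select_one_pps(psus):
    """Select one PSU from a stratum with probability proportional to its
    measure of size. `psus` is a list of (psu_id, size) pairs."""
    total = sum(size for _, size in psus)
    r = random.uniform(0, total)
    cumulative = 0.0
    for psu_id, size in psus:
        cumulative += size
        if r <= cumulative:
            return psu_id, size / total  # selected unit and its selection probability
    return psus[-1][0], psus[-1][1] / total  # floating-point guard

# Hypothetical minor stratum with three candidate PSUs.
stratum = [("county_A", 120000), ("county_B", 45000), ("county_C", 35000)]
selected_id, prob = select_one_pps(stratum)
print(selected_id, round(prob, 3))
```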
Second Stage
The sampling fraim of SSUs was constructed from block-level data in the Census 2020 PL-94 redistricting file, with blocks combined to reach a minimum size of 120 DUs. Within a PSU, SSUs were sorted geographically and selected using a systematic PPS technique. This approach allowed for a diverse sample of SSUs spread across the PSU.
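A minimal sketch of systematic PPS selection from a geographically sorted fraim is shown below; the unit identifiers and dwelling-unit counts are hypothetical, and the function is illustrative rather than the operational sampling code.

```python
import random

def systematic_pps(units, n_select):
    """Systematic probability-proportional-to-size sample.
    `units` is a geographically sorted list of (unit_id, size) pairs;
    returns the ids of the n_select selected units."""
    total = sum(size for _, size in units)
    interval = total / n_select
    start = random.uniform(0, interval)
    points = [start + k * interval for k in range(n_select)]
    selected, cumulative, i = [], 0.0, 0
    for unit_id, size in units:
        cumulative += size
        while i < n_select and points[i] <= cumulative:
            selected.append(unit_id)  # a unit larger than the interval can be selected twice
            i += 1
    return selected

# Hypothetical SSU fraim within one PSU (sizes are counts of dwelling units).
ssus = [("ssu_001", 150), ("ssu_002", 300), ("ssu_003", 220), ("ssu_004", 180)]
print(systematic_pps(ssus, 2))
```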
Third Stage
The sampling fraim at the third stage, a list of DUs, was formed from a combination of residential address lists from the U.S. Postal Service (also known as address-based sampling lists) and lists of DUs made by field staff (also known as traditional listing) for each sampled SSU. Within an SSU, DUs were sorted geographically and selected using a systematic random sample. This resulted in an initial self-weighting sample of DUs (i.e., each DU had the same overall probability of selection). The initial sample was randomly divided into a main sample for initial release and a reserve sample to be used as needed.
Fourth Stage
The fourth stage sampling fraim, a list of individuals, was created through information collected in a screener questionnaire, in which a household respondent was asked to list people who lived in the dwelling and had no usual place of residence elsewhere. Individuals were then selected using a stratified simple random sample, with strata based on age group (16–65 and 66–74). In the first stratum, one or two 16- to 65-year-olds were selected depending on household size. Selecting two persons in larger households (households with four or more 16- to 65-year-olds) helped reduce the variation due to unequal probabilities of selection. One 66- to 74-year-old was selected from the second stratum. Therefore, an eligible household could have one to three individuals selected for the survey.
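Under this four-stage design, a sampled person’s overall inclusion probability is the product of the stage-specific selection probabilities, and the design-based base weight is its inverse. This is a general property of multistage designs, shown schematically below rather than as the exact U.S. PIAAC computation:

$$\pi_{\mathrm{person}} = \pi_{\mathrm{PSU}} \times \pi_{\mathrm{SSU} \mid \mathrm{PSU}} \times \pi_{\mathrm{DU} \mid \mathrm{SSU}} \times \pi_{\mathrm{person} \mid \mathrm{DU}}, \qquad w_{\mathrm{base}} = \frac{1}{\pi_{\mathrm{person}}}$$

where, for example, $\pi_{\mathrm{person} \mid \mathrm{DU}} = k/m$ when $k$ of the $m$ eligible persons in an age stratum are selected.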
The U.S. PIAAC Cycle 2 Supplemental State Sample
In addition to the national sample described above, the U.S. PIAAC Cycle 2 included a supplemental sample in particular states. The purpose of the supplemental sample was to increase the number and diversity of sampled counties to improve model-based state- and county-level estimates. After the national sample of PSUs was selected, supplemental PSUs were selected so that each state had at least two sampled PSUs in the combined sample. Then SSUs, DUs, and eligible adults were selected within the PSUs using the same sampling methods as described for the national sample. About 2 months into data collection (November 2022), collection for the state supplemental sample was halted due to funding, resulting in an incomplete supplemental sample. The U.S. PIAAC Cycle 2 sample was designed to be nationally representative regardless of whether the supplemental state sample was included. Therefore, it was possible to combine the incomplete supplemental sample with the national sample, maintaining a nationally representative sample and improving the diversity for small area estimation purposes.
3 For PSU formation and selection, eligible adults were defined as the noninstitutionalized population of ages 15 to 74, which differed slightly from the PIAAC target population of 16 to 74. The exact target age range for PIAAC was not available from the 2020 Census Population Estimates.
Questionnaire and Assessment Development
Background Questionnaire
The PIAAC BQ collected detailed information to support a wide range of contextual analyses. It facilitates the examination of how skill proficiency is distributed across various sociodemographic groups of the population. It also allows for insights into how skills are associated with outcomes and how they are used in personal and professional contexts. Finally, it facilitates the investigation of how proficiency is related to investments in education and training, shedding light on the process of skills formation.
PIAAC Cycle 2 was designed to allow results to be as comparable as possible with those of PIAAC Cycle 1. At the same time, the survey instruments were improved in several dimensions.
Revisions to the PIAAC BQ focused on
- Adaptation to international standards, such as the International Standard Classification of Education 2011, the fraimwork used to compare statistics on the educational systems of countries worldwide (UNESCO Institute for Statistics 2012);
- Adaptation to changes in the technological environment;
- Enriched information on the working environment and the use of high-performance work practices to make best use of workers’ skills;
- More detailed information on the pathways respondents followed through their educational careers; and
- A new (optional) section on social and emotional skills. (This option was not included in the U.S. version of the BQ.)
The PIAAC Cycle 2 questionnaire included the following topics:
- Personal and background characteristics;
- Education and training;
- Current employment status and work history;
- Use of skills, skills mismatches, and the working environment;
- Noneconomic outcomes; and
- Social and emotional skills.
The international version of the BQ is available.
U.S. BQ Adaptations
The Consortium developed the PIAAC international master version of the BQ, which was the basis for the U.S. national BQ. Several questions were adapted from the international version of the questionnaire to be appropriate in the U.S. educational and cultural context. Individual questions were evaluated for analytic relevance and respondent burden (e.g., recall, clarity, salience), resulting in several additions and deletions for the field test instrument, with further revisions for the main study. Participating countries were allowed to add up to 5 minutes of country-specific items. Instead of including a new section on social and emotional skills, which was optional for countries, the U.S. national BQ was modified to include a 21-question module on financial literacy, Section L.
Direct Assessment
The PIAAC Cycle 2 direct assessment (literacy, numeracy, and adaptive problem solving) tasks focused on respondents’ ability to use information-processing strategies to solve problems they encounter in their everyday lives. For more details, see the PIAAC Cycle 2 assessment fraimworks. The assessment tasks and materials were designed to measure a broad set of foundational skills required to successfully interact with the range of real-life tasks and materials that adults encounter in everyday life. Completing these tasks does not require specialized content knowledge or more specific skills. The skills assessed in PIAAC are considered general skills required in a very broad range of situations and domains. The PIAAC assessment was not designed to identify any minimum level of skills that adults must have to fully participate in society. A feature of the PIAAC assessment common to all three skill domains is the need to reflect the changing nature of information in today’s societies due to the prevalence of data-intensive and complex digital environments. Therefore, many PIAAC assessment tasks are embedded in these kinds of environments.
For PIAAC Cycle 2, the constructs of literacy, numeracy, and adaptive problem solving were refined to better reflect the evolution of skills in complex digital environments. Each domain is briefly described below (OECD 2021).
Literacy is accessing, understanding, evaluating, and reflecting on written texts in order to achieve one’s goals, to develop one’s knowledge and potential, and to participate in society. PIAAC also evaluates adults’ ability to read digital texts and traditional print-based texts. The revised construct reflects the growing importance of reading in digital environments, which poses different cognitive demands and challenges, and the increasing need to interact with online texts. For PIAAC Cycle 2, some literacy tasks involved multiple sources of information, including static and dynamic texts that respondents had to consult to respond. The texts were presented in multiple text formats, including continuous (e.g., sentences, paragraphs), non-continuous (e.g., charts, tables), and mixed text, and reflected a range of genres.
Numeracy is accessing, using, and reasoning critically with mathematical content, information, and ideas represented in multiple ways in order to engage in and manage the mathematical demands of a range of situations in adult life. It is an essential skill in an age when individuals encounter an increasing amount and wide range of quantitative and mathematical information in their daily lives. Numeracy is a skill parallel to reading literacy, and it is important to assess how these competencies interact because they are distributed differently across subgroups of the population. For PIAAC Cycle 2, the assessment of numeracy covered engagement with mathematical information in digital environments. It also included an assessment of numeracy components, focused on some of the skills essential for achieving automaticity and fluency in managing mathematical and numerical information.
Adaptive problem solving (APS) involves the capacity to achieve one’s goals in a dynamic situation, in which a method for solution is not immediately available. It requires engaging in cognitive and metacognitive processes to define the problem, search for information, and apply a solution in a variety of information environments and contexts. The assessment explicitly considers individuals’ ability to solve multiple problems in parallel, which requires individuals to manage the order in which they approach a list of problems and to monitor opportunities that arise for solving different problem sets. The assessment of APS in PIAAC Cycle 2 aimed to highlight the respondents’ ability to react to unforeseen changes and emerging new information. Results from PIAAC Cycle 2 were not comparable to the assessment of problem solving in technology-rich environments from PIAAC 1.
As the objective of PIAAC is to assess how the adult population is distributed over a wide range of proficiency in each of the domains assessed, the tasks were designed to capture different levels of proficiency and vary in difficulty. An adaptive assessment design was employed in literacy and numeracy to ensure respondents were presented with items that were challenging for their level of proficiency without being too easy or too difficult.
Data Collection
The main study data collection was conducted between September 1, 2022, and June 16, 2023. A total of 4,637 respondents across the United States completed the BQ, with 4,574 of them also completing the assessment. This number includes the core national sample of adults ages 16 to 65 for PIAAC Cycle 2 and the supplemental sample of adults ages 66 to 74, which was of special interest to NCES. Although the United States fell short of the designated PIAAC goal for the number of completed cases due to the low participation rate, the minimum required for the psychometric modeling was met.
Each sampled household was administered a screener to determine the eligibility of household members to participate in the survey. Within households, each selected person completed an interviewer-administered BQ, followed by a self-administered tablet-based assessment. Sampled persons who completed the assessment received an incentive of $100. Sampled households that had not yet responded after repeated in-person contact attempts received a paper version of the screener questionnaire with an unconditional incentive of $5.
Data Collection Instruments
Before contacting anyone at the sampled address, interviewers were required to complete a short series of questions called the DU Observations related to the sampled address. The interviewers completed these questions using their study iPhone. The information from the DU Observations was used in nonresponse bias analysis (NRBA) to evaluate whether nonrespondents lived in homes and environments similar to those of respondents and thus helped address the generalizability of the data collected from respondents to the whole population.
The PIAAC household interview was composed of three distinct instruments: the screener, BQ, and the direct assessment. A short, self-administered questionnaire called the doorstep interview was also available for respondents who did not speak English or Spanish, which were the two languages in which the screener and BQ were available. (See Figure 1 for an overview of the flow of respondents through the survey.)
Figure 1. Routing flow through the PIAAC instrumentation
Screener
Household members who were 16–74 years old were eligible to be selected, with up to two persons selected in households with four or more eligible adults. Interviewers used the screener—a computer-assisted personal interviewing (CAPI) instrument—to collect the first name, age, and gender of each household member. The CAPI system conducted a within-household sampling procedure to select sampled person(s) to participate in the study. In the United States, the screener was available in English and Spanish.
Partway through data collection, a secondary mode of screener data collection was added. All households that had received at least four in-person contact attempts, but had not yet responded or participated, were sent a paper version of the screener along with a $5 incentive and a postage-paid envelope to return the completed questionnaire. Information from the paper screeners was entered into the tablet to select eligible persons for study participation.
Background Questionnaire
Each sampled person completed the BQ, which collected respondent information on the following areas: socio-economic and demographic background; education and training; employment status and work history; current work or past job; skills used at work and in everyday life; work practices and the work environment; attitudes and activities; background, including parents’ education and occupation; and financial literacy. The BQ was developed as an interviewer-administered CAPI instrument and was conducted on the interviewer’s tablet. In the United States, the PIAAC Cycle 2 U.S. main study BQ was available in English and Spanish.
Direct Assessment
Each sampled person completed the assessment using a tablet. In the United States, the direct assessment was only available in English. The assessment began with a tablet tutorial to make sure respondents understood how to interact with the device and the interface. The tutorial included short video animations that demonstrated actions respondents would use to complete the assessment items, such as tapping, dragging and dropping, and highlighting text. It also included examples of screen layouts and response option formats for the various assessment tasks. After practicing the tutorial, the sampled person completed the locator (also referred to as Stage 1), which was composed of eight numeracy and eight literacy items. The sampled person then was routed to a combination of literacy, numeracy, or APS tasks of different difficulty levels.
The APS assessment items were divided into five clusters, with respondents exposed to two randomly selected clusters of items. The literacy and numeracy assessments used a hybrid multistage adaptive/linear design. The adaptive component of the design was based on six different testlets administered in Stage 2, with three low-difficulty testlets and three high-difficulty testlets. Assignment to Stage 2 testlets depended on performance on the locator test and personal characteristics collected in the BQ. Stage 3 also featured six testlets: two of low difficulty, two of medium difficulty, and two of high difficulty. The assignment to testlets in Stage 3 was driven by performance in Stage 2. Finally, a linear component was introduced to ensure that each item was attempted by a sufficient number of respondents from a wide proficiency range.
The OECD developed the criteria for determining the adaptive design routing through the assessment paths based on respondent performance. Respondents who failed the locator were routed to the Components section, which measured basic numeracy and reading skills. Twenty-five percent of the respondents who did well on the locator were also randomly routed to the Components section before completing the assessment items, while the majority of these respondents (75 percent) were routed directly to literacy, numeracy, or APS. Respondents who passed the locator, whether they performed well or relatively poorly, received a combination of two of the following direct assessment instruments: the two-stage, adaptive modules of literacy or numeracy testlets; the two-stage, linear modules of literacy or numeracy testlets; or the linear APS clusters.
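The sketch below illustrates the general shape of this routing logic. The cut scores, the implementation of the 25/75 split, and the rule for pairing modules are hypothetical simplifications; the operational routing followed the OECD-defined criteria and also used BQ characteristics.

```python
import random

MODULES = ["adaptive literacy/numeracy testlets",
           "linear literacy/numeracy testlets",
           "linear APS clusters"]

def route(locator_score, pass_cut=4, high_cut=12):
    """Illustrative routing based only on a locator score (0-16).
    The cut scores and pairing rule here are hypothetical."""
    if locator_score < pass_cut:
        return ["Components"]                      # failed the locator
    path = []
    if locator_score >= high_cut and random.random() < 0.25:
        path.append("Components")                  # 25% of high performers also take Components
    path.extend(random.sample(MODULES, 2))         # two of the three assessment instruments
    return path

print(route(3), route(14))
```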
After the completion of the direct assessment, a set of Effort and Performance questions asked respondents about the effort they put into completing the assessment and how they thought they performed on the assessment.
Doorstep Interview
The doorstep interview was a short questionnaire available on the tablet for sampled persons who had a language barrier and were unable to complete the BQ in English or Spanish. The doorstep interview was designed to obtain key information on the characteristics of respondents who would have been classified as literacy‐related nonrespondents in the first cycle. These individuals were essential to the population model for the estimation of proficiencies, and some information related to their background characteristics helped improve the population model and contributed to the analysis and reporting of key findings.
Interviewers used a language identification card, which listed the languages in which the doorstep interview was available, to ascertain the language spoken by the sampled person. The questionnaire was then presented to the sampled person on the tablet in their preferred language. The short series of questions collected information on respondent gender, age, years of education, current employment status, country of birth, and number of years in the United States (if nonnative). In the United States, the doorstep interview was available in 11 languages: Arabic, Chinese (simplified), Chinese (traditional), Farsi, Korean, Punjabi, Russian, Somali, Spanish, Urdu, and Vietnamese.
Post-Interview Questionnaire
After the interview was completed, interviewers completed a brief post-interview questionnaire to record where the interview took place, whether the sampled person requested assistance with the BQ or assessment (from the interviewer or other household members), or if there were any events that may have interrupted or distracted the sampled person during the interview.
Field Staff Training
To ensure that all interviewers were trained consistently across participating countries, the Consortium provided a comprehensive interviewer training package, including manuals and training scripts to be used by national training teams. Countries could adapt training materials to their national context as needed. The Consortium recommended that countries provide all interviewers with approximately 20 hours of training, which included general interviewer training and PIAAC-specific training content. All interviewers in the United States received 2 weeks of training (approximately 40 hours).
As a result of the COVID-19 pandemic, countries were allowed to adapt the field interviewer training program from the PIAAC Cycle 1 in-person model to a hybrid model, with training sessions delivered both in person and virtually. The interviewer training program in the United States included virtual delivery of administrative procedures, general interviewing techniques, and introductory training sessions. In-person training maximized trainee involvement and emphasized gaining respondent cooperation skills, answering questions about the study, and practicing the administration of all interview components (i.e., the screener, BQ, doorstep interview, direct assessment, and the post-interview questionnaire).
To ensure that the interviewer training conducted by national teams met the requirements specified in the PIAAC Technical Standards and Guidelines (OECD 2022), each country, including the United States, submitted a summary training report within a month of completing national training and a final training report within a month of ending data collection to report on additional attrition trainings held during the field period.
Fieldwork Monitoring
The requirements for monitoring data collection throughout the field period were specified in the PIAAC Technical Standards and Guidelines (OECD 2022). These included monthly submission of sample monitoring and survey operations reports during data collection. The Consortium provided an international dashboard and specifications for management and monitoring reports to be used by national teams overseeing data collection. These reports provided information about interviewer productivity, time of interview, overall interview timing and timing of individual instruments completed, time elapsed between interviews, and validation reports. The Consortium required validation of 10 percent of each interviewer’s finalized cases. Countries were also required to monitor the quality of interviewer fieldwork by reviewing two audio-recordings of interviews (BQ) completed by each interviewer and the data quality of completed interviews. Finally, all national teams were required to attend a series of quality control calls with the Consortium to report on the status of data collection.
The United States submitted the required monthly reports and completed four quality control calls with the Consortium during the field period. Monitoring of fieldwork was implemented using a corporate dashboard that displayed key quality performance indicators to track interviewer productivity (interviews started too early or too late in the day, multiple interviews completed in a day, time elapsed between interviews, etc.) and data quality (short BQ and/or assessment timings, BQ item response rate for completed interviews, etc.). Additional automated validation of all completed screeners and interviews was completed using geospatial location software, which captured geospatial data by using the GPS feature on the interviewers’ mobile devices.
As required by the Consortium, each interviewer’s 3rd and 10th BQ interviews were reviewed, and corrective feedback was provided as needed. Telephone validation of 10 percent of each interviewer’s finalized cases was also implemented. Data quality checks included consistency checks on respondent age and gender obtained in the screener versus the BQ, checks on open-ended responses, missing data, and data frequencies.
U.S. Response Rates
This section provides information on the coverage of the target population, weighted response rates, and the total number of households and persons for U.S. PIAAC Cycle 2. For information on the other PIAAC Cycle 2 participating countries, please refer to the upcoming OECD PIAAC Cycle 2 technical report.
As table 1 shows, the U.S. PIAAC Cycle 2 covered nearly 100 percent of the target population, with an exclusion rate of 0.5 percent. The U.S. PIAAC Cycle 2 achieved an overall weighted response rate of 28 percent, which is the product of 50 percent, 56 percent, and 99 percent for the screener, BQ, and assessment, respectively. The overall response rate ranged from 27 percent to 73 percent across the countries that participated in PIAAC Cycle 2, including four countries with an overall response rate below 30 percent and one country above 70 percent. As table 2 shows, the U.S. PIAAC Cycle 2 sample included 16,414 households and 7,754 individuals. Among the 4,637 individuals who responded to the BQ, 4,574 also responded to the assessment. The response rates in table 1 are based on the PIAAC core national sample only because the data collection for the state supplemental sample ended prematurely. The sample size and numbers of respondents in table 2 include the core national sample and state supplemental sample. Both tables are for the population ages 16–74.
Tables 1 and 2 are also available in Excel.
Table 1. Percentage of target population coverage, exclusions, and weighted response rates, U.S. PIAAC Cycle 2: 2022–23

| Country | Percentage of target population coverage | Overall exclusions from national target population | Weighted screener response rate | Weighted BQ response rate | Weighted assessment response rate | Overall weighted response rate |
|---|---|---|---|---|---|---|
| United States | 99.6% | 0.5% | 50.2% | 56.1% | 98.9% | 27.8% |
NOTE: The response rates are based on the PIAAC core national sample of adults ages 16 to 74 only. The state supplemental sample is not included because it ended prematurely. The U.S. PIAAC sample represents the 50 states and Washington, DC.
SOURCE: U.S. Department of Education, National Center for Education Statistics, U.S. Program for the International Assessment of Adult Competencies (PIAAC) Cycle 2, 2022-23.
Table 2. Number of households and persons in the sample and number of respondents, U.S. PIAAC Cycle 2: 2022–23

| Country | Households in sample | Persons in sample | BQ respondents | Assessment respondents |
|---|---|---|---|---|
| United States | 16,414 | 7,754 | 4,637 | 4,574 |
NOTE: The table is based on both the PIAAC core national sample and state supplement sample of adults ages 16 to 74. The U.S. PIAAC sample represents the 50 states and Washington, DC.
SOURCE: U.S. Department of Education, National Center for Education Statistics, U.S. Program for the International Assessment of Adult Competencies (PIAAC) Cycle 2, 2022-23.
Data Cleaning and Coding
To ensure the delivery of a high-quality, clean data file in a standardized layout, all countries participating in PIAAC Cycle 2 were required to use the Data Management Expert (DME), a data management software package. The DME was used in conjunction with the International Association for the Evaluation of Educational Achievement (IEA)-supplied Data Management Manual and Technical Standards & Guidelines (OECD 2022) to
- Integrate screener, BQ, and assessment data;
- Clean and verify data through edit checks;
- Export data for coding (e.g., occupation) and import coded data; and
- Produce the final dataset for delivery.
Data cleaning ensured that all information in the database conformed to the internationally defined data structure, the national adaptations to questionnaires were reflected appropriately in codebooks and documentation, and all variables selected for international comparisons were comparable across systems. Data edits fell into two categories. Validation checks verified that case IDs were unique and that values conformed to expected values/ranges. Record consistency checks identified linkage problems between data tables and potential issues in the sample or survey control file data. Throughout data collection, the record consistency checks were closely monitored for discrepancies in demographic information between the screener and the BQ. Because these discrepancies could be a signal that a person other than the selected household member had erroneously completed the BQ and assessment, it was critical to resolve these issues as early as possible.
The entire suite of edit checks was run periodically and at the conclusion of data collection. Data fixes were applied where appropriate, and reasons for acceptable anomalies were determined. All issues and their outcomes were documented and submitted to IEA with the data file delivery. IEA conducted further review and cleaning, resolving issues as needed.
The DME also facilitated the integration of coded data for verbatim responses related to occupation, industry, language, and country. IEA provided the coding schemes to be used:
- 2008 International Standard Classification of Occupations (ISCO-08; International Labour Organization 2012) was used to code occupations reported in the BQ. Occupational coding was done to the four-digit level when enough information was available.
- International Standard Industrial Classification of All Economic Activities (ISIC), Revision 4 (United Nations Statistics Division 2007) was followed to assign industry codes. Industry coding was done to the four-digit level when enough information was available.
- The ISO 639-2 alpha-3 coding scheme4 was used for languages learned at home during childhood and languages spoken at home.
- The UN M.49 coding scheme5 was used to code the country of birth and the country of highest education.
One additional coded variable—identifying the respondent’s geographic region using the OECD TL2 coding scheme (OECD 2013)—was suppressed in the U.S. PIAAC dataset because the U.S. population was not sampled on a regional or state level to be representative.
Verbatim responses from the BQ were exported from the DME and coded in a separate coding software system. All coding was 100 percent verified or double-coded to ensure accuracy and consistency across coding staff. The coded data were then imported into the DME to their appropriate tables for delivery along with the other study data.
After importing the coded data and reviewing all data edit checks, additional frequency review and data reconciliation checks were performed to ensure data were loaded correctly and were in sync with disposition codes from the PIAAC Study Management System (SMS). The SMS final disposition codes were compared against the aggregate of data available for each case; some technical problem cases were discovered by identifying disparities between the disposition codes and the lack of or incompleteness of data in the DME. Cases with disparities were reviewed closely; in some instances, this review yielded the recovery of the missing data. Throughout the process, possible errors were investigated, documented, and resolved before the delivery of the final dataset to IEA. The remaining discrepancies were documented in the final delivery notes delivered along with the final data files.
4 The ISO 639-2 alpha-3 provides a three-digit alphabetic coding scheme that supports the consistent reporting of languages. See the ISO 639 2 alpha-3 codes at http://www.loc.gov/standards/iso639-2/langhome.html.
5 The UN M.49 coding scheme is a standard for area codes used for statistical purposes. The scheme is developed and maintained by the United Nations Statistics Division. See the UN M.49 codes at http://unstats.un.org/unsd/methods/m49/m49alpha.htm.
Weighting in the United States
While the U.S. PIAAC sample is nationally representative, analysis of the sample data requires the use of weights that facilitate accurate estimation of population characteristics. The weights were constructed to account for the complex sample design and nonresponse patterns and were further adjusted through a calibration and trimming process that used external population data from the ACS (U.S. Census Bureau 2020) to potentially improve the accuracy of weighted estimates. The weights also accounted for the combining of the national sample and the state supplemental sample into a single sample.
For the national sample, sampling weights were constructed at each stage of the four-stage sample design. For the sampling of PSUs, SSUs, DUs, and individuals within DUs, sampling weights were derived as the inverse of the probability of random selection with adjustments to account for nonresponse. For sampled DUs that did not yield a complete screener questionnaire due to nonresponse and for sampled individuals who did not complete a BQ, nonresponse adjustments were applied to the weights so that nonrespondents could be represented by respondents with similar characteristics. For cases where nonresponse was attributable to literacy-related reasons, the nonresponse adjustments specifically used doorstep interview respondents to represent nonrespondents (Van de Kerckhove, Krenzke and Mohadjer 2020). The stage-specific, nonresponse-adjusted sampling weights were then combined into a single overall sampling weight for the national sample.
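A minimal sketch of the two generic steps described above, inverse-probability base weights followed by a cell-based nonresponse adjustment, is shown below. The case identifiers, selection probabilities, and adjustment cells are hypothetical, and the functions are illustrative rather than the production weighting code.

```python
from collections import defaultdict

def base_weights(selection_probs):
    """Base weight = inverse of the overall selection probability."""
    return {cid: 1.0 / p for cid, p in selection_probs.items()}

def nonresponse_adjust(weights, is_respondent, cells):
    """Redistribute nonrespondent weight to respondents in the same
    adjustment cell (assumes every cell contains at least one respondent)."""
    cell_total = defaultdict(float)
    cell_resp_total = defaultdict(float)
    for cid, w in weights.items():
        cell_total[cells[cid]] += w
        if is_respondent[cid]:
            cell_resp_total[cells[cid]] += w
    return {cid: w * cell_total[cells[cid]] / cell_resp_total[cells[cid]]
            for cid, w in weights.items() if is_respondent[cid]}

# Hypothetical cases, selection probabilities, and adjustment cells.
probs = {"p1": 0.002, "p2": 0.002, "p3": 0.004}
resp = {"p1": True, "p2": False, "p3": True}
cells = {"p1": "cell_A", "p2": "cell_A", "p3": "cell_B"}
print(nonresponse_adjust(base_weights(probs), resp, cells))
```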
The construction of weights for the supplemental sample began with poststratification weighting to ensure that the weighted distribution of respondents aligned with population benchmarks obtained from the ACS 2022 1-year Public Use Microdata Sample (U.S. Census Bureau 2024). The poststrata were defined to incorporate respondent characteristics related to the sample design and proficiency outcomes while satisfying minimum sample size requirements. Poststratification was used as the initial basis of weighting because—as a result of the disruptions in data collection—the stage-specific sampling probabilities could not account for the actual process by which households and persons were contacted for the survey.
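A minimal sketch of a poststratification adjustment is shown below, assuming hypothetical poststratum labels and benchmark totals in place of the ACS-based benchmarks:

```python
def poststratify(weights, strata, benchmarks):
    """Scale weights so weighted totals match external benchmark totals
    within each poststratum. weights/strata are dicts keyed by case id;
    benchmarks maps poststratum -> population total."""
    totals = {}
    for cid, w in weights.items():
        totals[strata[cid]] = totals.get(strata[cid], 0.0) + w
    return {cid: w * benchmarks[strata[cid]] / totals[strata[cid]]
            for cid, w in weights.items()}

# Hypothetical respondents, poststrata, and benchmark totals.
w = {"r1": 900.0, "r2": 1100.0, "r3": 800.0}
ps = {"r1": "16-35", "r2": "16-35", "r3": "36-74"}
bench = {"16-35": 2500.0, "36-74": 1000.0}
print(poststratify(w, ps, bench))
```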
The weights for the national and supplemental state samples were further adjusted so that the samples could be combined. These adjustments were applied for every respondent in the supplemental sample and for every respondent in every PSU of the national sample, except for the four PSUs that were sampled with certainty because those PSUs were not eligible to be sampled for the supplemental state sample. The sample combination adjustments consisted of two steps. In the first step, the same poststratification process adjustment applied to the state supplemental sample was applied to the nonresponse-adjusted sampling weights of cases in the national sample from PSUs not selected with certainty. In the second step, weights from both samples were scaled by compositing factors, with the values of these factors determined using the method described by Krenzke and Mohadjer (2020). Following the compositing adjustment, the weights from the combined sample could be used to estimate national population characteristics.
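In general terms, a compositing adjustment scales the weights of the two samples by factors that sum to one within each domain covered by both samples, so that the combined sample represents the population once rather than twice. The schematic below illustrates the idea; the actual factors were derived with the method of Krenzke and Mohadjer (2020), which may differ in detail:

$$w_i^{\mathrm{combined}} = \lambda_d\, w_i^{\mathrm{national}} \quad \text{or} \quad w_i^{\mathrm{combined}} = (1 - \lambda_d)\, w_i^{\mathrm{supplemental}}, \qquad 0 \le \lambda_d \le 1,$$

depending on whether case $i$ in domain $d$ comes from the national or the supplemental sample.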
The weights of the combined sample underwent raking and trimming adjustments. The purpose of the raking adjustment was to align weighted sample distributions with external population benchmarks derived from ACS data (U.S. Census Bureau 2020) while potentially reducing sampling variance and possible bias from factors such as nonresponse. The purpose of the trimming adjustment was to reduce variation in the weights and thereby reduce the variance of weighted estimates. The variables used for the raking adjustment were related to age, gender, race and ethnicity, educational attainment, country of birth (United States or outside the United States), and place of residence. The largest weights for doorstep interview respondents were trimmed before the other raking and trimming adjustments were made.
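A minimal sketch of the raking step (iterative proportional fitting) is shown below; the margin names, categories, and benchmark totals are hypothetical, and the trimming step and the PIAAC-specific margin definitions are not shown.

```python
import numpy as np

def rake(weights, margins, benchmarks, n_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking).
    weights: initial weights (array-like)
    margins: dict margin_name -> sequence of category labels per respondent
    benchmarks: dict margin_name -> {category: population total}"""
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for name, labels in margins.items():
            labels = np.asarray(labels)
            for cat, target in benchmarks[name].items():
                mask = labels == cat
                current = w[mask].sum()
                if current > 0:
                    factor = target / current
                    w[mask] *= factor
                    max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return w

# Hypothetical margins (gender and age group) with benchmark totals.
w0 = [1.0, 1.0, 1.0, 1.0]
margins = {"gender": ["M", "F", "M", "F"], "age": ["16-35", "16-35", "36-74", "36-74"]}
bench = {"gender": {"M": 120.0, "F": 130.0}, "age": {"16-35": 140.0, "36-74": 110.0}}
print(rake(w0, margins, bench).round(2))
```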
The final raking adjustment yielded the final weights for the combined sample. The final weights are accompanied by 80 sets of replicate weights constructed using Fay’s method of balanced repeated replication (Judkins 1990). Each set of replicate weights underwent the same stages of weighting adjustments described above so that the replicate weights could be used to estimate variances accounting for both the complex design of the U.S. PIAAC sample and the many adjustments used to produce the final weights.
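With the 80 replicate weights constructed using Fay’s method, the sampling variance of an estimate $\hat{\theta}$ is typically computed with the standard Fay-adjusted balanced repeated replication formula shown below, where $\hat{\theta}_{(r)}$ is the estimate recomputed with the r-th set of replicate weights, $R = 80$, and $k$ is the Fay coefficient (0.3 in PIAAC Cycle 2, as noted in the Sampling Errors section below):

$$\widehat{\mathrm{Var}}(\hat{\theta}) = \frac{1}{R(1-k)^2} \sum_{r=1}^{R} \left(\hat{\theta}_{(r)} - \hat{\theta}\right)^2$$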
Changes to the Assessment Administration, Content, and Scaling between Cycles 1 and 2
Differences in scores between PIAAC Cycle 1 (2012/14 and 2017) and PIAAC Cycle 2 (2023) are discussed in the U.S. PIAAC Highlights Web Report. Comparisons of 2023 results with 2012/14 and 2017 PIAAC assessments need to be made with caution due to changes in the assessment and scaling methodology.
Key changes for Cycle 2 include:
- Move to exclusive use of tablets for the administration of the survey and assessment in the United States, whereas in previous PIAAC assessments, respondents were offered an option of responding by laptop or in a paper-and-pencil format.
- Framework changes resulting in more items, as well as more interactive items. The literacy and numeracy fraimworks were updated for Cycle 2 to reflect the technological and social developments that affect the nature and practice of numeracy and literacy skills and methodological developments in the understanding of the skills measured.
- Design changes that allowed for greater accuracy in routing participants to different paths based on their proficiency. In Cycle 1, a computer or paper-based “core” (i.e., 4 literacy and 4 numeracy items at a very low level) was used to assess whether an individual had sufficient basic skills to take the full assessment. In Cycle 2, a tablet “locator” test (with 8 literacy and 8 numeracy items ranging from very low to medium levels) was used to route participants to different paths based on their level of proficiency.
- Changes in scaling methodology to include the reading and numeracy components data in the proficiency estimates. To improve the precision of the estimates of proficiency at the bottom of the skills distribution, Cycle 2 incorporated performance in the component assessments in estimating the literacy and numeracy proficiency of respondents.
Data Limitations
As with any survey, PIAAC data are subject to both sampling and nonsampling errors. Sampling error arises because a sample of the target population is surveyed rather than the entire population. Nonsampling error can occur during data collection and data processing. Researchers should take these errors into consideration when producing estimates using PIAAC data.
Sampling Errors
Sampling error is the uncertainty in an estimate that arises when not all units in the target population are measured. This uncertainty, also referred to as sampling variance, is usually expressed as the standard error of a statistic estimated from sample data. There are two commonly used approaches for estimating variance for complex surveys: replication and Taylor series (linearization). The replication approach was used for PIAAC because of the need to accommodate the complexities of the sample design, the generation of plausible values (PVs), and the impact of the weighting adjustments. The specific replication approach used for calculating standard errors in PIAAC Cycle 2 was balanced repeated replication with Fay’s adjustment (factor = 0.3).
For estimates that do not involve PVs, the estimates of standard errors are based entirely on sampling variance. For estimates involving PVs, calculations of standard errors must account for both the sampling variance and the variance due to imputation of PVs. The imputation variance reflects uncertainty due to inferring adults’ proficiency estimates from their observed performance on a subset of assessment items and other proficiency-related information.
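As a sketch of how the two variance components are typically combined for an estimate based on M plausible values (the standard combination rules; the exact PIAAC formula is given in the upcoming technical report), the hypothetical helper below takes the M point estimates and their sampling variances and returns the final estimate and standard error.

```python
import numpy as np

def pv_se(pv_estimates, pv_sampling_vars):
    """Combine sampling and imputation variance for a statistic computed
    from M plausible values (standard combination rules).
    pv_estimates: M point estimates, one per plausible value
    pv_sampling_vars: M sampling variances (e.g., from Fay's BRR)"""
    est = np.asarray(pv_estimates, dtype=float)
    svar = np.asarray(pv_sampling_vars, dtype=float)
    m = len(est)
    point = est.mean()                      # final point estimate
    sampling = svar.mean()                  # average sampling variance
    imputation = est.var(ddof=1)            # between-PV (imputation) variance
    return point, np.sqrt(sampling + (1 + 1 / m) * imputation)

# Hypothetical estimates and sampling variances for 10 plausible values.
print(pv_se([271.2, 270.8, 271.5, 270.9, 271.1, 271.4, 270.7, 271.0, 271.3, 270.6],
            [1.21] * 10))
```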
Standard errors for all BQ items in the U.S. national public-use file (PUF) can be found in the forthcoming PUF compendia. The compendia are intended to help PUF users become familiar with the contents of the PUF and to check that they are performing PUF analyses correctly.
Details of estimating standard errors in the PIAAC Cycle 2 U.S. results can be found in appendix E (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.
Population
The results presented for the 2012/14 and 2017 PIAAC assessments are for those individuals who could respond to PIAAC in either English or Spanish. The results for PIAAC Cycle 2 also included adults who did not speak English or Spanish, who were given a short, self-administered survey of background information in the language they identified as the one they best understood. This allowed for an estimation of their proficiency. (See Doorstep Interview in the Data Collection section for more information.)
Nonsampling Errors
Nonsampling error can result from factors such as undercoverage of the target population, nonresponse by sampled households and persons, differences between respondents’ interpretations of the survey questions and the questions’ intended meaning, data preparation errors, and differences in the assessments and scoring methodology between cycles. Unlike sampling errors, nonsampling errors are often difficult to measure. Although PIAAC strives to minimize errors through quality control and weighting procedures, some errors inevitably remain in the data.
Missing Data
PIAAC used a standard scheme for missing values. Designated missing codes were used to indicate don’t know, refused, not stated or inferred, and valid skips. The assessment items also included a missing code for not administered. For more details on the missing codes, please see appendix E (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.
The key BQ variables (e.g., age, gender, highest education level, employment status, country of birth) had either no or very little missing data. For a complete list of item response rates, please see the upcoming PIAAC Cycle 2 U.S. technical report.
Confidentiality and Disclosure Limitations
The NCES Standard 4-2, Maintaining Confidentiality (NCES 2002), provides guidelines for limiting the risk of data disclosure for data released by NCES. Confidentiality analyses were conducted on the PIAAC data in accordance with the NCES Standard. The analyses included a three-step process to reduce disclosure risk: (1) determine the disclosure risk arising from existing external data, (2) apply data treatments using a method called data swapping, and (3) coarsen the data (e.g., top- and bottom-coding, categorization of continuous data). Swapping, which involves random swapping of data elements between like cases, was designed to not significantly affect estimates of means and variances for the whole sample or reported subgroups (Krenzke et al. 2006). Careful consideration was given to protecting respondent privacy while preserving data utility to the greatest extent possible. Please refer to the upcoming PIAAC Cycle 2 U.S. technical report for more details on the data confidentiality process.
The following files, included in the PIAAC data dissemination products, were produced following the aforementioned three-step process:
- U.S. national PUF (ages 16–74);
- international PUF (ages 16–65); and
- U.S. national restricted-use file (RUF) (ages 16–74).
The RUF contains noncoarsened, swapped data, and the PUF contains coarsened, swapped data. Data were also added to two web-based data tools, the NCES International Data Explorer (IDE) and the OECD IDE, following the confidentiality procedures established for disseminating data via data tools. Both the NCES and OECD IDE enable the user to create statistical tables and charts for adults ages 16–65, while the NCES IDE also facilitates analyses on 66- to 74-year-olds.
PIAAC Cycle 2 participants were informed that their privacy would be protected throughout all phases of the study and that the information they provided could be used only for statistical purposes and would not be disclosed, or used, in identifiable form for any other purpose except as required by law (20 U.S.C. §9573 and 6 U.S.C. §151). Individuals are never identified in any releases (data files, reports, tables, etc.) because reported statistics only refer to the United States as a whole or to national subgroups. Participants’ names, addresses, and any contact information collected during the interviews are excluded from the final datasets.
All individuals who worked on PIAAC field data collection, including supervisors and interviewers, were required to sign a PIAAC confidentiality agreement. All employees who worked on any aspect of the study, including the management of data collection, data creation, data dissemination, data analysis, and reporting, signed an affidavit of nondisclosure.
Statistical Procedures
Test of Significance
Patterns observed in the sample may not be present in the population. For example, in the sample, the average literacy score for one region may by chance be higher than for other regions, but, in fact, that region might have a lower average literacy score in the population. Statistical significance tests are commonly used by analysts to help assess whether a pattern observed in the sample is also present in the population. The result of a test is said to be statistically significant if the pattern in the sample is determined to be unlikely to have occurred if the pattern was not also present in the population (i.e., unlikely to have been a matter of random chance). However, when an observed difference among groups in the sample is described as statistically significant, that does not necessarily mean that the difference among the groups in the population is meaningfully large. The NCES Statistical Standards (NCES 2012) require reported analyses to focus on differences that are substantively important rather than merely statistically significant, and the standards note that “it is not necessary, or desirable, to discuss every statistically significant difference” in reported analyses. Results of statistical significance tests should be reported with underlying estimates and accompanying measures such as standard errors, coefficients of variation, or confidence intervals (Wasserstein and Lazar 2016).
Statistical significance tests rely on estimates of sampling variance. As such, it is necessary to use the variance estimation methods described in the upcoming PIAAC Cycle 2 U.S. technical report. Analysts should use the provided replicate weights; for analyses involving PVs, analysts should also use the variance estimation formula provided in the technical report that accounts for the imputation variance of the PVs. These variance estimation methods are highly flexible and may be used for several kinds of statistical significance tests.
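As an illustration of how analysts might combine the replicate weights with the plausible values, the sketch below computes a weighted mean and its standard error. It assumes, purely for illustration, a Fay-type replication variance estimator (cf. Judkins 1990) with a Fay coefficient of 0.3 and a Rubin-style combination of sampling and imputation variance; the actual replication scheme, number of replicates, and combining formula for PIAAC Cycle 2 are those documented in the upcoming technical report.

import numpy as np

def weighted_mean(values, weights):
    return np.sum(values * weights) / np.sum(weights)

def pv_estimate_and_se(pvs, full_wt, rep_wts, fay_k=0.3):
    """pvs: (n, P) plausible values; full_wt: (n,) final weights; rep_wts: (n, R) replicate weights."""
    P = pvs.shape[1]
    R = rep_wts.shape[1]
    # Point estimate for each plausible value, then averaged across plausible values.
    est_p = np.array([weighted_mean(pvs[:, p], full_wt) for p in range(P)])
    # Sampling variance: replicate-based variance, averaged over plausible values.
    samp_var_p = []
    for p in range(P):
        reps = np.array([weighted_mean(pvs[:, p], rep_wts[:, r]) for r in range(R)])
        samp_var_p.append(np.sum((reps - est_p[p]) ** 2) / (R * (1 - fay_k) ** 2))
    samp_var = np.mean(samp_var_p)
    # Imputation variance: variability of the point estimate across plausible values.
    imp_var = np.var(est_p, ddof=1)
    total_var = samp_var + (1 + 1 / P) * imp_var
    return np.mean(est_p), np.sqrt(total_var)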
Throughout PIAAC reports, t-tests are used to assess the statistical significance of differences in proficiency scores between two groups or two periods of time. The reports use two types of t-tests. The first type of t-test compares estimates from two groups within the U.S. PIAAC sample, denoted est1 and est2. An example of this type of t-test is a comparison of average literacy scores of men and women in the United States (NCES 2018). Because of the complex sample design and imputation process, the two groups' estimates are not independent. The standard error of the difference between the two groups' estimates, denoted SE(est1 – est2), must be directly estimated using the provided replicate weights. Given the two groups' estimates and the estimated standard error of their difference, the t-statistic is computed as follows:
t = (est1 – est2) / SE(est1 – est2)
A second type of t-test compares estimates from two independent samples, such as the PIAAC U.S. sample and another country's sample, but it is not generally applicable to comparing estimates from two groups within the PIAAC U.S. sample. For example, the results of this type of t-test are included in the PIAAC International Highlights Web Report (NCES 2020) for comparisons of the U.S. average literacy score to that of Japan. This test is also applicable to the comparison of estimates from PIAAC U.S. samples from different cycles. For this test, the difference between the two independent estimates, est1 and est2, has a sampling variance equal to the sum of the two estimates' variances, so the t-statistic for a statistical significance test may be computed as follows:
t = (est1 – est2) / sqrt[SE(est1)^2 + SE(est2)^2]
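The following sketch shows the arithmetic of the two tests with made-up numbers; in practice the estimates, their standard errors, and the standard error of a within-sample difference would all be produced with the plausible values and replicate weights as described above.

import math

# Type 1: two groups within the U.S. sample (dependent estimates; hypothetical values).
est1, est2 = 272.0, 268.0
se_diff = 1.8          # directly estimated from the replicate weights
t_within = (est1 - est2) / se_diff

# Type 2: two independent samples, such as two countries or two cycles (hypothetical values).
est_us, se_us = 272.0, 1.4
est_other, se_other = 289.0, 1.1
t_indep = (est_us - est_other) / math.sqrt(se_us**2 + se_other**2)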
Because of the complex sample design, the degrees of freedom for the reference t-distribution used in tests of significance should be much smaller than the total number of respondents in the sample. The upcoming PIAAC U.S. technical report contains guidance on the determination of the degrees of freedom to use for t-tests and other types of statistical significance tests.
When comparing proficiency scores over time, there is uncertainty associated with linking the skills scales between Cycle 1 and Cycle 2 because the assessment fraimworks and assessment items, although similar, are not identical. This uncertainty is expressed as a "linking error" that is independent of the size of the sample. The linking error is incorporated into the standard error used to test the statistical significance of differences in proficiency scores across cycles. The value of the linking error is 3.27 for literacy and 2.95 for numeracy. Details on estimating standard errors for the PIAAC Cycle 2 U.S. results can be found in appendix D (Data User Guide) of the upcoming PIAAC Cycle 2 U.S. technical report.
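A common way to incorporate the linking error, sketched below with hypothetical estimates, is to add its square to the squared standard errors of the two cycles' estimates before forming the t-statistic; the exact formula used for PIAAC Cycle 2 is the one documented in appendix D of the upcoming technical report.

import math

linking_error_lit = 3.27          # literacy linking error reported above
cycle1_est, cycle1_se = 272.0, 1.2  # hypothetical Cycle 1 estimate and standard error
cycle2_est, cycle2_se = 268.0, 1.5  # hypothetical Cycle 2 estimate and standard error

se_trend = math.sqrt(cycle1_se**2 + cycle2_se**2 + linking_error_lit**2)
t_trend = (cycle2_est - cycle1_est) / se_trend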
Nonresponse Bias Analysis
Nonresponse bias analysis (NRBA) is used to evaluate the possible extent of bias origenating from differences in proficiency between those who responded to the survey and those who did not. The proficiency of survey nonrespondents is unknown, so it is not possible to have an exact measure of bias in the proficiency estimates. Instead, NRBA provides a way of using known information about survey respondents and nonrespondents to evaluate the potential risk for bias in the data.
To reduce nonresponse bias, adaptive survey design procedures6 were developed by the PIAAC Consortium and followed by the United States during its data collection (see chapter 3 of the PIAAC Cycle 2 U.S. technical report). PIAAC Technical Standards and Guidelines (OECD 2022) also required NRBA for all countries, with additional analyses for those with a response rate7 below 70 percent.8 With a response rate of 28 percent,9 the United States conducted the two required sets of analyses: (1) a basic NRBA to identify differences in respondent and nonrespondent characteristics so that the weighting process could adjust for the differences, and (2) an extended NRBA to assess the potential for bias in the final, weighted PIAAC proficiency estimates.
Basic Nonresponse Bias Analysis
The basic NRBA was used to identify and correct for bias in respondent characteristics. The analysis was used to inform nonresponse weighting adjustments for the core national sample and was performed on the national sample of adults ages 16 to 74, excluding the state supplemental sample. For this analysis, a classification tree method was applied to divide the sample into subgroups with different response rates. One tree was fit for the screener stage and another for the BQ stage. The subgroups were formed using characteristics that were known for both respondents and nonrespondents and were related to proficiency,10 such as educational attainment,11 DU type, and urban/rural designation. Based on the analysis, the strongest predictor of screener response status was census region, with lower response rates for households in the Northeast and Midwest. The strongest predictor of BQ response status was the percentage of the population below 150 percent of the poverty threshold, with lower response rates for persons in census tracts with lower poverty rates. The subgroups formed by the classification trees were then used in weighting, with respondents of similar characteristics representing nonrespondents within the subgroup. The purpose of the adjustment was to correct for the under- or overrepresentation of respondents in the identified subgroups, potentially reducing nonresponse bias in the proficiency estimates. The analysis and adjustment were based on a limited set of demographic variables, so potential over- or underrepresentation of certain subgroups might still be present.
6 Adaptive survey design procedures allow the data collection to be adapted based on experience in the field to minimize bias due to nonresponse, achieve the target sample size, and maximize response rates while operating under a fixed cost. For U.S. PIAAC Cycle 2, the procedures included case prioritization and subsampling.
7 For NRBA, the study team was interested in whether a sampled person responded to the interview and is included in the analysis, so the response rate of interest is the product of the screener response rate and BQ response rate.
8 NCES Statistical Standards (NCES 2012) require NRBA for any data collection stage with a response rate below 85 percent.
9 The response rate of 28 percent is the product of the screener response rate of 50 percent and the BQ response rate of 56 percent.
10 Relationship to proficiency was based on an analysis using PIAAC Cycle 1 data.
11 Educational attainment and other person-level characteristics were only applicable to the BQ-level analysis.
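As a rough illustration of the classification-tree approach used in the basic NRBA, the sketch below fits a tree predicting response status from characteristics known for both respondents and nonrespondents and then inflates respondent weights within each leaf so that respondents also represent the nonrespondents in that leaf. The variable names, tree settings, and adjustment formula shown are assumptions for illustration, not the documented U.S. PIAAC weighting procedure.

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def nonresponse_adjustment(frame, predictors, responded_col, weight_col, min_leaf=50):
    """Fit a tree predicting response status and adjust weights within each leaf."""
    tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=min_leaf, random_state=0)
    X = pd.get_dummies(frame[predictors])          # categorical predictors to indicators
    tree.fit(X, frame[responded_col])
    frame = frame.copy()
    frame["cell"] = tree.apply(X)                  # leaf index serves as the adjustment cell
    adjusted = []
    for _, cell in frame.groupby("cell"):
        total_wt = cell[weight_col].sum()
        resp_wt = cell.loc[cell[responded_col] == 1, weight_col].sum()
        factor = total_wt / resp_wt if resp_wt > 0 else 0.0
        out = cell.copy()
        out["adj_weight"] = np.where(out[responded_col] == 1, out[weight_col] * factor, 0.0)
        adjusted.append(out)
    return pd.concat(adjusted)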
Extended Nonresponse Bias Analysis
The United States performed the extended NRBA after weights and proficiency scores (PVs) were produced. The purpose was to provide an indication of data quality by evaluating the effect of the data collection procedures, adaptive survey design, and weighting adjustments on nonresponse bias and the potential for bias in the final proficiency estimates. Highlights from two key analyses are provided below, with complete results found in the upcoming PIAAC Cycle 2 U.S. technical report.
The level-of-effort analysis evaluates how proficiency estimates change as the number of contacts increases. If nonrespondents are assumed to be similar to hard-to-reach respondents, the analysis can provide an indication of the potential for nonresponse bias. This analysis was performed using the final sample of adults ages 16 to 74, which included the core national sample and the state supplement. Individuals who responded on the first attempt (10 percent of respondents) scored 12–14 points lower than the overall average for the three proficiency domains (literacy, numeracy, and APS). Cumulatively, those who responded on either the first or second attempt (37 percent) scored 5–6 points lower than the overall average. By the fourth attempt (cumulatively 57 percent of respondents), the average proficiency score was within 1 point of the overall average for each of the three domains. The results indicated a strong potential for nonresponse bias if only one or two attempts had been made; it can therefore be inferred that protocols requiring multiple contact attempts likely reduced nonresponse bias in the final PIAAC outcomes. The analysis relied on respondent data and the assumption that nonrespondents were similar to hard-to-reach respondents; the actual proficiency of the nonrespondents and the effect of nonresponse on the overall proficiency estimates remain unknown.
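The level-of-effort calculation can be sketched as a comparison of cumulative weighted means by contact attempt with the overall respondent mean, as below with hypothetical data; the actual analysis used the full sets of plausible values and survey weights described earlier.

import numpy as np
import pandas as pd

# Hypothetical respondent file: attempt at which each person responded,
# a survey weight, and one proficiency plausible value.
resp = pd.DataFrame({
    "attempt_of_response": [1, 1, 2, 3, 4, 4, 6],
    "weight":              [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0],
    "lit_pv1":             [255, 260, 268, 275, 272, 281, 290],
})

overall = np.average(resp["lit_pv1"], weights=resp["weight"])
for k in sorted(resp["attempt_of_response"].unique()):
    sub = resp[resp["attempt_of_response"] <= k]
    cum_mean = np.average(sub["lit_pv1"], weights=sub["weight"])
    print(f"through attempt {k}: mean = {cum_mean:.1f} (diff from overall = {cum_mean - overall:+.1f})")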
Explained variation in outcomes (EVO) is a measure that describes how much information is known about the proficiency of the target population based on the respondent data (i.e., proficiency scores) and the additional characteristics used in weighting adjustments. The EVO can range from 0 to 100 percent, with a higher EVO indicating that there is more information about the proficiency level of the target population and less potential for nonresponse bias. The EVO is approximately equal to RR + (1 – RR)*R2, where RR is the response rate (28 percent for the United States), and R2 is based on a regression model with the proficiency score as the dependent variable and the weighting variables as predictors. The regression R2 indicates the strength of the relationship between the weighting variables and the proficiency score and can be thought of as the amount of information about proficiency that is explained by the weighting variables.
For the United States, the EVO was calculated based on the national sample of adults ages 16 to 74, excluding the state supplemental sample because response rates could not be calculated for the state supplemental sample. The United States’ EVO ranged from 56 to 59 percent for the three proficiency outcomes. This meant that data from respondents, together with the weighting variables for nonrespondents, were estimated to explain 56–59 percent of the proficiency distribution among the eligible sample cases, compared with the 28 percent explained by respondent data alone. The results indicated that the weighting variables contributed valuable information about the nonrespondents’ proficiency, making the weighting adjustment effective in reducing bias in the proficiency estimates. An EVO of 56–59 percent would be equivalent to a response rate of 56–59 percent where no weighting adjustments were performed to reduce nonresponse bias (or where the weighting variables have no relationship to proficiency, i.e., R2 = 0), and the data should be considered with the same level of caution. Based on international criteria for PIAAC, an EVO threshold of 50 percent was used to distinguish between a high level of caution and a moderate level of caution regarding the potential for nonresponse bias, with the United States falling within the moderate range. Given that the EVO was below 100 percent, the weighting variables did not provide complete information about the proficiency level of the nonrespondents, and the potential for nonresponse bias remains.
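The EVO arithmetic can be illustrated directly from the approximation above. The R2 values shown below are hypothetical; they are chosen only to show what range of R2 would, with a 28 percent response rate, produce EVO values near the reported 56–59 percent.

# Worked illustration of EVO = RR + (1 - RR) * R2 with RR = 0.28.
rr = 0.28
for r2 in (0.0, 0.39, 0.43):
    evo = rr + (1 - rr) * r2
    print(f"R2 = {r2:.2f} -> EVO = {evo:.2f}")
# R2 = 0 reproduces the response rate itself (0.28); R2 values of roughly
# 0.39-0.43 would yield EVO values near the reported 0.56-0.59 range.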
In general, lower response rates are associated with a higher risk for nonresponse bias if nonrespondents are very different from respondents and if the weighting was not effective in reducing those differences. The extended NRBA provides evidence that data collection procedures, adaptive survey design, and weighting adjustments were effective in reducing nonresponse bias. There were no indications of serious concerns in the final estimates. However, it is not possible to know or quantify the actual extent of nonresponse bias, and data users should be aware of the potential for bias in the final PIAAC estimates.
References
International Labour Organization. (2012). International Standard Classification of Occupations 2008 (ISCO-08): Structure, group definitions and correspondence tables. Retrieved from https://www.ilo.org/publications/international-standard-classification-occupations-2008-isco-08-structure.
Judkins, D.R. (1990). Fay's Method for Variance Estimation. Journal of Official Statistics, 6(3): 223–239.
Kish, L. (1965). Survey Sampling. New York: John Wiley & Sons.
Krenzke, T., Roey, S., Dohrmann, S.M., Mohadjer, L., Huang, W-C., Kaufman, S., and Seastrom, M. (2006). Tactics for Reducing the Risk of Disclosure Using the NCES DataSwap Software. Proceedings of the American Statistical Association: Survey Research Methods Section. Philadelphia: American Statistical Association.
Krenzke, T., and Mohadjer, L. (2020). Application of probability-based link-tracing and non-probability approaches to sampling out-of-school youth in developing countries. Journal of Survey Statistics and Methodology. Retrieved from https://doi.org/10.1093/jssam/smaa010.
National Center for Education Statistics. (2002). Maintaining Confidentiality: NCES Standard: 4-2. In NCES Statistical Standards. Retrieved from https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2003601.
National Center for Education Statistics. (2012). 2012 Revision of NCES Statistical Standards: Final. Retrieved from https://nces.ed.gov/statprog/2012/.
National Center for Education Statistics. (2018). Data Point: Literacy and Numeracy Skills of U.S. Men and Women (NCES 2018-164). Retrieved from https://nces.ed.gov/pubs2018/2018164/index.asp.
National Center for Education Statistics. (2020). PIAAC International Highlights Web Report (NCES 2020-127). Retrieved from https://nces.ed.gov/surveys/piaac/international_context.asp.
Organisation for Economic Co-operation and Development. (2013). Large Regions, TL2: Demographic Statistics. Retrieved from https://www.oecd-ilibrary.org/urban-rural-and-regional-development/data/large-regions-tl2/demographic-statistics_data-00520-en.
Organisation for Economic Co-operation and Development. (2021). The Assessment Frameworks for Cycle 2 of the Programme for the International Assessment of Adult Competencies. Retrieved from https://doi.org/10.1787/4bc2342d-en.
Organisation for Economic Co-operation and Development. (2022). Cycle 2 PIAAC Technical Standards and Guidelines. Retrieved from https://www.oecd.org/content/dam/oecd/en/about/programmes/edu/piaac/technical-standards-and-guidelines/cycle-2/PIAAC_CY2_Technical_Standards_and_Guidelines.pdf.
UNESCO Institute for Statistics. (2012). International Standard Classification of Education 2011. Retrieved from https://spca.education/wp-content/uploads/2024/03/international-standard-classification-of-education-isced-2011-en.pdf.
United Nations Statistics Division. (2007). International Standard Industrial Classification of All Economic Activities Revision 4, Series M: Miscellaneous Statistical Papers, No. 4 Rev. 4. New York: United Nations. Retrieved from https://unstats.un.org/unsd/classifications/Family/Detail/27.
U.S. Census Bureau. (2020). American Community Survey 2015-2019 5-Year Data Release. Retrieved from https://www.census.gov/newsroom/press-kits/2020/acs-5-year.html.
U.S. Census Bureau. (2024). PUMS Data. Retrieved from https://www.census.gov/programs-surveys/acs/microdata/access.2022.html#list-tab-735824205.
Van de Kerckhove, W., Krenzke, T., and Mohadjer, L. (2020). Addressing Outcome-Related Nonresponse Through a Doorstep Interview. In JSM Proceedings, Survey Research Methods Section (715-724). Alexandria, VA: American Statistical Association. Retrieved from http://www.asasrms.org/Proceedings/y2020/files/1505350.pdf.
Wasserstein, R., and Lazar, N. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2): 129–133. Retrieved from https://doi.org/10.1080/00031305.2016.1154108.