How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory

Graham, John W.; Olchowski, Allison E.; Gilreath, Tamika D.

doi:10.1007/s11121-007-0070-9

Published: 05 June 2007

Volume 8, pages 206–213 (2007)
Prevention Science

Abstract

Multiple imputation (MI) and full information maximum likelihood (FIML) are the two most common approaches to missing data analysis. In theory, MI and FIML are equivalent when identical models are tested using the same variables, and when m, the number of imputations performed with MI, approaches infinity. However, it is important to know how many imputations are necessary before MI and FIML are sufficiently equivalent in ways that are important to prevention scientists. MI theory suggests that small values of m, even on the order of three to five imputations, yield excellent results. Previous guidelines for sufficient m are based on relative efficiency, which involves the fraction of missing information (γ) for the parameter being estimated, and m. In the present study, we used a Monte Carlo simulation to test MI models across several scenarios in which γ and m were varied. Standard errors and p-values for the regression coefficient of interest varied as a function of m, but not at the same rate as relative efficiency. Most importantly, statistical power for small effect sizes diminished as m became smaller, and the rate of this power falloff was much greater than predicted by changes in relative efficiency. Based on our findings, we recommend that researchers using MI should perform many more imputations than previously considered sufficient. These recommendations are based on γ, and take into consideration one’s tolerance for a preventable power falloff (compared to FIML) due to using too few imputations.
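The relative-efficiency guideline mentioned in the abstract can be illustrated with a short sketch. It assumes the standard expression from Rubin (1987), RE = (1 + γ/m)^(-1), which compares an estimate based on m imputations with one based on infinitely many. The function name and the particular values of γ and m below are chosen for illustration only and are not the conditions used in the paper's simulation.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Relative efficiency of m imputations versus infinitely many.

    Assumes Rubin's (1987) expression RE = 1 / (1 + gamma / m), where
    gamma is the fraction of missing information for the parameter.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 / (1.0 + gamma / m)


if __name__ == "__main__":
    # Illustrative grid only; not the scenarios tested in the article.
    m_values = (3, 5, 10, 20, 40, 100)
    print("gamma " + " ".join(f"m={m:<5d}" for m in m_values))
    for gamma in (0.1, 0.3, 0.5, 0.7, 0.9):
        cells = " ".join(f"{relative_efficiency(gamma, m):.4f} " for m in m_values)
        print(f"{gamma:<5.1f} {cells}")
```

Under this expression, relative efficiency is already above 0.90 at m = 5 even when γ = 0.5, which is why small m has traditionally looked adequate. The abstract's point is that statistical power for small effects falls off faster than this quantity suggests, so relative efficiency alone may be an optimistic guide to choosing m.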

Author information

Authors and Affiliations

Department of Biobehavioral Health, Penn State University, E-315 Health & Human Development Bldg., University Park, PA, 16802, USA
John W. Graham, Allison E. Olchowski & Tamika D. Gilreath

Corresponding author

Correspondence to John W. Graham.

About this article

Cite this article

Graham, J.W., Olchowski, A.E. & Gilreath, T.D. How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory. Prev Sci 8, 206–213 (2007). https://doi.org/10.1007/s11121-007-0070-9

Received: 22 December 2006
Accepted: 08 May 2007
Published: 05 June 2007
Issue Date: September 2007
DOI: https://doi.org/10.1007/s11121-007-0070-9
