Blimp 3 User Manual
Updated 7.22.2022
Cite as: Keller, B. T., & Enders, C. K. (2021). Blimp user’s guide (Version 3). Retrieved
from www.appliedmissingdata.com/multilevel-imputation.html
DISCLAIMER: This is free software with no expressed license given. There is no right to
distribute the software or documentation without written consent. This is for your sole
use only, given as is.
Blimp User’s Guide (Version 3) 3
Table of Contents
Table of Contents 3
1 Introduction 9
2.1 Overview 27
DATA Command 30
VARIABLE Command 30
ORDINAL Command 31
NOMINAL Command 32
COUNT Command 32
FIXED Command 33
CLUSTERID Command 33
MISSING Command 34
TRANSFORM Command 34
BYGROUP Command 35
LATENT Command 36
RANDOMEFFECT Command 37
MODEL Command 38
Regression Models 38
Discrete Outcomes 43
Discrete Predictors 43
Parameter Constraints 50
Auxiliary Variables 51
Latent Variables 51
CENTERING Command 61
SIMPLE Command 63
PARAMETERS Command 66
TEST Command 69
FCS Command 72
BURN Command 76
ITERATIONS Command 76
CHAINS Command 77
NIMPS Command 77
THIN Command 78
OPTIONS Command 79
OUTPUT Command 80
SAVE Command 81
5.8: Imputing Latent Response Scores for Item-Level Factor Analysis 146
5.9: Factor Analysis With Skewed Indicators and Yeo-Johnson Transform 150
6.4: Alternate Prior Distributions for Random Effect Covariance Matrix 174
6.7: Two-Level Model With Random Effects Predicting a Level-2 Outcome 186
8 References 226
1 Introduction
Blimp is an all-purpose data analysis and latent variable modeling program that
requires minimal scripting and no deep-level knowledge about Bayes. The application,
which is available for macOS, Windows, and Linux, was developed with funding from
fully conditional specification (Enders, Keller, & Levy, 2018), and its second release
transitioned the software to a full-featured multilevel analysis package (Enders, Du, &
Keller, 2020). Blimp 3 introduces wide-ranging and powerful capabilities for
multivariate analyses with latent variables (e.g., path models, measurement models,
structural equation models), including many models not available in other software
packages.
The development team’s philosophy for Blimp is to bring easy Bayes estimation to the
masses; the program offers some opportunities for “getting under the hood”, but
algorithmic tweaks and nuanced model specifications are not as customizable as they
are in specialized (but less user-friendly) programs such as Stan or JAGS. In keeping with
this philosophy, complex models can be specified with minimal coding by
simply listing variable names in a format that resembles a regression equation (e.g., y
software also adds any supporting models needed for missing data handling.
Blimp’s primary purpose is to provide researchers with a powerful tool for analyzing
experience with the flexibility to estimate complex latent variable models, many of
which are not available in other software packages. Models can include up to three
continuous variables, and count variables. Chapters 4 through 7 of this guide provide
numerous examples. Separate from its data analytic core, Blimp continues to offer the
of fully conditional specification parallels van Buuren’s popular MICE program (van
Buuren & Groothuis‐Oudshoorn, 2011), but it uses a latent response variable framework
to treat categorical variables and uses latent group means to preserve multilevel data
structures. Enders (2022) describes fully conditional specification with latent variables,
One of the major features in Blimp 3 is a redesigned graphical user interface called
Blimp Studio. The Studio application features a tabbed interface that makes it easy to
work with multiple scripts and projects at the same time. The graphic below shows the
Clicking the blue arrow button on the toolbar executes a script and spawns a paned
interface that adds an output window containing the analysis results and a plotting
window displaying trace (time series) plots for every model parameter.
Clicking on the normal distribution icon in the toolbar hides the plotting window, which
can also be disabled completely in the application’s Preferences, located under the
Blimp Studio > Preferences pull-down menu. Other visual settings such as fonts and
the orientation of the paned windows can also be set in the Preferences pane, shown
below.
The Blimp output includes regression model parameters for each variable, including
standardized estimates and variance explained effect sizes (Rights & Sterba, 2019).
Certain types of analyses spawn additional output (e.g., odds ratios in a logistic
the posterior median and standard deviation as a point estimate and measure of
uncertainty, respectively, and it provides 95% credible interval limits. Other summary
statistics such as posterior mean and median absolute deviation are also available (see
OUTPUT command). The graphic below shows a typical tabular display of the analysis
results with a vertical split between the scripting and output window.
Blimp automatically saves the output file to the same directory as the analysis script.
Blimp output is saved in a text file with a .blimp-out extension. The outputs are
linked to their analysis scripts, such that double-clicking on one of the files opens both
The major feature that distinguishes Blimp’s estimation architecture from other latent
variable modeling software packages is that it does not work with the joint distribution
of the analysis variables. Rather, the multivariate distribution is factored into the
Y, X, and M. The trivariate distribution factors into the product of three univariate
Blimp estimates the models on the right of the equals sign without assuming anything
about the form or shape of the multivariate distribution on the left. The advantage of
this specification is that the individual regression equations can feature mixtures of
interactive or nonlinear terms, and other complex constructions. In such cases, the
multivariate distribution on the left doesn’t have a known or simple form, and model
misspecifications (e.g., treating such data as multivariate normal) can introduce bias.
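The factorization discussed above can be sketched as follows for three variables Y, X, and M. This is an illustrative reconstruction rather than a reproduction of the guide's own equation; in particular, the ordering of the X and M terms is an assumption, chosen so that the first term matches a focal regression of Y on X and M.

```latex
f(Y, X, M) = f(Y \mid X, M) \times f(X \mid M) \times f(M)
```

Each term on the right corresponds to one univariate model, so only those models (not the joint distribution on the left) need a specified form.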
The theory for Blimp’s model specification is given by Ibrahim and colleagues (Ibrahim,
Chen, & Lipsitz, 2002; Ibrahim, Lipsitz, & Chen, 1999; Lipsitz & Ibrahim, 1996), and the
software extends these ideas to latent variable models with up to three levels. More
recent literature refers to this model specification as fully Bayesian estimation, the
sequential specification, and factored regression (Enders et al., 2020; Erler, Rizopoulos,
Jaddoe, Franco, & Lesaffre, 2019; Erler et al., 2016; Lüdtke, Robitzsch, & West, 2020a,
where the focal analysis model is a linear regression of Y on X and M. The factorization
above translates into the following linear regression models, where all residuals are
The X and M equations are essentially nuisance models in this example, and their role
Any univariate equation can feature mixtures of categorical and normal variables,
among other things. For example, the following equations include an interaction
between X and M in the focal model and a quadratic association between X and M in
would produce biased estimates because the incomplete predictor distributions are
Seaman, White, & Carpenter, 2015; Liu, Gelman, Hill, Su, & Kropko, 2014). Specifying a
distribution altogether. These ideas readily extend to latent variables, which Blimp
Throughout the guide, we use the term “predictors” to refer to exogenous variables—in
a path diagram, variables that do not have incoming arrows. When predictors are
instead, the covariate data essentially function as known constants, as in ordinary least
imputation. Blimp allows these distributions to be many different things (e.g., normal,
that covariate a dependent variable in its own regression model. These supporting
models can be explicitly specified, or Blimp can create them automatically. These two
strategies are somewhat different and have different strengths and weaknesses. We
use the following multiple regression to illustrate the two model specification
MODEL:
y ~ x1 x2 x3;
associations among the predictors. The underlying regression models follow a round
The regressions above are linear and assume normally distributed residuals, but this
specification also allows for binary, ordinal, and multicategorical nominal predictors, in
which case Blimp adopts a latent response variable formulation (Albert & Chib, 1993;
Carpenter & Kenward, 2013; Enders et al., 2018; Johnson & Albert, 1999). Variable
metrics are specified using the ORDINAL and NOMINAL commands described in
Chapter 2. Enders et al. (2020) describe the multilevel version of this specification.
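As a rough sketch of the round robin pattern mentioned above, the three predictors from the example regression could each be regressed on the remaining two. The coefficient and residual symbols below are our own notation (an assumption, not the guide's), and for categorical predictors the left side would be a latent response variable rather than the observed score.

```latex
\begin{aligned}
x_{1} &= \gamma_{01} + \gamma_{11} x_{2} + \gamma_{21} x_{3} + r_{1} \\
x_{2} &= \gamma_{02} + \gamma_{12} x_{1} + \gamma_{22} x_{3} + r_{2} \\
x_{3} &= \gamma_{03} + \gamma_{13} x_{1} + \gamma_{23} x_{2} + r_{3}
\end{aligned}
```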
More formally, adopting unspecified associations for the predictors invokes a model
that factors the joint distribution into the product of a univariate distribution for the
rightmost term—a multivariate normal distribution for continuous predictors and latent
of fixed constants, variances and covariances, and correlations. The round robin
(Arnold, Castillo, & Sarabia, 2001; Liu, Gelman, Hill, Su, & Kropko, 2014). Importantly,
among the predictors, as such relations are incompatible with normal data.
Next, consider the situation where the user explicitly specifies the regression equations
for the predictors. This specification leverages the probability chain rule to factorize the
joint distribution of the analysis variables into the product of several univariate
The corresponding regression equations follow the same cascading pattern where the
first predictor’s model is empty, the second predictor is regressed on the first, the third
MODEL:
# predictor models
x3 ~ 1;
x2 ~ x3;
x1 ~ x2 x3;
# focal model
y ~ x1 x2 x3;
and the code block below illustrates a syntax shortcut for this specification that lists all
MODEL:
# predictor models
x1 x2 x3 ~ 1;
# focal model
y ~ x1 x2 x3;
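The sequential predictor models in the code blocks above correspond to a chain rule factorization along the following lines. This is a sketch written to match the x3, x2, x1 ordering used in the scripts; the notation is ours.

```latex
f(y, x_{1}, x_{2}, x_{3}) = f(y \mid x_{1}, x_{2}, x_{3}) \times f(x_{1} \mid x_{2}, x_{3}) \times f(x_{2} \mid x_{3}) \times f(x_{3})
```

Reading from right to left, the terms match the cascading regressions: x3 has an empty model, x2 is regressed on x3, x1 is regressed on x2 and x3, and the focal model comes last.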
The sequential specification for predictors differs in important ways. First, the
continuous, binary (probit or logit link), ordinal (probit link), multicategorical nominal
(logistic link), or count (negative binomial link). Second, associations among the
predictors need not be linear. For example, the following equations include the
MODEL:
# predictor models
x3 ~ 1;
x2 ~ x3 (x3^2);
x1 ~ x2 x3;
# focal model
y ~ x1 x2 x3;
When using a sequential specification, ordering the predictors in a particular way can
facilitate estimation and reduce the impact of model misspecifications. Lüdtke et al.
(2020b, pp. 171-172) recommend ordering variables from left to right by their
ORDINAL: x1;
MODEL:
# predictor models
x2 ~ 1;
x1 ~ x2;
x3 ~ x1 x2;
# focal model
y ~ x1 x2 x3;
or simply as follows.
ORDINAL: x1;
MODEL:
# predictor models
x3 x1 x2 ~ 1;
# focal model
y ~ x1 x2 x3;
distribution for these variables; instead, the covariate data essentially function as
known constants, as in ordinary least squares. With either specification for the
predictors, listing complete predictors on the FIXED command line indicates that those
variables do not require a model (or distribution). Continuing with the previous
FIXED: x2;
MODEL:
# predictor models
x3 x1 ~ x2;
# focal model
y ~ x1 x2 x3;
follows.
FIXED: x2;
MODEL:
y ~ x1 x2 x3;
unspecified. This setup is easy to specify, and it is also convenient for centering
because the means are explicit model parameters that MCMC iteratively estimates.
This approach does not limit the composition of the focal analysis model, which can
response variable framework. Blimp can apply a Yeo-Johnson (Yeo & Johnson, 2000)
The next section provides a complete description of the Blimp command language, and
Chapters 4 through 7 provide numerous analysis examples. The examples span a wide
variety of single-level and multilevel analyses with manifest and latent variables,
through to missing data handling, where the distributions of missing values rely on a
collection of univariate models. The advantage of this specification is that Blimp can
interactive or polynomial effects, multilevel models with random effects, and models
refresh, the trivariate distribution factors into the product of three univariate
Blimp estimates the models on the right of the equals sign without assuming anything
about the form or shape of the multivariate distribution on the left. In a simple scenario
where all three variables are continuous, the factorization corresponds to the
every model in which a variable appears. For example, the distribution of missing Y
values depends only on the analysis model, and MCMC samples imputations from a
normal curve with center and spread equal to a predicted value and residual variance,
respectively.
normal distributions.
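The two cases described above can be sketched in symbols as follows. The coefficient and variance symbols are our own notation, and the f(X | M) term assumes the same ordering of the X and M models used for illustration elsewhere in this chapter.

```latex
\begin{aligned}
f(Y_{mis} \mid X, M) &= N\!\left(\beta_{0} + \beta_{1} X + \beta_{2} M,\ \sigma^{2}_{\varepsilon}\right) \\
f(X_{mis} \mid Y, M) &\propto f(Y \mid X, M) \times f(X \mid M)
\end{aligned}
```

The first line is the normal curve for missing Y values, centered at the predicted value with spread equal to the residual variance. The second line shows why X depends on two models: X appears as a predictor in the focal model and as the outcome in its own model.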
These conditional distributions have analytic solutions in many cases (Levy & Enders,
2021), but Blimp’s MCMC algorithm uses Metropolis sampling to draw imputations
With a collection of additive models like those above, the distributions of missing
fully conditional specification. The same is not true for models with nonlinearities,
the analysis model includes an interaction between X and M. The factorization and the
variable appears. For example, the distribution of missing X values is again the product
2020, Eq. 8). The same issue applies more broadly to models with polynomial or
nonlinear terms and multilevel models with random effects, among others.
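A minimal script sketch of the interaction scenario discussed above might look like the following. It uses only syntax that appears elsewhere in this guide (sequential predictor models and the asterisk for product terms); the lowercase variable names y, x, and m are placeholders, and the ordering of the predictor models is an assumption.

```text
MODEL:
# predictor models (sequential specification)
m ~ 1;
x ~ m;
# focal model with an x-by-m interaction
y ~ x m x*m;
```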
Seaman, White, & Carpenter, 2015; Liu, Gelman, Hill, Su, & Kropko, 2014). More
generally, the univariate models described above could feature discrete variables
even latent variables, which Blimp views as missing data to be estimated (imputed).
The following is a list of features and functionality that were introduced in Version 2.
❖ TEST command for Bayesian Wald tests (Asparouhov & Muthén, 2021)
❖ Simplified scripting language and redesigned output
❖ Graphical interface with automatic updates when new features become available
❖ Graphical engine that creates trace plots for all model parameters
Blimp scripts can also be executed from the terminal without the graphical interface.
This is useful when conducting computer simulations, for example. The most basic
specification includes a file path to the Blimp executable file followed by a file path to
the script to be executed. To illustrate, the following line of code executes a script
/Applications/Blimp/blimp ~/desktop/myscript.imp
Similarly, the following line uses the Blimp beta engine to execute the same file.
/Applications/Blimp/blimp-beta ~/desktop/myscript.imp
Several parts of the Blimp script can be specified via command line arguments. The
input parameter. For example, the following code block uses a command line argument
BLIMPPATH=/Applications/Blimp/blimp
${BLIMPPATH} ~/desktop/myscript.imp --data ~/desktop/mydata.dat
Note that any parameters specified as command line arguments replace the current
contents of the script (e.g., the file specified on the DATA command is replaced by the
file ~/desktop/mydata.dat).
In addition to the input data, command line arguments include the random number
seed and most quantities exported using the SAVE command. The code block below
shows the full array of command line arguments. The backslash is the Linux command
continuation character; the arguments would otherwise need to appear on a single line
separated by a space.
/Applications/Blimp/blimp ~/desktop/myscript.imp \
--seed {seed value} \
--data {filepath to input data} \
--output {filepath to blimp-out output file} \
--stacked {filepath to stacked imputation data} \
--stacked0 {filepath to stacked original + imputed data} \
--separate {filepath to separate imputation data sets} \
2.1 Overview
This chapter gives a detailed account of Blimp’s scripting language. Blimp
commands can be entered in the Blimp Studio syntax editor or in a plain text file with
.imp as the file extension. The code block below shows a typical script with many of
DATA: data.dat;
VARIABLES: id a1:a4 y m x1:x3 z1 z2;
ORDINAL: x1;
NOMINAL: x3;
MISSING: 999;
FIXED: x3;
CENTER: grandmean = x1 x2;
MODEL:
# x1-x3 and x2-x3 interaction predicting m;
m ~ x1 x2 x3 x2*x3;
# m, x1-x3 predicting y;
y ~ m x1:x3;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
The Blimp command language uses the following general conventions, most of which
❖ A colon can be used to specify a range of variables with the same prefix and suffix
❖ A # symbol indicates a comment that Blimp ignores until the end of the line
❖ Three symbols are needed to specify models: (a) ~ or <- denotes a regression
equation, (b) <-> or ~~ denote variances and covariances, and (c) -> or =~ assigns
indicators to a latent variable
❖ Mathematical operator symbols are * for multiplication, / for division, + for
addition, - for subtraction, ^ or ** to raise a variable or quantity to a power, and
parentheses for specifying order of operations
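The model symbols in the list above can be seen together in a short sketch. The variable and factor names are placeholders; the comments note the alternate symbol that the list gives for each statement.

```text
LATENT: yfactor;
MODEL:
# regression of y on x1 and x2 (<- is listed as an alternative to ~)
y ~ x1 x2;
# residual variance of y (~~ is listed as an alternative to <->)
y <-> y;
# three indicators of a latent factor (=~ is listed as an alternative to ->)
yfactor -> y1:y3;
```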
Blimp also provides a number of built-in functions that work in conjunction with certain
commands. The TRANSFORM command can use these functions to create new
variables, the PARAMETERS command can use these routines to compute auxiliary
parameters that are functions of the estimated model parameters, and functions can be
embedded within regression equations listed in the MODEL statement. The list of
functions is below.
The following built-in functions are only available in the TRANSFORM command:
❖ ismissing(x) = returns missing data indicator for x
❖ lag1(x, time) = returns the lag 1 of x based on the time variable
❖ lag1(x, time, idvar) = returns the lag 1 of x within the grouping variable
idvar, based on the time variable
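As a sketch of how these functions might be used, the block below applies each signature from the list above inside a TRANSFORM command. The new variable names (xmiss, xlag, xlagid) and the time and id variables are hypothetical placeholders.

```text
TRANSFORM:
# missing data indicator for x
xmiss = ismissing(x);
# lag-1 version of x, ordered by the time variable
xlag = lag1(x, time);
# lag-1 version of x computed within each level of id
xlagid = lag1(x, time, id);
```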
DATA Command
The DATA command specifies the input data set, which must be saved in .csv
(comma-separated values) format or as a whitespace-delimited (including tab-delimited) file (e.g.,
.dat or .txt). Blimp accepts only numeric characters for data values (e.g., a nominal
characters (e.g., NA) can be used for missing value codes. Variable names can appear
in the column headers, but the VARIABLE command (described next) must be omitted.
No file path is needed if the Blimp script (the .imp file) is located in the same directory
DATA: mydata.dat;
The DATA command requires a full file path when the input data set is located in a
directory other than the one that contains the Blimp script. The file path should not be
enclosed in quotations. The following code block reads a data file located in a directory
named “research project” located on the desktop. In line with macOS and other
Unix-based systems conventions, a tilde can be used to reference the user’s home
directory. The following input line reads a data file from a directory within the desktop
folder.
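Based on the description above, a DATA line of this kind might look like the following sketch. The file name mydata.dat is a placeholder, and the lowercase desktop folder name simply follows the terminal examples earlier in this guide. Note that the path is not enclosed in quotation marks even though the directory name contains a space.

```text
DATA: ~/desktop/research project/mydata.dat;
```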
VARIABLE Command
The VARIABLES command specifies the variable names for the data set listed on the
DATA command, in the order they appear in the input file. This command should not be used if the data file has
variable names as column headers. The variable list may include variables that are not
used in an analysis model or imputation model. The code block below illustrates a
VARIABLES: y x1 x2 x3 x4;
A colon can be used to specify a range of variables with the same prefix but different
VARIABLES: y x1:x4;
The colon specification also works if a group of variables has a common alphanumeric
string following the numeric values (e.g., a set of variables and their recoded
counterparts).
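For instance, if four variables had recoded counterparts whose names end in a common suffix, the specification might look like the sketch below. The names x1r through x4r are hypothetical and are used only to illustrate a shared string after the numeric values.

```text
VARIABLES: y x1:x4 x1r:x4r;
```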
ORDINAL Command
The ORDINAL command identifies ordinal variables that appear in a MODEL statement.
line, but these variables could also be treated as nominal. A colon can be used to
ORDINAL: x1:x5;
By default, Blimp uses a latent response variable (i.e., probit regression) framework for
ordinal variables (Albert & Chib, 1993; Carpenter & Kenward, 2013; Enders et al.,
2018; Johnson & Albert, 1999), and a logistic link is an option for binary variables
(Asparouhov & Muthén, 2021; Polson, Scott, & Windle, 2013).
NOMINAL Command
The NOMINAL command specifies nominal variables that appear in a MODEL statement.
Nominal variables must be represented as a single variable with numeric codes. Blimp
automatically recodes the discrete responses into a set of dummy codes (or latent
response difference scores, in some cases) during estimation. By default, Blimp assigns
the first (lowest) code as the reference category. To change the reference category, list
the numeric code of the desired reference group in parentheses following the variable’s
name. To illustrate, consider two nominal variables X and Z, each with codes 1, 2, and
NOMINAL: x(3) z;
For predictors with unspecified associations, Blimp uses a latent difference score (i.e.,
multinomial probit regression) framework for nominal variables (Albert & Chib, 1993;
Carpenter & Kenward, 2013; Enders et al., 2018; Johnson & Albert, 1999), and it uses
a logistic link for multicategorical nominal variables on the left side of a tilde
(Asparouhov & Muthén, 2021; Polson, Scott, & Windle, 2013).
COUNT Command
The COUNT command identifies count variables that appear in a MODEL statement.
Count dependent variables are modeled with a negative binomial regression (Asparouhov &
Muthén, 2021; Polson, Scott, & Windle, 2013). Predictor variables with count
COUNT: y x3;
FIXED Command
The FIXED command identifies complete predictor variables that do not require a
distribution. Incomplete variables and outcome variables (variables that appear to the
left of a tilde) must be random variables with a distribution. With relatively few
speeds computations and convergence. Fixed variables listed on the CENTERING line
will be centered at the means of the observed data (i.e., the means will not be treated
VARIABLES: y x1 x2 x3 x4;
FIXED: x1 x2;
MODEL: y ~ x1 x2 x3;
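To illustrate the centering behavior described above, the sketch below adds a centering line to the same script. The CENTER: grandmean = form is borrowed from the example script in Section 2.1; treat the combination shown here as an assumption rather than a verified recipe. Because x1 and x2 are fixed, they would be centered at the means of the observed data.

```text
VARIABLES: y x1 x2 x3 x4;
FIXED: x1 x2;
CENTER: grandmean = x1 x2;
MODEL: y ~ x1 x2 x3;
```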
CLUSTERID Command
identifier for the level-2 sampling unit (cluster), and three-level analyses require
level-2 and level-3 identifier variables. The order of the identifier variables does not
The code block below illustrates a pair of cluster-level identifiers for a three-level
analysis.
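A specification of this kind might look like the following sketch. The identifier names level2id and level3id are hypothetical, and listing the two identifiers separated by a space is an assumption based on how other commands in this guide list multiple variables.

```text
CLUSTERID: level2id level3id;
```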
MISSING Command
The MISSING command is used to specify the missing value code. Missing values can
be coded with a single numeric (e.g., 999) or alphanumeric value (e.g., NA). The
following code block specifies a numeric value of 999 as the missing data code.
MISSING: 999;
TRANSFORM Command
The TRANSFORM command creates new variables that are functions of existing
variables. If imputations are requested, the new variable is saved to the output data
TRANSFORM:
newvar1 = expression or function;
newvar2 = expression or function;
Mathematical operator symbols are * and / for multiplication and division, + and - for
addition and subtraction, and ^ to raise a variable or quantity to a power. The following
TRANSFORM:
newvar1 = x * 2;
newvar2 = x / 2;
newvar3 = x + 2;
newvar4 = x - 2;
newvar5 = x^2;
Global functions are listed in Section 2.1, and the following functions are specific to the
BYGROUP Command
(when used in conjunction with the FCS command) or model estimation (when used in
conjunction with the MODEL command) for observed subgroups in the data. For
categories. The following code block specifies fully conditional specification imputation
BYGROUP: g;
FCS: y x1 x2;
Similarly, the following code block estimates a separate multiple regression model for
each subgroup of G.
BYGROUP: g;
MODEL: y ~ x1 x2;
Only a single categorical variable is allowed on the BYGROUP command, although this
variable, sample size permitting. Additionally, the BYGROUP variable should not appear
on the ORDINAL, NOMINAL, or MODEL lines. Trace plots are currently unavailable with
BYGROUP processing. Finally, you can use this command to fit multiple-group models,
LATENT Command
The LATENT command is used to define latent variables (e.g., factors in a measurement
model) that will be referenced in the MODEL section. For example, the code block
below illustrates the specification for a single latent factor with three manifest
indicators.
LATENT: yfactor;
MODEL:
yfactor -> y1:y3;
The default scaling for latent factors is described in the MODEL command section.
Blimp treats all latent variables as missing data to be imputed, and adding the
savelatent keyword to the OPTIONS line saves the estimated latent variable scores
Latent variables can be specified at any level of a multilevel model. This specification
references cluster-level identifier variables from the CLUSTERID line. For example, the
code below illustrates the specification of a level-1 latent factor with three manifest
indicators measured at level-1 and a level-2 latent factor with three indicators
CLUSTERID: level2id;
LATENT: yfactor, level2id = xfactor;
MODEL:
yfactor -> y1:y3;
xfactor -> x1:x3;
LATENT:
yfactor;
level2id = xfactor;
RANDOMEFFECT Command
The RANDOMEFFECT command is used to define new latent variables that equal the
random intercepts and slopes from a multilevel regression model. These latent
variables are referenced in the MODEL section, where they can be used to predict other
command is similar to the LATENT command except that the random intercept and
slope residuals can only function as predictors and not outcomes like a latent factor.
The specification for a random effect latent variable has four components: (a) the new
latent variable’s name appears on the left side of the equation, (b) the target equation’s
outcome variable is listed after the equals sign, (c) the random slope predictor’s name
from the target model appears after the vertical pipe, and (d) the cluster-level identifier
variable from the CLUSTERID command appears at the end of the line in square
RANDOMEFFECT:
newlatent = outcome variable | random predictor [CLUSTERID var];
To illustrate more concretely, the code block below defines a pair of new latent
variables equal to the random intercept and random slope residuals from a two-level
CLUSTERID: cluster;
RANDOMEFFECT:
ranicepts = y | 1 [cluster];
ranslopes = y | x [cluster];
MODEL:
y ~ x | x;
z ~ ranicepts ranslopes;
MODEL Command
The MODEL command typically consists of one or more univariate regression models.
Blimp’s modeling framework can accommodate a wide range of analyses ranging from
with interactions involving latent variables. This section describes the command, and
Regression Models
Univariate regression models are the building blocks for specifying more complex
variable to the left of the tilde symbol and predictors (or perhaps just an intercept) to
the right of the tilde. The code block below illustrates a multiple regression analysis
MODEL:
y ~ 1 x1 x2 x3;
Outcome variables that appear to the left of a tilde can be latent factors or manifest
ordinal, multicategorical nominal, count). With the exception of latent outcomes where
means are set equal to 0, Blimp estimates the intercept by default, and the above
MODEL:
# unspecified predictor models
y ~ x1 x2 x3;
can be explicitly specified (i.e., they can appear as outcomes to the left of a tilde), or
Blimp can create them automatically. The previous code block does not list models for
the regressors, so Blimp constructs them as needed for missing data handling. The
grand means and latent group means (multilevel models) are iteratively estimated
parameters. To reiterate, regressions among the predictors are simply a device for
These models usually are not the substantive focus, and they need not have a logical
causal construction.
features a cascading pattern of univariate regressions, where the first predictor’s model
is empty, the second predictor is regressed on the first, the third on the first and
MODEL:
# sequentially specified predictor models
x3 ~ 1;
x2 ~ x3;
x1 ~ x2 x3;
# focal analysis model
y ~ x1 x2 x3;
Sequential models can be specified more succinctly by listing all predictors on the left
MODEL:
# sequentially specified predictor models
x1 x2 x3 ~ 1;
# focal analysis model
y ~ x1 x2 x3;
When using the FIXED command to identify complete predictor variables that do not
require a distribution, those predictors should only appear on the right side of a tilde in
a sequential specification.
FIXED: x2;
MODEL:
# sequentially specified predictor models
x1 x3 ~ x2;
# focal analysis model
y ~ x1 x2 x3;
MODEL:
x3 ~ 1;
x2 ~ x3 (x3^2);
x1 ~ x2 x3;
y ~ x1 x2 x3;
distribution (Yeo & Johnson, 2000) that allows X2’s distribution to be positively or
negatively skewed (see the later section on functions embedded within equations).
MODEL:
x3 ~ 1;
yjt(x2) ~ x3;
x1 ~ x2 x3;
y ~ x1 x2 x3;
Lüdtke et al. (2020b) provide recommendations for ordering variables when adopting a
Blimp prints a table of estimates for each outcome variable in a model (i.e., every
variable to the left of a tilde). By default, the tables are printed in alphabetical order.
Users can specify a custom order for tables by defining equation blocks within the
MODEL statement. Equation blocks are defined by specifying an arbitrary name for the
block (which will appear on the output) followed by a colon. For example, the code
below defines two equation blocks, such that the focal regression output would be the
MODEL:
focal.regression:
y ~ x1 x2 x3;
predictor.models:
x3 ~ 1;
yjt(x2) ~ x3;
x1 ~ x2 x3;
With the exception of latent dependent variables, Blimp automatically estimates each
parameter). The code block below explicitly references the intercept by including a 1
on the right of the tilde (the keyword intercept can be used in lieu of the 1), and it
MODEL:
y ~ 1 x1 x2 x3;
y <-> y;
MODEL:
y ~ 1 x1 x2 x3;
y ~~ y;
Discrete Outcomes
Discrete outcomes are defined on the ORDINAL, NOMINAL, and COUNT lines. In general,
little or no further specification is needed. For example, the following code block
ORDINAL: y;
MODEL:
y ~ x1 x2 x3;
A logistic regression additionally applies the logit function to the dependent variable,
as shown below.
ORDINAL: y;
MODEL:
logit(y) ~ x1 x2 x3;
Discrete Predictors
Discrete predictors are defined on the ORDINAL, NOMINAL, and COUNT lines (the latter
specification is needed to invoke a discrete predictor. For example, the following code
ORDINAL: x2;
MODEL:
y ~ x1 x2 x3;
The discrete scores appear in the focal analysis model, but Blimp uses a latent
response variable formulation for the predictor’s supporting regression model (which is
The specification for nominal variables is similar. To illustrate, the code block below
NOMINAL: x2;
MODEL:
y ~ x1 x2 x3;
Blimp uses a latent difference variable formulation (multinomial probit model) for the
predictor’s supporting regression (which is left unspecified above), but a set of dummy
codes appear in the focal analysis model. By default, Blimp assigns the first (lowest)
numeric code as the reference category. To override this default behavior, list the
NOMINAL: x2(3);
MODEL:
y ~ x1 x2 x3;
In some situations, it may be necessary to refer to a specific dummy code (e.g., when
label following the variable’s name. For example, the following code block assigns X2
= 3 as the reference group, and it explicitly references the dummy codes for the X2 = 1
NOMINAL: x2(3);
MODEL:
y ~ x1 x2.1 x2.2 x3;
analysis variables (e.g., all structural equation models based on multivariate normal
includes models with incomplete interaction effects, curvilinear effects, and random
Blimp’s estimation architecture avoids this problem by working with a set of univariate
imputing the product directly, Blimp uses a Metropolis sampling step to select
imputations that are consistent with any nonlinear effects in the univariate regression
models. The methodological literature uniformly favors this strategy over so-called
to incomplete nonlinear effects (Bartlett et al., 2015; Enders et al., 2020; Erler et al.,
2016; Kim, Belin, & Sugar, 2018; Kim, Sugar, & Belin, 2015; Lüdtke, Robitzsch, & West,
2019; Zhang & Wang, 2017). The specifications described below are the same for
Interaction terms are specified by connecting two predictors in the same equation with
MODEL:
y ~ x1 x2 x1*x2;
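Written as an equation, the specification above corresponds to a moderated regression of roughly the following form. This is a sketch in generic notation: the symbols beta_0 through beta_3 and the residual epsilon are assumed labels (they are not names that appear in Blimp output), and the intercept is assumed to be included by default.

```latex
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 \left( x_{1i} \times x_{2i} \right) + \varepsilon_i
```

Under this form, beta_3 is the change in the slope of X1 for each one-unit increase in X2.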
Similarly, the code below shows a three-way interaction with all possible two-way
MODEL:
y ~ x1 x2 x3 x1*x2 x1*x3 x2*x3 x1*x2*x3;
Generally speaking, any variable to the left of a tilde (dependent variables, predictors
model.
Blimp allows for interactions with categorical predictors defined on the NOMINAL and
ORDINAL lines. Binary and ordinal predictors function as numeric variables when
multiplied by another variable; the supporting regressor model again uses a latent
variables require product terms for each dummy code in a set. By default, Blimp
automatically creates a model that includes the necessary product terms. To illustrate,
NOMINAL: x2;
MODEL:
y ~ x1 x2 x3 x2*x3;
In this case, Blimp automatically generates a model with two product terms, one for
each of the two dummy codes (recall that X2 = 1 is the reference group). In some
NOMINAL: x2;
MODEL:
y ~ x1 x2 x3 x2.2*x3 x2.3*x3;
itself. As such, these terms can be specified by connecting a regressor with itself using an
asterisk. The following code block illustrates a quadratic function with lower-order
MODEL:
y ~ x1 x1*x1 x2;
MODEL:
y ~ x1 (x1^2) x2;
In Blimp, univariate regression models are always the building blocks for specifying
regressions (see Section 1.3). Because Blimp does not work with the joint distribution
of the variables (e.g., impose a multivariate normal distribution on the data), these
univariate equations are uncorrelated by construction. For example, the code block
below illustrates a bivariate analysis with two empty regression equations, but the
correlation (or covariance) between the two dependent variables is not a byproduct of
estimation.
MODEL:
y1 ~ 1;
y2 ~ 1;
Blimp uses phantom latent factors to correlate dependent variables from different
described in Merkle and Rosseel (2018), and Blimp extends their approach to two- and
Like variances, correlations and residual correlations are specified with double-headed
arrows or double tildes. The following code block illustrates a simple bivariate analysis
MODEL:
y1 ~ 1;
y2 ~ 1;
y1 <-> y2;
MODEL:
y1 ~ 1;
y2 ~ 1;
y1 ~~ y2;
MODEL:
y1 <-> y2;
regression.
MODEL:
y1 ~ x1 x2 x3;
y2 ~ x1 x2;
y1 <-> y2;
Finally, multiple correlations can be specified by listing a set of variables on each side
of a double-headed arrow or double tilde. The following code block requests all
MODEL:
y1:y3 x1 x2 <-> y1:y3 x1 x2;
Parameter Constraints
Blimp allows for many types of parameter constraints. These restrictions are imposed
by listing the @ symbol and a numeric value or label following a variable’s name. For
example, the following code block uses a label “beta” to specify an equality constraint
MODEL:
y ~ x1@beta x2@beta x3;
As a second example, the code below uses a numeric label to fix the regression
MODEL:
y ~ 1@0 x1 x2;
Similarly, the code below fixes the variance of a variable to 1 during estimation.
MODEL:
y ~ x1 x2;
y <-> y@1;
Many, but not all, model parameters can be constrained. For example, between-group
Auxiliary Variables
Associations among the auxiliary variables and analysis variables follow the same
cascading pattern of univariate models used to connect regressors; the first auxiliary is
regressed on the analysis variables, the second auxiliary variable is regressed on the
first plus the analysis variables, the third is regressed on the first two, and so on. The
code block below illustrates a multiple regression analysis with three auxiliary
variables, A1 to A3.
MODEL:
y ~ x1 x2;
a1 ~ y x1 x2;
a2 ~ a1 y x1 x2;
a3 ~ a1 a2 y x1 x2;
The auxiliary models can be specified more succinctly by listing all auxiliary variables
MODEL:
y ~ x1 x2;
a3 a2 a1 ~ y x1 x2;
Latent Variables
The LATENT command described earlier defines latent variables (e.g., factors in a
measurement model) referenced in the MODEL section. To illustrate, the following code
block shows a basic measurement model with a single latent factor and three normally
LATENT: yfactor;
MODEL:
yfactor -> y1:y3;
By default, Blimp establishes identification by fixing the first factor loading to one and
the latent mean (or intercept) to zero. The following code block uses univariate
LATENT: yfactor;
MODEL:
yfactor ~ 1@0;
y1 ~ yfactor@1;
y2 ~ yfactor;
y3 ~ yfactor;
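In equation form, the default identification constraints amount to something like the following sketch. The symbols (nu for measurement intercepts, lambda for loadings, eta for the latent factor, epsilon for residuals) are assumed notation, not Blimp keywords.

```latex
\begin{aligned}
y_{1i} &= \nu_1 + (1)\,\eta_i + \varepsilon_{1i} \\
y_{2i} &= \nu_2 + \lambda_2 \eta_i + \varepsilon_{2i} \\
y_{3i} &= \nu_3 + \lambda_3 \eta_i + \varepsilon_{3i} \\
E(\eta_i) &= 0
\end{aligned}
```

Fixing the first loading to one sets the factor's scale to that of Y1, and fixing the latent mean to zero allows all three measurement intercepts to be estimated.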
It may be beneficial to override the default identification settings in some cases. For
indicator with complete data (or the indicator with the least amount of missing data) or
fixing one of the regression intercepts, rather than the latent mean, to 0. To illustrate, the
code block below shows a specification with the following features: (a) Y1’s loading
is freely estimated, (b) the latent mean is estimated, (c) Y3’s measurement intercept is
LATENT: yfactor;
MODEL:
# estimate the latent mean
yfactor ~ 1;
# estimate loadings
y1 ~ yfactor;
y2 ~ yfactor;
be imputed (adding the savelatent keyword to the OPTIONS line saves the
estimated latent scores to the imputed data sets). Imputing the latent scores opens up
interesting opportunities not available in other software packages. For example, Blimp
allows a latent variable to interact with a manifest variable or with another latent
variable interaction.
LATENT: xfactor;
MODEL:
xfactor -> x1:x3;
y ~ xfactor z xfactor*z;
The manifest variable Z is normal in this example, but it could have any metric that
Blimp supports. Similarly, two latent variables can interact with one another. The
latent variable. The code below shows a latent variable with a quadratic effect on the
outcome.
LATENT: xfactor;
MODEL:
xfactor -> x1:x3;
y ~ xfactor (xfactor^2) z;
those for single-level regression models. Blimp automatically determines the level at
which a variable is measured in a multilevel data set, so the user need only provide a
basic model specification. The one exception is latent variables, the levels of which
must be specified in the LATENT command. Enders et al. (2020) provide specific details
about Blimp’s multilevel modeling framework, which uses the same factored
centered at their grand means or group means (Enders & Tofighi, 2007) using the
where X1 and X2 are level-1 and level-2 predictors, respectively. The following code
CLUSTERID: level2id;
MODEL:
y ~ x1.i x2.j;
The estimated model includes a random intercept in the analysis model as well as in
X1’s supporting model. In some cases, it may be necessary to manually reference the
random intercept (e.g., when labeling or constraining the parameter). In the code block
below, the 1 to the right of the vertical pipe represents a random intercept.
CLUSTERID: level2id;
MODEL:
y ~ x1.i x2.j | 1;
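As a sketch of the model this code implies, the random intercept regression can be written as follows. The notation is assumed for illustration: i indexes level-1 units, j indexes clusters, u_0j is the cluster-specific intercept deviation, and epsilon_ij is the within-cluster residual.

```latex
y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2j} + u_{0j} + \varepsilon_{ij}
```

Here u_0j is the random intercept that the 1 after the vertical pipe refers to.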
Random slope coefficients are specified by listing lower-level predictors to the right of
a vertical pipe. For example, the code block below illustrates a regression model with
random intercepts (implicit) and a random slope for the level-1 predictor X1.
CLUSTERID: level2id;
MODEL:
y ~ x1.i x2.j | x1.i;
and random slopes. Adding the savelatent keyword to the OPTIONS line saves the
Random intercepts and slopes can also appear as regressors in other equations. To
illustrate, the code block below uses the RANDOMEFFECT command to define the
intercepts and slopes as cluster-level latent variables that predict another variable Z.
CLUSTERID: level2id;
RANDOMEFFECT:
ranicepts = y | 1 [level2id];
ranslopes = y | x1.i [level2id];
MODEL:
y ~ x1.i x2.j | x1.i;
z ~ ranicepts ranslopes;
Blimp can also estimate three-level models. To illustrate, consider a three-level model
where X1 and X2 are level-1 and level-2 regressors, respectively, and X3 is a level-3
measured. The following code block illustrates a three-level regression model with
As a second example, the code block below illustrates a three-level random slope
model where the influence of the level-1 regressor X1 varies across level-2 and level-3
units and the influence of the level-2 predictor X2 varies across level-3 units.
fix certain variance components to zero (or alternatively, select which variances get
estimated). This is achieved by listing the desired random effects on the right side of
the vertical pipe and appending to the effect’s name a cluster-level identifier in square
brackets. To illustrate, the following code block shows a three-level model with
random intercepts at both levels and a random coefficient for X1 at the second level.
the level-3 covariance matrix reduces to a scalar with only a random intercept variance.
Multilevel regression models can also include cluster means as group-level predictors
(i.e., contextual effects; Longford, 1989; Raudenbush & Bryk, 2002). Appending the
.mean keyword to the end of a lower-level covariate’s name references that variable’s
latent group means. To illustrate, the following code block specifies a two-level
regression model that includes X1 as a level-1 predictor and its group means as a
level-2 predictor.
CLUSTERID: level2id;
MODEL:
y ~ x1 x1.mean x2;
Importantly, the group means are cluster-level latent variables rather than
latent variable specification because it can reduce bias associated with arithmetic or
“manifest” group means in some scenarios (Hamaker & Muthén, 2019; Lüdtke et al.,
2008).
introduces the level-2 and level-3 latent group means as predictors. To specify the
group means at one level but not the other, additionally append the cluster-level
identifier variable in square brackets. For example, the following code block illustrates
a three-level random intercept regression with X1’s level-3 latent group means as a
Blimp allows users to embed functions inside parentheses on the right side of
regression equations and, in limited cases, on the left side as well. As an example, the
MODEL:
y ~ (x1 - 10);
The next example uses an embedded function to specify a curvilinear regression where
MODEL:
y ~ x (x^2);
Embedded functions can also reference multiple variables. For example, the following
code block defines the predictor variable as the sum of four ordinal variables (e.g., a
ORDINAL: x1:x4;
MODEL:
y ~ (x1 + x2 + x3 + x4);
The sum score is the regressor in the previous model, but any missing data handling
items (i.e., the sum is not itself a random variable, but rather a deterministic function of
the items). The above function can also be specified using the following syntax
ORDINAL: x1:x4;
MODEL:
y ~ x1:+:x4;
placing equality constraints on item-level coefficients from the same scale, as follows.
ORDINAL: x1:x4;
MODEL:
y ~ x1@beta x2@beta x3@beta x4@beta;
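The link between the sum-score and equality-constraint specifications can be seen by distributing a common slope across the items. The identity below is a sketch that uses beta as an assumed symbol for the shared coefficient.

```latex
\beta \left( x_1 + x_2 + x_3 + x_4 \right) = \beta x_1 + \beta x_2 + \beta x_3 + \beta x_4
```

In other words, a single slope on the sum score implies the same slope for every item.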
Embedded functions can also be part of interactive effects. To illustrate, the following
code block shows an interaction between a scale (sum) score involving five items and a
ORDINAL: x1:x4;
MODEL:
y ~ x1:+:x4 m (( x1:+:x4 ) * m);
ORDINAL: x1:x5;
MODEL:
Extending the previous idea, the code below shows the interaction between two scale
scores, one computed as the sum of four ordinal items and the other computed as the
Blimp also allows embedded functions on the left side of the tilde, but the range of
MODEL:
ln(y) ~ x1 x2 x3;
As a second example, the code below applies the Yeo-Johnson (Yeo & Johnson, 2000)
MODEL:
yjt(y) ~ x1 x2 x3;
The Yeo-Johnson procedure estimates the shape of the data as the MCMC algorithm
iterates and produces imputations from a skewed distribution. The analysis examples
CENTERING Command
equations. This command affects Blimp’s printed estimates but has no bearing on
imputations generated by the SAVE command. For complete variables listed on the
FIXED line, Blimp centers variables at arithmetic averages (grand mean or group
means). For all variables assigned a distribution, the CENTERING command treats
grand means and group means as random variables to be estimated at each MCMC
iteration (Enders & Keller, 2019). Any product terms specified on the MODEL line
centering at the grand means is the only option. The code block below shows a basic
CENTERING: x1 x2;
MODEL: y ~ x1 x2 x3;
means or group-level cluster means (lower-level regressors only). In this case, the type
is centered at the level-2 latent group means, and a level-2 predictor X2 is centered at
its grand mean (group mean centering is not an option for variables at the highest
level).
CLUSTERID: level2id;
CENTERING: groupmean = x1.i, grandmean = x2.j;
MODEL: y ~ x1.i x2.j x1.i*x2.j | x1.i;
CLUSTERID: level2id;
CENTERING:
groupmean = x1.i;
grandmean = x2.j;
MODEL: y ~ x1.i x2.j x1.i*x2.j | x1.i;
Importantly, group mean centering reflects deviations between the raw scores and
latent group means (unless the variable is complete and listed on the FIXED line, in
which case the group means are arithmetic averages). Further, group mean centering is
always performed by subtracting the latent group means at the next level of the data
hierarchy. For example, if the previous analysis were a three-level model, the centering
procedure would subtract the level-2 latent group means from the X1 scores. The group
means themselves can be included in the analysis model, and these latent variables
Categorical variables can also be centered (Enders & Tofighi, 2007; Yaremych &
nominal) are modeled as underlying normal latent response variables. The grand and
group means are also modeled on the latent metric, and listing categorical variables on
the CENTERING command invokes a transformation that converts the latent mean to
the metric required by the analysis model (Enders & Keller, 2019). For example,
centering a binary predictor converts the latent grand mean to a “manifest” mean equal
variables with three or more categories can be computationally intensive because the
latent mean conversion requires Monte Carlo integration at each MCMC step.
SIMPLE Command
The SIMPLE command is used to request conditional effects (e.g., simple intercepts and
simple slopes) from a regression model with an interaction effect. At each MCMC
parameters thus have their own distribution, credible intervals, et cetera. The
The code block below shows the basic specification where the SIMPLE command
requests the conditional effects of X (the focal predictor) at different values of M (the
moderator).
CENTERING: x m;
MODEL: y ~ x m x*m;
SIMPLE: x | m;
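The conditional effects requested here follow from rearranging the moderated regression equation. The sketch below uses assumed symbols (beta_0 to beta_3) for the intercept and the slopes of X, M, and their product.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 m_i + \beta_3 x_i m_i + \varepsilon_i \\
    &= \left( \beta_0 + \beta_2 m_i \right) + \left( \beta_1 + \beta_3 m_i \right) x_i + \varepsilon_i
\end{aligned}
```

For a chosen value of M, the first parenthetical term is the simple intercept and the second is the simple slope of X.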
CENTERING: x m;
MODEL: y ~ x m x*m;
SIMPLE: x | m, m | x;
CENTERING: x m;
MODEL: y ~ x m x*m;
SIMPLE:
x | m;
m | x;
When a continuous moderator is listed to the right of the vertical pipe, Blimp
standard deviations from 0. We highly recommend centering the focal predictor and
moderator such that zero represents the mean. In a multilevel model, the standard
its group means has only within-cluster variation, so the pooled within-cluster
standard deviation is used. A continuous moderator centered at its grand mean has
used. The number of standard deviation units can also be specified. For example, the
code block below requests the simple slopes of X at one half of a standard deviation
CENTERING: x m;
MODEL: y ~ x m x*m;
SIMPLE:
x | m@.5SD;
x | m@-.5SD;
When a nominal moderator variable is listed to the right of the vertical pipe, Blimp
automatically computes and reports conditional effects for every group. When an
ordinal variable is listed to the right of the vertical pipe, the pick-a-point score values
must be specified. To illustrate, the following code block specifies conditional effects at
M = 0 and M = 1 (e.g., conditional effects at each level of a binary dummy code). The
CENTERING: x m;
MODEL: y ~ x m x*m;
SIMPLE:
x | m@0;
x | m@1;
The main restriction with the SIMPLE command occurs in models with multiple
equations. In this case, a dependent variable in one equation can serve as the
moderator in another equation, but the user must specify the values being conditioned
on. The code block below shows a specification where M is the dependent variable in
one equation and a moderator in the other. The variable M appears to the right of the
vertical pipe along with the fixed values to condition on (i.e., default standard deviation
CENTERING: x m;
MODEL:
m ~ x z;
y ~ x m x*m;
SIMPLE:
x | m@0;
x | m@1;
No such specification is necessary if the dependent variable in one equation is the focal
predictor in the other (i.e., appears to the left of the vertical pipe). The code block
below shows a specification where M is the dependent variable in one equation and
the focal predictor in the other. The variable X appears to the right of the vertical pipe,
and the output would return the conditional effect of M at standard deviation units
CENTERING: x m;
MODEL:
m ~ x z;
y ~ x m x*m;
SIMPLE:
m | x;
PARAMETERS Command
The PARAMETERS command is used to (a) define auxiliary parameters that are
functions of a model’s estimated parameters, and (b) specify custom prior distributions.
The command uses the same mathematical operators and accesses the same functions
equation that defines a new parameter. Auxiliary parameters are computed at every
MCMC iteration, and thus they have their own distributions and summary tables in the
output.
As a first example, recall from the MODEL command section that Blimp links dependent
code block below labels the variances and correlation and uses the labels to compute
the covariance, which is the product of the correlation and the standard deviations.
MODEL:
y ~~ y@yvar;
x ~~ x@xvar;
y ~~ x@yxcorr;
PARAMETERS:
yxcov = yxcorr * sqrt(yvar * xvar);
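The arithmetic behind yxcov is the usual conversion from a correlation to a covariance. In the sketch below, the symbols and the numeric values (variances of 4 and 9, a correlation of .5) are hypothetical and chosen only for illustration.

```latex
\sigma_{yx} = \rho_{yx} \sqrt{\sigma^2_y \, \sigma^2_x}
\qquad \text{e.g.,} \qquad
0.5 \times \sqrt{4 \times 9} = 0.5 \times 6 = 3
```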
As a second example, the following code block labels the three slope coefficients in a
moderated regression model and uses the PARAMETERS command to compute the
MODEL:
y ~ x@beta1 m@beta2 x*m@beta3;
CENTER: x m;
PARAMETERS:
m0 = 0;
m1 = 1;
slope.at.m0 = beta1 + m0 * beta3;
slope.at.m1 = beta1 + m1 * beta3;
As a final example, the code block below labels pathways from a single-mediator
model and uses the PARAMETERS command to compute the product of coefficients
MODEL:
m ~ x@apath;
y ~ x@cpath m@bpath;
PARAMETERS:
indirect = apath * bpath;
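The same labels can feed more than one auxiliary parameter. As a hedged sketch, the block below adds a total effect to the previous specification; the name total is a hypothetical label, and the expression relies only on the addition and multiplication operators shown in the examples above.

```
MODEL:
m ~ x@apath;
y ~ x@cpath m@bpath;
PARAMETERS:
# indirect effect: product of the a and b paths
indirect = apath * bpath;
# total effect: direct path plus the indirect effect (hypothetical label)
total = cpath + apath * bpath;
```

Because auxiliary parameters are computed at every MCMC iteration, both quantities would receive their own posterior summaries.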
The second major use for the PARAMETERS command is to introduce custom prior
prior distributions.
To illustrate, the following code block shows a simple regression model with
informative normal priors on the regression coefficients and an inverse gamma prior for
the variance with a = 1 and b = .5 (i.e., 2 additional degrees of freedom and unit sum of
squares). This prior specification for the variance is identical to listing prior1 on the
OPTIONS line.
MODEL:
y ~ 1@beta0prior x@beta1prior;
y ~~ y@resvarprior;
PARAMETERS:
beta0prior ~ normal(2,20);
beta1prior ~ normal(5,10);
resvarprior ~ invgamma(1,.5);
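The correspondence between the inverse gamma parameters and the degrees of freedom and sum of squares can be sketched as follows. The parameterization shown (shape a equal to half the degrees of freedom, scale b equal to half the sum of squares) is assumed here because it reproduces the values quoted above; the symbols nu and S are illustrative.

```latex
a = \frac{\nu}{2}, \quad b = \frac{S}{2}
\qquad \Rightarrow \qquad
a = 1,\; b = .5 \;\; \text{gives} \;\; \nu = 2,\; S = 1
```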
command section, the PARAMETERS command can also access the following
TEST Command
The TEST command is used to perform the Bayesian Wald test described by
Asparouhov and Muthén (2021). This command is available in models with a single
outcome and multivariate models with several outcomes (e.g., path models). The TEST
parameters in the MODEL statement and use the TEST command to specify the null
MODEL:
y ~ x1@b1 x2@b2 x3@b3;
TEST:
b1 = 0;
b2 = 0;
Tests of multiple parameters can also be specified using the following shortcut.
MODEL:
y ~ x1@b1 x2@b2 x3@b3;
TEST:
b1:b3 = 0;
More than one test can be performed by specifying multiple TEST commands. For
example, the following code block yields two tests, the first of which involves a single
MODEL:
y ~ x1@b1 x2@b2 x3@b3;
TEST:
b1 = 0;
TEST:
b2:b3 = 0;
The TEST command can also evaluate equality and other types of constraints. For
example, the following code block tests an equality constraint on two regression
slopes.
MODEL:
y ~ x1@b1 x2@b2 x3@b3;
TEST:
b1 = b2;
command. For example, the following code block evaluates whether one slope differs
MODEL:
y ~ x1@b1 x2@b2 x3@b3;
TEST:
b1 = b2;
b3 = 0;
The second way to implement the TEST command is similar to the MODEL
statement—it specifies a regression model. However, the model listed on the TEST line
must be nested within the model listed on the MODEL statement. The first way to
specify the nested model is to exclude parameters from the nested model. The code
block below illustrates a comparison involving a full model with three predictors and a
MODEL:
y ~ x1 x2 x3;
TEST:
y ~ 1;
values by appending an @ and a numeric label to a variable or effect. For example, the
following code block illustrates an equivalent specification that fixes three slope
coefficients to 0.
MODEL:
y ~ x1 x2 x3;
TEST:
y ~ x1@0 x2@0 x3@0;
The TEST command produces the output table below. The Wald test statistic is a
chi-square variable, and the test’s degrees of freedom equals the number of
parameters by which the two models differ. The probability value is not a frequentist
p-value because it makes no reference to test statistics from other random samples.
Rather, the probability is an index of support for the proposed constraints, where
support is defined as the area above the test statistic in a chi-square distribution.
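Stated as a formula, the probability is the upper-tail area of a chi-square distribution at the observed statistic. In the sketch below, W denotes the Wald statistic and df its degrees of freedom; both symbols, and the numeric value in the example, are assumed for illustration.

```latex
p = P\left( \chi^2_{df} \ge W \right),
\qquad \text{and for } df = 2: \quad p = e^{-W/2}
\quad (\text{e.g., } W = 5.99 \Rightarrow p \approx .05)
```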
MODEL FIT:
Test #1
The TEST command can also compare nested models with different variances. To
illustrate, the code block below shows a two-level model with random coefficients,
where the TEST command is used to specify a random intercept model with two fewer
parameters.
CLUSTERID: level2id;
MODEL:
y ~ x1 x2 x3 | x1;
TEST:
y ~ x1 x2 x3 | x1@0;
FCS Command
(FCS–MI) approach similar to that described by Stef van Buuren and colleagues (van
Buuren, 2007; van Buuren, Brand, Groothuis-Oudshoorn, & Rubin, 2006). This
command cannot be used in conjunction with the MODEL command. Rather, FCS
deploys an MCMC algorithm that cycles through incomplete variables one at a time,
imputing each variable from an additive equation that features the incomplete variable
regressed on all other variables listed on the FCS line. This algorithm makes no
distinction between outcomes and regressors in the subsequent analysis model; all
entities listed on the FCS line are simply variables to be imputed or complete variables
that contribute to imputation. The SAVE command outputs the filled-in data sets for
reanalysis using frequentist methods (Rubin, 1987). FCS–MI is known to introduce bias
when applied to analysis models with nonlinear terms such as interactions, polynomial
To illustrate FCS–MI, consider a simple scenario with one continuous variable X, one
binary dummy variable D, and one 7-category ordinal variable O. The code block
below shows a basic script (which could also include nominal variables).
DATA: data.dat;
VARIABLES: id a1:a5 x d o z;
ORDINAL: d o;
MISSING: 999;
FCS: x d o;
NIMPS: 100;
CHAINS: 100;
BURN: 1000;
ITERATIONS: 10000;
OPTIONS: savelatent;
SAVE: stacked = imps.dat;
At a minimum, the FCS command should include all variables and effects of interest in
the analysis model(s), but the list may also include additional auxiliary variables. The
commands following the FCS line in the script are described later in this section.
Blimp’s FCS–MI routine primarily differs from the classic MICE (Multiple Imputation by
Chained Equations; van Buuren & Groothuis‐Oudshoorn, 2011) approach in two ways.
First, Blimp’s algorithm is a true Gibbs sampler; this is a small technical nuance that
makes no difference in practice. Second, Blimp adopts a fully latent specification for all
that views discrete scores as arising from one or more normally distributed latent
response variables (or latent response difference scores in the case of multicategorical
nominal variables). Applied to the previous example, the binary dummy variable D and
the 7-category ordinal variable O have corresponding latent response variables D* and
O*, respectively. Blimp’s FCS–MI routine uses the latent variables both as predictors
and as outcomes. The round robin imputation models for this example are as follows.
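Based on the description above, the round robin models would take a form along these lines. This is a sketch under assumptions: the gamma coefficients are assumed notation, and r1 to r3 are the residuals referred to in the text that follows.

```latex
\begin{aligned}
x_i   &= \gamma_{01} + \gamma_{11} d^*_i + \gamma_{21} o^*_i + r_{1i} \\
d^*_i &= \gamma_{02} + \gamma_{12} x_i + \gamma_{22} o^*_i + r_{2i} \\
o^*_i &= \gamma_{03} + \gamma_{13} x_i + \gamma_{23} d^*_i + r_{3i}
\end{aligned}
```

Each incomplete variable is regressed on the other two, with the latent responses D* and O* standing in for the discrete scores.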
The latent response models also incorporate threshold parameters that divide the
latent distributions into discrete segments, and the residual variances of r2 and r3 are
Listing the savelatent keyword on the OPTIONS line saves both the discrete and
latent response variables to the imputed data files (by default, only the discrete
imputes are written to the imputed data files). The imputed latent scores (plausible
values) could be used in lieu of the discrete scores in a subsequent analysis. For
example, the analysis in Section 5.8 illustrates an item-level factor analysis that uses
imputed latent response scores. In a similar vein, Muthén and Asparouhov (2016)
describe an application that replaces a binary mediator with a latent response variable.
If desired, listing the mice and manifest keywords on the OPTIONS line alters
Blimp’s default behaviors and invokes an algorithm that is equivalent to the one in the
right side of equations. The round robin imputation models for this example are as
follows.
The MICE package deploys logistic rather than probit models for categorical variables,
automatically introduces the latent group means of all lower-level variables in the
imputation model (i.e., latent contextual effects); this is true for both continuous and
latent response variables. Including the group means in the imputation model allows
associations. Listing the noclmean keyword on the OPTIONS line removes the latent
group means from the regression models, producing a more restrictive imputation
model where the within- and between-cluster regressions are assumed to be equal.
the hev keyword on the OPTIONS line. This method is described in Kasim and
Raudenbush (1998).
BURN Command
The BURN command specifies the number of burn-in iterations. Bayesian analysis
results summarize estimates taken from iterations following the burn-in period, and
multiple imputations (via FCS or MODEL) are saved after the burn-in period. To
BURN: 5000;
potential scale reduction factor diagnostic (Gelman & Rubin, 1992) from the Blimp
output. Material at the beginning of Chapter 3 describes how to use these diagnostics.
ITERATIONS Command
The ITERATIONS command specifies the number of iterations after the burn-in period.
The tabular summaries reflect Bayesian analysis results taken from the post burn-in
period. To illustrate, the following code block specifies 10,000 MCMC iterations
BURN: 5000;
ITERATIONS: 10000;
Note that the total number of iterations is distributed equally across the number of
MCMC chains, the default value of which is two (see the CHAINS command). In our
experience, 10,000 iterations is usually more than sufficient, but material at the
CHAINS Command
The CHAINS command is used to specify the number of MCMC processes (and
optionally, the number of processors used for computation). The default number of
chains is two, and the total number of computational cycles specified on the
ITERATIONS line is always divided equally across chains. By default, Blimp attempts
to distribute MCMC chains across physical cores, resulting in faster computation (e.g.,
process per core). Because Blimp automatically uses the maximum available cores, this
specification would primarily be used to specify fewer resources. For example, the code
block below specifies 10,000 iterations spread across 10 unique MCMC chains. The
ITERATIONS: 10000;
CHAINS: 10 processors 2;
By default, each chain will have a different seeding value and different random starting
NIMPS Command
The NIMPS command is used to specify the desired number of multiple imputation data
sets to save during MCMC estimation (saving imputed data sets is optional). Graham,
Olchowski, and Gilreath (2007) suggest using at least 20 imputed data sets to
maximize power, and other studies have shown that 100 or more imputations may be
necessary to reduce the impact of Monte Carlo simulation error on standard errors and
(Bodner, 2008; Harel, 2007; von Hippel, 2018). Imputations can be saved at regular
intervals during a single MCMC chain, at the final iteration of multiple MCMC
processes, or some combination of the two. The code block below saves 100 imputed
data sets from the final iteration of 100 MCMC chains, each with 5,000 burn-in
iterations and 100 iterations thereafter (i.e., 10,000 total iterations spread across 100
MCMC processes).
BURN: 5000;
ITERATIONS: 10000;
NIMPS: 100;
CHAINS: 100;
THIN Command
The THIN command is used to specify the between-imputation interval when saving
multiple imputations from the same MCMC chain. For example, the following code
block deploys two MCMC chains (the default) that create 100 filled-in data sets by
saving imputations every 1,000 iterations after the 5,000-iteration burn-in period.
NIMPS: 100;
BURN: 5000;
THIN: 1000;
Saving multiple imputations is optional, and this command is not necessary; however,
either THIN or ITERATIONS must be specified when saving filled-in data sets. The
THIN command has no impact on printed parameter summaries, which are always
OPTIONS Command
The following keywords are used in conjunction with either the FCS or MODEL
commands. Bolded keywords are default and do not require explicit specification.
The following keywords are used in conjunction with the FCS command to alter the
The following keywords are used in conjunction with the SAVE command to alter the
OUTPUT Command
default, Blimp prints the posterior median, posterior standard deviation, 95% credible
interval limits, split chain potential scale reduction factor, and effective number of
MCMC samples (estimated number of independent MCMC iterations using split chain
approach) for each parameter. Listing any of the following keywords on the OUTPUT
command overrides Blimp’s default output tables with new tables containing the
requested quantities.
To illustrate, the code block below creates a custom table displaying only the median, a
set of quantiles (2.5%, 25%, 50%, 75%, and 97.5%), and potential scale reduction
The code block below specifies Blimp’s default output with the additional quantities.
SAVE Command
The SAVE command is used to save byproducts of MCMC estimation. The principal use
for this command is to save multiply imputed data sets, but the command also saves
parameter estimates from the burn-in and post burn-in iterations, posterior summaries,
and potential scale reduction factors. Unless a full file path is specified, Blimp saves
the specified files to the directory that contains the input script.
Multiple imputations can be saved in three different formats: (a) as separate data files
(ideal for analysis in Mplus or HLM), (b) in a single stacked file with an additional
identifier variable that indexes imputations (ideal for analysis in R, SPSS, and SAS),
and (c) in a single stacked file that includes the original data indexed with a zero value
(ideal for analysis in Stata). The following code block illustrates all three specifications.
SAVE:
separate = imp*.dat;
stacked = imps.dat;
stacked0 = imps0.dat;
When saving imputations to separate files, the asterisk in the file path is replaced with
an integer in the file name (e.g., specifying imp*.dat produces imputed data sets
also generates a text file that contains the names of the individual data files (this file
The imputed data sets include all variables from the input data (regardless of whether
they were used in an analysis or imputation routine) along with the values of any latent
variables, predicted scores, and residuals specified on the OPTIONS line (the
adds a variable to the first column of the data that indexes the data sets. The order of
the variables in the imputed data sets is listed at the bottom of the Blimp output. The
stacked = 'imps.dat'
imp# id n1 d1 o1 y x1 d2 x2 x3
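The variable order printed above is all that is needed to read a stacked file in other software. The Python sketch below is a minimal illustration, not part of Blimp: the function name split_stacked is hypothetical, and the code assumes the saved file has no header row and separates values with spaces or commas.

```python
from collections import defaultdict

# Column order copied from the variable list printed in the Blimp output
COLUMNS = ["imp#", "id", "n1", "d1", "o1", "y", "x1", "d2", "x2", "x3"]


def split_stacked(lines, columns=COLUMNS):
    """Group the rows of a stacked file by the imputation index in column 1."""
    by_imp = defaultdict(list)
    for line in lines:
        # Assumption: values are separated by whitespace or commas
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(columns, (float(value) for value in fields)))
        by_imp[int(row["imp#"])].append(row)
    return dict(by_imp)


# Usage with the file name from the SAVE example above:
# with open("imps.dat") as handle:
#     data_sets = split_stacked(handle)
```

With the stacked0 format, the entry with index 0 would hold the original data, including any missing value codes.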
In addition to creating imputed data sets, the SAVE command can produce files
posterior summaries of the parameter estimates as they appear on the Blimp output
scale reduction factor values for all parameters (psr = filename;), the Bayesian
Wald test statistic (waldtest = filename;), and the average imputation across the
SAVE:
burn = burnin.dat;
iterations = iterations.dat;
estimates = estimates.dat;
starts = starts.dat;
psr = psr.dat;
waldtest = wald.dat;
avgimp = averageimps.dat;
specifying an asterisk in the filename. Blimp replaces this symbol in the filename with
a numeric value that indexes the chains. The following code block illustrates this
specification.
SAVE:
burn = burnin*.dat;
iterations = iterations*.dat;
estimates = estimates.dat;
starts = starts.dat;
psr = psr.dat;
avgimp = averageimps.dat;
Parameter summaries and starting values are saved in a single file regardless of the
Diagnosing the MCMC algorithm’s convergence and determining the total number of
computational cycles is an important part of any analysis. The initial burn-in (trial)
period should be long enough for the algorithm to achieve independence from its
random starting values and achieve a steady state (i.e., converge in distribution); the
total number of iterations after the burn-in period should be large enough to provide
adequate precision. This section describes the process of determining these two
quantities. The steps apply to any analysis, including all the ensuing
examples. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
The first step in an analysis is to perform a preliminary diagnostic run to determine the
length of the burn-in period. This initial period should be long enough for the MCMC
cycles for the preliminary analysis. The code block below estimates a two-level
random coefficient model (see Example 6.3) with this setting on the BURN line. The
default number of chains is two, and the number of iterations after the burn-in period
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL: y.i ~ x1.i x2.i x7.j d1.j | x1.i;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
OPTIONS: labels;
Blimp divides the burn-in period into 20 equal segments and computes the split-chain
potential scale reduction factor (Gelman et al., 2014) at the end of each interval. The
table below shows the highest (worst) potential scale reduction factor across all
model parameters.
NOTE: Split chain PSR is being used. This splits each chain's
iterations to create twice as many chains.
The table shows that the index drops to acceptable levels (e.g., less than 1.05, where 1
is the theoretical minimum) by iteration 5,000. A good rule of thumb is to set the
burn-in period for the final run to a value at least as large as 5,000. If the value in the
bottom row of the table (the final checkpoint) exceeds 1.05, increase the number of
burn-in iterations.
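For readers who want to see what the diagnostic measures, the Python sketch below computes a split-chain potential scale reduction factor for a single parameter using the textbook formula from Gelman et al. (2014). It is an illustration under that assumption; Blimp's internal computations may differ in detail, and the function name and the toy chains are hypothetical.

```python
import math
from statistics import mean, variance


def split_chain_psr(chains):
    """Split-chain potential scale reduction factor for one parameter.

    chains: list of equal-length lists of draws, one list per MCMC chain.
    """
    half = len(chains[0]) // 2
    # Split every chain in half, which doubles the number of chains
    splits = []
    for draws in chains:
        splits.append(draws[:half])
        splits.append(draws[half:2 * half])
    n = half
    within = mean([variance(s) for s in splits])        # W
    between = n * variance([mean(s) for s in splits])   # B
    var_plus = (n - 1) / n * within + between / n       # pooled variance estimate
    return math.sqrt(var_plus / within)


# Toy chains: two that agree, and one shifted away from the others
chain_a = [0.0, 1.0] * 50
chain_b = [0.0, 1.0] * 50
chain_c = [draw + 5.0 for draw in chain_a]

print(split_chain_psr([chain_a, chain_b]))  # chains agree
print(split_chain_psr([chain_a, chain_c]))  # chains disagree
```

Chains that sample the same distribution give a value near 1, whereas chains that have not yet mixed give a value well above the 1.05 cutoff.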
The potential scale reduction factor table indicates that the highest (worst) values prior
to convergence are primarily associated with parameter numbers 13 and 5. Listing the
optional labels keyword on the OPTIONS line prints a table of potential scale
reduction factors for all model parameters along with their numeric indices. In some
cases (e.g., latent variable models), very high potential scale reduction factors will be
general, these can be ignored, and the focus should be on the unstandardized
parameters. The table for the focal regression model is shown below (unspecified
predictor models also have similar tables). The table indicates that parameter numbers
13 and 5 correspond to the standardized coefficient for a level-2 predictor and the
intercept, respectively. The columns of the table give the potential scale reduction
factors for the final five checkpoints during the burn-in period.
PARAMETER LABELS:
NOTE: Split chain PSR is being used. This splits each chain's
iterations to create twice as many chains.
Variances:
 1  L2 : Var(Intercept)        1.00  1.00  1.00  1.00  1.00
 2  L2 : Cov(x1.i,Intercept)   1.00  1.00  1.00  1.00  1.00
 3  L2 : Var(x1.i)             1.01  1.01  1.02  1.01  1.00
 4  Residual Var.              1.00  1.00  1.00  1.00  1.00
Coefficients:
 5  Intercept                  1.01  1.00  1.01  1.01  1.01
 6  x1.i                       1.00  1.00  1.00  1.00  1.00
 7  x2.i                       1.00  1.00  1.00  1.00  1.00
 8  x7.j                       1.01  1.03  1.03  1.03  1.02
 9  d1.j                       1.00  1.00  1.00  1.00  1.00
Standardized Coefficients:
10  x1.i                       1.00  1.00  1.00  1.00  1.00
11  x2.i                       1.00  1.00  1.00  1.00  1.00
12  x7.j                       1.01  1.03  1.03  1.03  1.02
13  d1.j                       1.00  1.00  1.00  1.00  1.00
A trace plot of the intercept estimates from the first 5,000 computational cycles is
shown below. Plot features such as the number of chains or iterations printed can be
set in the Blimp Studio > Preferences pull-down menu. Plotting can also be turned off
The next step is to set the burn-in period and total number of iterations for the final
analysis. We find it useful to specify 10,000 iterations following the initial burn-in
period, which for this example we set at 5,000 based on the preliminary diagnostic
run. The code block below reflects these settings on the BURN and ITERATIONS lines.
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL: y.i ~ x1.i x2.i x7.j d1.j | x1.i;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
The Blimp output tables include point estimates and measures of uncertainty
(posterior median and standard deviation), 95% credible interval limits, potential scale
reduction factors for the iterations following the burn-in period, and the effective
number of MCMC samples. The output tables generally include sections for variances,
coefficients, standardized estimates, and variance explained effect sizes (Rights &
Sterba, 2019).
Coefficients:
Intercept 4.168 0.070 4.030 4.305 1.015 169.900
x1.i -0.094 0.019 -0.132 -0.056 1.000 1841.459
x2.i 0.086 0.008 0.071 0.102 1.000 2622.345
Standardized Coefficients:
x1.i -0.094 0.020 -0.133 -0.056 1.001 1870.786
x2.i 0.184 0.019 0.148 0.222 1.001 2510.830
x7.j 0.054 0.064 -0.069 0.183 1.025 151.449
d1.j -0.050 0.071 -0.188 0.093 1.003 143.392
-------------------------------------------------------------------
are based after removing autocorrelations from the MCMC process. Gelman et al.
(2014, p. 287) recommend values greater than 100. All values in the example table
exceed this recommended minimum. Increasing the total number of iterations would
The analysis examples in this chapter primarily illustrate different types of univariate
regression models. Univariate regressions are the basic building blocks of more
complicated multivariate and latent variable models, which are just collections of
univariate equations. In general, it is possible to mix and match features from any
examples to easily create complex analysis models that honor features of the data. The
examples use a generic notation system where variable names usually consist of an
alphanumeric prefix and a numeric suffix (e.g., Y1, X1, N1, D1, D2, V1, V2, V3). The letter
variable. Other letters generally represent continuous variables. Finally, the model
equations use a “cgm” superscript to indicate grand mean centering. The following list
outlines the examples in this section.
This example illustrates correlations and descriptive statistics. Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
The following code block estimates the means, variances and correlations.
DATA: data1.dat;
VARIABLES: id n1 d1 y1 y2 x1 d2 x2 x3;
MISSING: 999;
MODEL:
x1 y1 y2 <-> x1 y1 y2;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
The code block below labels variance parameters and uses the PARAMETERS
covariances).
DATA: data1.dat;
VARIABLES: id n1 d1 y1 y2 x1 d2 x2 x3;
MISSING: 999;
MODEL:
x1 y1 y2 <-> x1 y1 y2;
x1 <-> x1@varx1;
y1 <-> y1@vary1;
y2 <-> y2@vary2;
PARAMETERS:
sd.x1 = sqrt(varx1);
sd.y1 = sqrt(vary1);
sd.y2 = sqrt(vary2);
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
This example illustrates polychoric correlations among continuous variables and latent
response scores from binary and ordinal variables. Clicking the links below downloads
the Blimp scripts and data for this example, and the full set of User Guide examples is
Ex4.2.imp data1.dat
DATA: data1.dat;
VARIABLES: id n1 d1 o1 y1 x1 d2 x2 x3;
ORDINAL: d1 o1;
MISSING: 999;
MODEL:
d1 o1 y1 x1 <-> d1 o1 y1 x1;
SEED: 90291;
BURN: 25000;
ITERATIONS: 10000;
This example illustrates a linear regression analysis. Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
Ex4.3.imp data1.dat
The model features a pair of continuous predictors and a binary dummy code, as
follows. The cgm superscript denotes variables centered at their grand means.
DATA: data1.dat;
VARIABLES: id n1 d1 o1 y x1 d2 x2 x3;
ORDINAL: d2;
MISSING: 999;
FIXED: d2;
CENTER: x1 x2;
MODEL: y ~ x1 d2 x2;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
Blimp can save multiple imputations from any model it estimates. This example
regression model from Example 4.3. Clicking the links below downloads the scripts
and data for this example, and the full set of User Guide examples is available from a
❖ Imputations are stacked in a single file with an index variable added in the first
column
DATA: data1.dat;
VARIABLES: id n1 d1 o1 y x1 d2 x2 x3;
ORDINAL: d2;
MISSING: 999;
FIXED: d2;
CENTER: x1 x2;
MODEL: y ~ x1 d2 x2;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# id n1 d1 o1 y x1 d2 x2 x3
The imputed data sets can be analyzed in other software packages. For example, the
script below uses the R package mitml (Grund, Robitzsch, & Lüdke, 2021) to fit the
linear regression model to the filled-in data sets. The resulting estimates are
# center predictors
imps$x1.cgm <- imps$x1 - mean(imps$x1)
imps$x2.cgm <- imps$x2 - mean(imps$x2)
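Packages such as mitml pool the results from the individual data sets automatically. To make that step concrete, the Python sketch below applies Rubin's combining rules to one coefficient. The function name and the numeric inputs are hypothetical placeholders, not output from this example.

```python
from statistics import mean, variance


def pool_rubin(estimates, squared_ses):
    """Pool one parameter across imputed data sets with Rubin's rules.

    estimates:   the point estimate from each imputed data set
    squared_ses: the squared standard error from each imputed data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)            # average of the estimates
    within = mean(squared_ses)          # within-imputation variance
    between = variance(estimates)       # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical slope estimates and squared standard errors from three data sets
estimate, se = pool_rubin([0.41, 0.38, 0.44], [0.010, 0.012, 0.011])
print(round(estimate, 3), round(se, 3))
```

The between-imputation term is what inflates the standard error to reflect missing data uncertainty, which is why analyzing a single filled-in data set understates sampling error.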
predictor. Clicking the links below downloads the scripts and data for this example,
and the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex4.4.imp data2.dat
and N1.2, N1.3, and N1.4 are dummy codes that represent a four-category nominal
predictor (N1 = 1, 2, 3, 4). The cgm superscript denotes variables centered at their
grand means. The syntax highlights are shown below, and adding the NIMPS and
DATA: data2.dat;
VARIABLES: id y1 y2 x1 d1 d2 n1 x2 n2;
ORDINAL: d1;
NOMINAL: n1;
MISSING: 999;
FIXED: x1;
CENTER: x1;
MODEL: y1 ~ x1 d1 n1;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
filled-in data sets tailored to the analysis specified on the MODEL line. The resulting
imputations are appropriate for fitting the identical model (or one that is nested within
the target model) in the frequentist framework. Fully conditional specification multiple
imputation instead uses a round robin sequence of regression models, each of which
This example illustrates a fully conditional specification imputation routine that would
yield appropriate imputations for the linear regression model from Example 4.5 (or any
additive model that includes the variables listed on the FCS line). Note that fully
nonlinear effects, as it is prone to bias in such cases (Bartlett et al., 2015; Seaman,
Bartlett, & White, 2012). The model-based multiple imputation procedure illustrated in
Example 4.8 is a better option. Clicking the links below downloads the scripts and data
for this example, and the full set of User Guide examples is available from a pull-down
Ex4.6.imp data2.dat
DATA: data2.dat;
VARIABLES: id y1 y2 x1 d1 d2 n1 x2 n2;
ORDINAL: d1 d2;
NOMINAL: n1;
FIXED: x1 d2;
MISSING: 999;
FCS: y1 x1 d1 d2 n1 x2;
SEED: 90291;
BURN: 1000;
ITERATIONS: 1000;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# id y1 y2 x1 d1 d2 n1 x2 n2
The imputed data sets can be analyzed in other software packages. Example 4.4
This example illustrates how to add auxiliary variables to a regression model. Clicking
the links below downloads the Blimp scripts and data for this example, and the full set
of User Guide examples is available from a pull-down menu in the graphical interface.
The analysis model features a continuous variable and a dummy code as
predictors. The cgm superscript denotes variables centered at their grand means.
specification where analysis variables predict the auxiliary variables and auxiliary
variables predict each other in a cascading pattern (i.e., the first auxiliary predicts the
second, the first and second predict the third, and so on).
DATA: data3.dat;
VARIABLES: id x1 a1 a2 y d1 a3 v1:v4;
MISSING: 999;
ORDINAL: d1 a3;
FIXED: d1;
CENTER: x1;
MODEL:
# focal analysis model
y ~ x1 d1;
# auxiliary variable models
a1 ~ y x1 d1;
a2 ~ a1 y x1 d1;
a3 ~ a1 a2 y x1 d1;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
The script below illustrates a syntax shortcut that specifies the sequential specification
DATA: data3.dat;
VARIABLES: id x1 a1 a2 y d1 a3 v1:v4;
MISSING: 999;
ORDINAL: d1 a3;
FIXED: d1;
CENTER: x1;
MODEL:
# focal analysis model
y ~ x1 d1;
# auxiliary variable models
a3 a2 a1 ~ y x1 d1;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
Adding the NIMPS, CHAINS, and SAVE commands to the script creates model-based
multiple imputations that can be analyzed in the frequentist framework (see Example
4.4).
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
The model is as follows, and the cgm superscript denotes variables centered at their
grand means.
DATA: data4.dat;
VARIABLES: id a1:a3 y x1 x2 n1 d1 d2 o1:o19;
ORDINAL: d1;
NOMINAL: d2;
MISSING: 999;
FIXED: d2;
CENTER: x1 d1;
MODEL: y ~ x1 d2 x1*d2 d1;
SIMPLE: x1 | d2;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
Blimp can save multiple imputations from any model it estimates. The script below
analysis) for the linear moderated regression model. The new syntax features are as
follows.
❖ CENTER command grand mean centers predictors in the Bayesian output, but
saved imputations are on the original metric
❖ NIMPS command specifies 20 imputed data sets
❖ Setting CHAINS equal to NIMPS saves one data set from the final iteration of each
MCMC chain (avoids autocorrelated imputations)
❖ Imputations are stacked in a single file with an index variable added in the first
column
DATA: data4.dat;
VARIABLES: id a1:a3 y x1 x2 n1 d1 d2 o1:o19;
ORDINAL: d1;
NOMINAL: d2;
MISSING: 999;
FIXED: d2;
CENTER: x1 d1;
MODEL: y ~ x1 d2 x1*d2 d1;
SIMPLE: x1 | d2;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# id a1 a2 a3 y x1 x2 n1 d1 d2 o1 o2 o3 o4 o5 o6 o7 o8
o9 o10 o11 o12 o13 o14 o15 o16 o17 o18 o19
The imputed data sets can be analyzed in other software packages. For example, the
script below uses the R package mitml (Grund et al., 2021) to fit the moderated
regression model to the filled-in data sets. The resulting estimates are numerically
# center predictors
imps$x1.cgm <- imps$x1 - mean(imps$x1)
imps$d1.cgm <- imps$d1 - mean(imps$d1)
interactive effects because it is prone to bias. The moderated regression in Example 4.8
is an exception that could be handled by imputing the data separately within each
group of the complete moderator variable (Enders & Gottschall, 2011; Graham, 2009).
the data by subgroup and imputes within each stratum. Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
❖ FCS command includes all analysis variables (other than the one listed on the
BYGROUP line) plus two auxiliary variables
❖ NIMPS command specifies 20 imputed data sets
❖ Setting CHAINS equal to NIMPS saves one data set from the final iteration of each
MCMC chain (avoids autocorrelated imputations)
❖ Imputations are stacked in a single file with an index variable added in the first
column
DATA: data4.dat;
VARIABLES: id a1:a3 y x1 x2 n1 d1 d2 o1:o19;
ORDINAL: d1;
MISSING: 999;
FIXED: d2;
BYGROUP: d2;
FCS: a1:a3 y x1 d1;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# id a1 a2 a3 y x1 x2 n1 d1 d2 o1 o2 o3 o4 o5 o6 o7 o8
o9 o10 o11 o12 o13 o14 o15 o16 o17 o18 o19
The imputed data sets can be analyzed in other software packages. The R script from
Example 4.8 fits a moderated regression model to the filled-in data sets from this run.
This example illustrates a curvilinear regression with a quadratic term and continuous
and binary covariates. Clicking the links below downloads the Blimp scripts and data
for this example, and the full set of User Guide examples is available from a pull-down
Ex4.10.imp data5.dat
The regression model is as follows, and the cgm superscript denotes variables
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
4.8).
DATA: data5.dat;
VARIABLES: id d1 d2 v1:v3 x1 x2 y;
MISSING: 999;
ORDINAL: d1 d2;
FIXED: d1 x2;
CENTER: x1 x2;
MODEL: y ~ x1 (x1^2) x2 d1 d2;
SEED: 12345;
BURN: 1000;
ITERATIONS: 10000;
This example illustrates probit regression for a binary outcome. Clicking the links
below downloads the Blimp scripts and data for this example, and the full set of User
The model features a latent response variable regressed on continuous predictors and
a binary dummy code, and the cgm superscript denotes variables centered at their
grand means.
A single threshold value fixed at 0 is automatically included and does not require
specification. The syntax highlights are shown below, and adding the NIMPS and SAVE
Example 4.8).
DATA: data1.dat;
VARIABLES: id n1 y o1 x1 x2 d1 x3 x4;
ORDINAL: y d1;
MISSING: 999;
FIXED: d1;
CENTER: x1 x2;
MODEL:
y ~ x1 x2 d1;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
Blimp can also create auxiliary parameters that are functions of the estimated model
parameters. To illustrate, the following script uses parameter labels, built-in functions,
“case” at each level of the D1 dummy code (and at the means of the continuous
❖ MODEL command labels the intercept and the binary predictor’s slope
❖ PARAMETERS command defines new parameters that give the predicted
probability of a “success” (outcome = 1) at each level of the dummy code and the
group difference on the probability metric
DATA: data1.dat;
VARIABLES: id n1 y o1 x1 x2 d1 x3 x4;
ORDINAL: y d1;
MISSING: 999;
FIXED: d1;
CENTER: x1 x2;
MODEL:
y ~ 1@b0 x1 x2 d1@b3;
PARAMETERS:
pp.d0 = phi(b0);
pp.d1 = phi(b0 + b3);
pp.diff = pp.d1 - pp.d0;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
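The quantities defined on the PARAMETERS line can be checked by hand. The Python sketch below mirrors the three expressions, assuming that phi() denotes the standard normal cumulative distribution function. The values of b0 and b3 are hypothetical and are not estimates from this example.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values for the intercept and the dummy code's slope
b0, b3 = -0.25, 0.60

pp_d0 = phi(b0)            # predicted probability when d1 = 0
pp_d1 = phi(b0 + b3)       # predicted probability when d1 = 1
pp_diff = pp_d1 - pp_d0    # group difference on the probability metric

print(round(pp_d0, 3), round(pp_d1, 3), round(pp_diff, 3))
```

Because the continuous predictors are grand mean centered, both probabilities refer to a case at the means of x1 and x2.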
This example illustrates a probit regression for an ordered categorical outcome with
seven response options (e.g., a Likert scale). Clicking the links below downloads the
Blimp scripts and data for this example, and the full set of User Guide examples is
Ex4.12.imp data1.dat
The model features a latent response variable regressed on continuous predictors and
a binary dummy code, and the cgm superscript denotes variables centered at their
grand means.
Six threshold parameters that divide the latent response distribution into seven bins
are automatically included and do not require specification (the lowest is fixed at 0 for
identification). The syntax highlights are shown below, and adding the NIMPS and
DATA: data1.dat;
VARIABLES: id n1 n2 y x1 x2 d1 x3 x4;
ORDINAL: y d1;
MISSING: 999;
FIXED: d1;
CENTER: x1 x2;
MODEL:
y ~ x1 x2 d1;
SEED: 90291;
BURN: 20000;
ITERATIONS: 10000;
This example illustrates logistic regression for a binary outcome. Clicking the links
below downloads the Blimp scripts and data for this example, and the full set of User
The model features a binary outcome regressed on continuous predictors and a binary
dummy code, and the cgm superscript denotes variables centered at their grand
means.
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
4.8). When saving imputations, adding the savepredicted keyword to the OPTIONS
DATA: data1.dat;
VARIABLES: id n1 y o1 x1 x2 d1 x3 x4;
ORDINAL: y d1;
MISSING: 999;
FIXED: d1;
CENTER: x1 x2;
MODEL:
logit(y) ~ x1 x2 d1;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
Blimp can also create auxiliary parameters that are functions of the estimated model
parameters. To illustrate, the following script uses parameter labels, built-in functions,
“case” at each level of the D1 dummy code (and at the means of the continuous
❖ MODEL command labels the intercept and the binary predictor’s slope
❖ PARAMETERS command defines new parameters that give the predicted
probability of a “success” (outcome = 1) at each level of the dummy code and the
group difference on the probability metric
DATA: data1.dat;
VARIABLES: id n1 y o1 x1 x2 d1 x3 x4;
ORDINAL: y d1;
MISSING: 999;
FIXED: d1;
CENTER: x1 x2;
MODEL:
logit(y) ~ 1@b0 x1 x2 d1@b3;
PARAMETERS:
pp.d0 = exp(b0) / (1 + exp(b0));
pp.d1 = exp(b0 + b3) / (1 + exp(b0 + b3));
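The expressions on the PARAMETERS line apply the inverse logit function to the linear predictor at each level of the dummy code. The Python sketch below reproduces that arithmetic with hypothetical values of b0 and b3 (not estimates from this example) and adds the group difference described in the syntax highlights.

```python
import math


def inv_logit(eta):
    """Convert a linear predictor on the logit scale to a probability."""
    return math.exp(eta) / (1.0 + math.exp(eta))


# Hypothetical values for the intercept and the dummy code's slope
b0, b3 = -0.40, 0.90

pp_d0 = inv_logit(b0)         # predicted probability when d1 = 0
pp_d1 = inv_logit(b0 + b3)    # predicted probability when d1 = 1
pp_diff = pp_d1 - pp_d0       # group difference on the probability metric

print(round(pp_d0, 3), round(pp_d1, 3), round(pp_diff, 3))
```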
This example illustrates logistic regression for a multicategorical outcome with three
levels. Clicking the links below downloads the Blimp scripts and data for this example,
and the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex4.14.imp data4.dat
predictors, with the lowest numeric code (e.g., Y = 1) as the reference group. The cgm
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
4.8).
DATA: data4.dat;
VARIABLES: id x1:x6 y d1 d2 o1:o19;
ORDINAL: d1;
NOMINAL: y;
MISSING: 999;
FIXED: x2 x3;
CENTER: x1 x2 x3;
MODEL: y ~ x1 x2 x3;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
This feature is currently under development and will be added in a future update.
This example illustrates a regression analysis that features a 6-item sum (scale) score
as the outcome, a 7-item sum score as a predictor, and two binary covariates. The
ordered categorical (e.g., questionnaire) items that determine the sum are incomplete.
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
where X is the scale (sum) score, and X1 to X7 are its ordinal components. It is
important to treat missing data at the item level when analyzing incomplete composite
scores, as doing so maximizes power and precision. This example illustrates the
approach from Alacam, Du, Enders, and Keller (2021) and Enders (2022). The syntax
highlights are shown below, and adding the NIMPS and SAVE commands generates
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale xscale zscale n1 d1 d2
y1:y6 x1:x7 z1:z6;
ORDINAL: x1:x7 d1 d2;
MISSING: 999;
MODEL:
# sequential specification for x scale items and dummy codes
x1:x7 d1 d2 ~ 1;
# scale score predictor using an embedded function
yscale ~ x1:+:x7 d1 d2;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
The previous script used a composite score as the dependent variable but did not
incorporate the dependent variable’s component items into the model. Doing so would
improve precision because the items are strong correlates of the sum score. The code
block below leverages item-level correlations by introducing five of the six outcome
items as auxiliary variables (Eekhout et al., 2015). The component items are added
using the same auxiliary variable approach from Example 4.7. The additional syntax
DATA: data4.dat;
sum score predictor and binary moderator. Clicking the links below downloads the
Blimp scripts and data for this example, and the full set of User Guide examples is
Ex4.17.imp data4.dat
where X is the scale (sum) score, and X1 to X7 are its ordinal components. It is
important to treat missing data at the item level when analyzing incomplete composite
scores, as doing so maximizes power and precision. This example illustrates the
approach from Keller (2022) and Enders (2022). The syntax highlights are shown
below, and adding the NIMPS and SAVE commands generates model-based multiple
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale xscale zscale n1 d1 d2
y1:y6 x1:x7 z1:z6;
ORDINAL: y1:y5 x1:x7 d1 d2;
MISSING: 999;
MODEL:
# sequential specification for x scale items and dummy codes
x1:x7 d1 d2 ~ 1;
# scale score predictor and interaction with embedded function
yscale ~ x1:+:x7 d2 (d2 * ( x1:+:x7 )) d1;
# sequential specification for y scale items
y1:y5 ~ yscale;
SEED: 90291;
BURN: 20000;
ITERATIONS: 10000;
This example illustrates a Yeo-Johnson (Yeo & Johnson, 2000) transformation that
samples imputations from a skewed distribution. Clicking the links below downloads
the Blimp scripts and data for this example, and the full set of User Guide examples is
Ex4.18.imp data6.dat
The analysis model is a logistic regression with two continuous variables and two
X2’s distribution is markedly peaked and positively skewed, and drawing imputations
The Yeo-Johnson procedure estimates the variable’s shape and draws imputations
the predictor variable, such that the resulting linear regression reflects associations
between the normalized variable and other predictors. The sequential specification in
the code block below invokes the following regression equation for the normalized
predictor.
However, skewed imputations on the raw score metric always appear on the right side
of any regression equation (e.g., the focal regression model). Normalized imputations
The Yeo–Johnson transformation can be very slow (or fail) to converge if the skewed
variable’s mean is far from zero. To facilitate interpretation, the code block below
centers the predictor scores at the median value of 16. Additional details about the
procedure are available in the literature (Enders, 2022; Lüdtke et al., 2020b). The
syntax highlights are shown below, and adding the NIMPS and SAVE commands
4.8).
DATA: data6.dat;
VARIABLES: id d1 x1 n1 d2 a1 x2 x3 x4 y;
ORDINAL: y d1 d2;
MISSING: 999;
FIXED: d1 x1;
MODEL:
# sequential predictor models with yeo-johnson transform for x2
yjt(x2 - 16) ~ x1 d1;
d2 ~ x2 x1 d1;
# focal model;
logit(y) ~ x1 x2 d1 d2;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
variable. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
The untransformed analysis model features two continuous variables and one binary
dummy code as predictors, where the cgm superscript denotes variables centered at
Applying the Yeo-Johnson transformation normalizes the dependent variable, such that
the resulting linear regression reflects associations between the normalized outcome
OPTIONS line. The Yeo–Johnson transformation can be very slow (or fail) to converge if
the skewed variable’s mean is far from zero. To facilitate interpretation, the code block
below centers the outcome at the median value of 9. Additional details about the
procedure are available in the literature (Enders, 2022; Lüdtke et al., 2020b).
DATA: data2.dat;
VARIABLES: id y n1 x1 d1 d2 n2 x2 n3;
ORDINAL: d1;
MISSING: 999;
FIXED: x1;
CENTER: x1 x2;
MODEL:
yjt(y - 9) ~ x1 x2 d1;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
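For reference, the Python sketch below implements the Yeo-Johnson (Yeo & Johnson, 2000) transformation for a single score. It is an illustration only: in Blimp the shape parameter is estimated during MCMC, whereas here lam is a hypothetical fixed value, and the score is centered at 9 to match yjt(y - 9) in the script above.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson transformation of one score for shape parameter lam."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)                     # lam = 0 branch
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)                       # lam = 2 branch
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


# Hypothetical raw scores and shape value; center at 9 before transforming
raw_scores = [4.0, 9.0, 12.0, 30.0]
lam = 0.5
normalized = [yeo_johnson(score - 9.0, lam) for score in raw_scores]
print([round(value, 3) for value in normalized])
```

A shape value of 1 leaves the centered scores unchanged, and values below 1 pull in a long right tail.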
Blimp can save multiple imputations from any model it estimates. Adding the NIMPS
analysis, and listing the savelatent keyword on the OPTIONS command saves the
imputes on the raw score metric (this keyword also saves the latent response scores
DATA: data2.dat;
VARIABLES: id y n1 x1 d1 d2 n2 x2 n3;
ORDINAL: d1;
MISSING: 999;
FIXED: x1;
CENTER: x1 x2;
MODEL:
yjt(y - 9) ~ x1 x2 d1;
SEED: 90291;
BURN: 3000;
ITERATIONS: 10000;
# save model-based multiple imputations;
CHAINS: 20;
NIMPS: 20;
OPTIONS: savelatent;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# id y n1 x1 d1 d2 n2 x2 n3 yjt(yjt(y-9)) d1.latent
The variable y contains skewed imputations on the raw score metric, and the variable
yjt(yjt(y-9)) contains the normalized imputes. The imputed data sets can be
analyzed in other software packages. To illustrate, the script below uses the R package
mitml (Grund et al., 2021) to fit the regression model to the filled-in data sets. The
positively skewed raw score imputations are on the original metric, whereas the
# center predictors
imps$x1.cgm <- imps$x1 - mean(imps$x1)
imps$x2.cgm <- imps$x2 - mean(imps$x2)
This example illustrates the linear regression analysis from Example 4.3 with the
Bayesian Wald test described by Asparouhov and Muthén (2021). Clicking the links
below downloads the Blimp scripts and data for this example, and the full set of User
Ex4.20.imp data1.dat
The model features a pair of continuous predictors and a binary dummy code, where
DATA: data1.dat;
VARIABLES: id n1 d1 o1 y x1 d2 x2 x3;
ORDINAL: d2;
MISSING: 999;
FIXED: d2;
CENTER: x1 x2;
MODEL: y ~ x1@b1 x2@b2 d2@b3;
TEST:
b1:b3 = 0;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
The TEST command produces the output table below. The Wald test statistic is a
chi-square variable, and the test’s degrees of freedom equals the number of
parameters by which the two models differ. The probability value is not a frequentist
p-value because it makes no reference to test statistics from other random samples.
Rather, the probability is an index of support for the proposed constraints, where
support is defined as the area above the test statistic value in a chi-square distribution.
MODEL FIT:
Test #1
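The probability described above is the upper-tail area of a chi-square distribution. The Python sketch below computes that area for integer degrees of freedom using only the standard library. The function name and the test statistic value are hypothetical; the three degrees of freedom correspond to the three slopes constrained by b1:b3 = 0.

```python
import math


def chi_square_upper_tail(stat, df):
    """Area above `stat` in a chi-square distribution with integer `df`."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)                 # starting value Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))      # starting value Q(1/2, x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical Wald statistic for a test with three constraints
wald_statistic = 12.5
probability = chi_square_upper_tail(wald_statistic, df=3)
print(probability)
```

Larger test statistics leave less area in the upper tail and therefore indicate less support for the proposed constraints.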
This example illustrates propensity score estimation with missing data. Clicking the
links below downloads the Blimp scripts and data for this example, and the full set of
User Guide examples is available from a pull-down menu in the graphical interface.
Ex4.21.imp data4.dat
The focal model features a binary dummy code (the “treatment” indicator) predicting a
continuous outcome.
The propensity score model features the treatment indicator regressed on potential
Because the treatment indicator D1 consists of naturally occurring groups, this variable
could be incomplete, which it is here. In this case, it is important for propensity score
DATA: data4.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# id x1 x2 x3 x4 y x5 n1 d1 d2 o1 o2 o3 o4 o5 o6 o7 o8
o9 o10 o11 o12 o13 o14 o15 o16 o17 o18 o19
d1.probability y.predicted
examples, imputed data sets from Blimp can be analyzed in other software packages.
This section illustrates path analyses and latent variable models in Blimp. These
is possible to mix and match features from any examples to easily create complex
analysis models that honor features of the data. Additional details about fitting path
and latent variable models in Blimp can be found in Keller (2022), which is available
Following the previous chapter, the examples in this section use a generic notation
system where variable names usually consist of an alphanumeric prefix and a numeric
suffix (e.g., Y1, X1, N1, D1, D2, V1, V2, V3). The letter Y designates a dependent variable,
represent continuous variables. Finally, the model equations use a “cgm” superscript to
indicate grand mean centering. The following list outlines the examples in this section.
This example illustrates a single-mediator path model. The regression models are
shown below
where α and β are slope coefficients that define the indirect effect or product of the
analysis is shown below. The model also incorporates three auxiliary variables
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex5.1.imp data4.dat
DATA: data4.dat;
VARIABLES: id a1:a3 zscale yscale mscale n1 x d1 o1:o19;
MISSING: 999;
MODEL:
# single-mediator model with parameter labels
mscale ~ x@alpha;
yscale ~ mscale@beta x;
# sequential specification for auxiliary variables
a1:a3 ~ yscale mscale x;
PARAMETERS:
indirect = alpha * beta;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
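The PARAMETERS line in the script above defines the indirect effect as a product of two labeled slopes. One common way to summarize such a derived quantity is to form the product at every MCMC iteration and then describe the resulting distribution. The Python sketch below illustrates that idea with hypothetical draws; the function name and the simple percentile rule are assumptions, and this is not Blimp's internal code.

```python
import statistics

def summarize_product(alpha_draws, beta_draws):
    """Summarize alpha * beta computed draw by draw from paired MCMC samples."""
    if len(alpha_draws) != len(beta_draws) or not alpha_draws:
        raise ValueError("draws must be non-empty and paired")
    products = sorted(a * b for a, b in zip(alpha_draws, beta_draws))
    n = len(products)
    # Simple index-based percentiles (an assumption; other rules exist)
    lower = products[int(0.025 * (n - 1))]
    upper = products[int(0.975 * (n - 1))]
    return {"median": statistics.median(products), "lower": lower, "upper": upper}

# Hypothetical draws standing in for saved MCMC output
alpha = [i / 100 for i in range(101)]
beta = [2.0] * 101
print(summarize_product(alpha, beta))
```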
This example adds moderated pathways to the single-mediator model from the
The dashed lines pointing from D to the directed arrows convey that D moderates the
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex5.2.imp data4.dat
DATA: data4.dat;
VARIABLES: id a1 a2 a3 zscale yscale mscale n1 x d o1:o19;
ORDINAL: d;
MISSING: 999;
FIXED: d;
MODEL:
# single-mediator model with moderated a and b paths
mscale ~ x@alpha d x*d@alphamod;
yscale ~ mscale@beta x d mscale*d@betamod;
# sequential specification for auxiliary variables
a1:a3 ~ yscale mscale x d;
PARAMETERS:
indirect.d0 = alpha * beta;
indirect.d1 = ( alpha + alphamod ) * ( beta + betamod );
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
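The two PARAMETERS lines in the script above are special cases of one expression evaluated at d = 0 and d = 1. The Python sketch below makes that explicit. The function name and the numeric inputs are hypothetical; only the formula is taken from the script.

```python
def conditional_indirect(alpha, alphamod, beta, betamod, d):
    """Indirect effect at moderator value d (0 or 1 for a dummy code)."""
    a_path = alpha + alphamod * d   # slope of the mediator on x at this value of d
    b_path = beta + betamod * d     # slope of the outcome on the mediator at this value of d
    return a_path * b_path

# Hypothetical coefficient values
print(conditional_indirect(0.5, 0.2, 0.4, -0.1, d=0))  # matches indirect.d0 = alpha * beta
print(conditional_indirect(0.5, 0.2, 0.4, -0.1, d=1))  # matches indirect.d1
```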
where Y* denotes the underlying latent response variable for a binary outcome Y, and
all other features of the model are the same as Example 5.1. A path diagram of the
mediation model is shown below, with the ellipse denoting the latent response
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
DATA: data20.dat;
VARIABLES: id a1:a3 zscale y mscale n1 x d1 o1:o19;
MISSING: 999;
ORDINAL: y;
MODEL:
# single-mediator model with parameter labels
mscale ~ x@alpha;
y ~ mscale@beta x;
# sequential specification for auxiliary variables
a1:a3 ~ y mscale x;
PARAMETERS:
indirect = alpha * beta;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
The script above defines the binary outcome as a latent response variable (i.e., probit
regression). Applying the logit function to the dependent variable on the MODEL line
DATA: data20.dat;
VARIABLES: id a1:a3 zscale y mscale n1 x d1 o1:o19;
MISSING: 999;
ORDINAL: y;
MODEL:
# single-mediator model with parameter labels
mscale ~ x@alpha;
logit(y) ~ mscale@beta x;
# sequential specification for auxiliary variables
a1:a3 ~ y mscale x;
PARAMETERS:
indirect = alpha * beta;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
mediator (the mediator could also be binary). The regression models are shown below
where α and β are slope coefficients that define the indirect effect or product of the
analysis is shown below. The model also incorporates three auxiliary variables
When M is binary or ordinal, the α path represents the regression of a latent response
leading to an awkward situation where M essentially has two different metrics within
the same model (i.e., M is latent when it is an outcome variable but ordinal when it is a
predictor). Alternatively, Blimp can use the latent response variable in both
straightforward linear regression with latent response variables. This idea was
proposed in Muthén, Muthén, and Asparouhov (2016). Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
Ex5.4.imp data4.dat
DATA: data4.dat;
VARIABLES: id a1:a3 zscale yscale mscale n1 x d1 o1:o18 m;
MISSING: 999;
ORDINAL: m;
MODEL:
m ~ x@alpha;
# m’s latent response variable as a predictor
yscale ~ m.latent@beta x;
# sequential specification for auxiliary variables
a1:a3 ~ yscale m.latent x;
PARAMETERS:
indirect = alpha * beta;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
variables, each measured by six continuous indicators. A path diagram of the analysis
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex5.5.imp data4.dat
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale mscale xscale n1 d1 d2
y1:y6 m1:m7 x1:x6;
MISSING: 999;
LATENT: latenty latentx;
MODEL:
latentx -> x1:x6;
latenty -> y1:y6;
latentx <-> latenty;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
and IRT scaling. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
A path diagram of the analysis model is shown below, with ellipses denoting latent
response variables, the residual variances of which are fixed scaling constants.
Blimp can use either a logit or probit link. The syntax highlights for the logistic link are
as follows.
DATA: data14.dat;
VARIABLES: id y1:y6;
ORDINAL: y1:y6;
MISSING: 999;
LATENT: ability;
MODEL:
ability ~ 1@0;
ability ~~ ability@1;
The script below is identical but uses a probit rather than logit link (i.e., a normal ogive
DATA: data14.dat;
VARIABLES: id y1:y6;
ORDINAL: y1:y6;
MISSING: 999;
LATENT: ability;
MODEL:
ability ~ 1@0;
ability ~~ ability@1;
y1 ~ 1@icept1 ability@load1;
y2 ~ 1@icept2 ability@load2;
y3 ~ 1@icept3 ability@load3;
y4 ~ 1@icept4 ability@load4;
y5 ~ 1@icept5 ability@load5;
y6 ~ 1@icept6 ability@load6;
PARAMETERS:
discrim1 = load1;
discrim2 = load2;
discrim3 = load3;
discrim4 = load4;
discrim5 = load5;
discrim6 = load6;
difficulty1 = - icept1 / load1;
difficulty2 = - icept2 / load2;
difficulty3 = - icept3 / load3;
difficulty4 = - icept4 / load4;
difficulty5 = - icept5 / load5;
difficulty6 = - icept6 / load6;
SEED: 90291;
BURN: 3000;
ITERATIONS: 10000;
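The PARAMETERS block in the script above converts each item's intercept and loading into IRT-style quantities. The same arithmetic is sketched below in Python for a single item. The function name and the input values are hypothetical; the formulas mirror the discrim and difficulty lines of the script.

```python
def irt_from_slope_intercept(icept, load):
    """Convert a slope-intercept parameterization to discrimination and difficulty."""
    if load == 0:
        raise ValueError("difficulty is undefined when the loading is zero")
    return {"discrimination": load, "difficulty": -icept / load}

# Hypothetical item with intercept -1.0 and loading 2.0
print(irt_from_slope_intercept(-1.0, 2.0))
```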
variables, each measured by six ordinal indicators. A path diagram of the analysis
model is shown below, with ellipses denoting latent response variables, the residual
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex5.7.imp data4.dat
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale mscale xscale n1 d1 d2
y1:y6 m1:m7 x1:x6;
ORDINAL: x1:x6 y1:y6;
MISSING: 999;
LATENT: latenty latentx;
MODEL:
latentx -> x1:x6;
latenty -> y1:y6;
latentx <-> latenty;
SEED: 90291;
BURN: 50000;
ITERATIONS: 10000;
Examples 5.5 and 5.6 illustrated item-level factor analyses that imposed a
variable imputation scheme from Enders (2022) that creates multiple imputation data
sets containing categorical items as well as their underlying latent response variables
(i.e., plausible values). The goal is to convert a categorical factor analysis problem into
a normal-theory multiple imputation analysis that uses the latent response scores as
indicators in lieu of discrete items. Clicking the links below downloads the Blimp
scripts and data for this example, and the full set of User Guide examples is available
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale mscale xscale n1 d1 d2
y1:y6 m1:m7 x1:x6;
ORDINAL: x1:x6 y1:y6;
MISSING: 999;
FCS: x1:x6 y1:y6;
SEED: 90291;
BURN: 25000;
ITERATIONS: 10000;
CHAINS: 100;
NIMPS: 100;
OPTIONS: savelatent;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
whether they were imputed. The latent response variables have a .latent suffix
stacked = 'imps.dat'
the imputed latent response scores using other software packages. A path diagram is
as follows.
The code block below uses the R packages mitml (Grund, Robitzsch, & Lüdtke, 2021),
model to the latent normal imputations. The resulting estimates are numerically
probit link to the categorical data, but the FIML analysis often doesn’t provide fit
# load packages
library(semTools)
library(lavaan)
library(mitml)
This example illustrates a two-factor model with correlated latent variables, each
measured by three continuous indicators. One indicator from each latent factor is
applied to these indicators. A path diagram of the analysis model is shown below.
The ellipses indicate normalized indicators, which are essentially latent normal
variables that have a nonlinear mapping to the nonnormal manifest variables. Clicking
the links below downloads the Blimp scripts and data for this example, and the full set
of User Guide examples is available from a pull-down menu in the graphical interface.
DATA: data12.dat;
VARIABLES: x1:x3 y1:y3;
MISSING: 999;
LATENT: latentx latenty;
MODEL:
latentx <-> latenty;
latentx ~ 1@0;
x1 ~ latentx@1;
yjt(x2) ~ latentx;
x3 ~ latentx;
latenty ~ 1@0;
yjt(y1) ~ latenty;
y2 ~ latenty@1;
y3 ~ latenty;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
OPTIONS: savelatent;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp can save multiple imputations from any model it estimates. In addition to
producing Bayesian estimates of the factor model parameters, the previous code block
saves normalized imputations for a frequentist analysis. Blimp lists the order of the
variables in the imputed data sets at the bottom of the output file, and all variables in
the input file appear in the output file regardless of whether they were imputed. The
stacked = 'imps.dat'
the original and normalized variables using other software packages. For example, the
script shown in the code block below uses the R packages mitml (Grund, Robitzsch, &
Lüdtke, 2021), lavaan (Rosseel, Jorgensen, & Rockwood, 2021), and semTools
measurement model.
# load packages
library(semTools)
library(lavaan)
library(mitml)
This example illustrates a latent variable mediation model where both the mediator
and outcome are latent variables, each with six ordinal indicators. The structural
The residual variances of all latent response variables are fixed at values of 1, and
mediated pathways can be computed following Example 5.1. Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
Ex5.10.imp data4.dat
❖ Default specification fixes the first loading of each factor to 1 and sets the latent
means equal to 0
❖ Longer burn-in period for estimating latent variables and threshold parameters
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale mscale xscale n1 d1 d2
y1:y6 m1:m7 x1:x6;
ORDINAL: d2 y1:y6 m1:m7;
MISSING: 999;
FIXED: d2;
LATENT: latenty latentm;
MODEL:
# structural model
latentm ~ xscale d2;
latenty ~ latentm xscale d2;
# measurement model
latentm -> m1:m6;
latenty -> y1:y6;
SEED: 90291;
BURN: 50000;
ITERATIONS: 10000;
This example adds moderated paths to the latent variable mediation model from the
two manifest variables and an interaction between a manifest and latent variable.
The dashed lines pointing from D2 to the directed arrows convey that D2 moderates the
association between X and the latent mediator as well as the association between the
latent mediator and the outcome. The residual variances of all latent response
variables are fixed at values of 1, and mediated pathways can be computed following
Example 5.2. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
Ex5.11.imp data4.dat
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale mscale xscale n1 d1 d2
y1:y6 m1:m7 x1:x6;
ORDINAL: d2 y1:y6 m1:m7;
MISSING: 999;
LATENT: latenty latentm;
MODEL:
# structural model
latentm ~ xscale d2 xscale*d2;
latenty ~ latentm xscale d2 latentm*d2;
# measurement model
latentm -> m1:m6;
latenty -> y1:y6;
SEED: 90291;
BURN: 25000;
ITERATIONS: 10000;
This example illustrates a latent variable regression model with two latent predictors
and their interaction influencing a latent outcome variable. The structural regression
equation is as follows.
The dashed line pointing from the latent variable to the directed arrow conveys that
one latent predictor is moderating the influence of the other. Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
Ex5.12.imp data13.dat
❖ MODEL command labels the latent variable variances and structural regression
slopes
❖ PARAMETERS command uses labeled quantities to compute conditional effects
(simple slopes) at one standard deviation above and below the latent
moderator’s mean
❖ Longer burn-in period for estimating latent variables
DATA: data13.dat;
VARIABLES: x1:x3 m1:m3 y1:y3;
MISSING: 999;
LATENT: latentx latentm latenty;
MODEL:
# label factor variances for simple slopes
latentx ~~ latentx@xvar;
latentm ~~ latentm@mvar;
# measurement models
latentx -> x1:x3;
latentm -> m1:m3;
latenty -> y1:y3;
# latent correlation
latentx <-> latentm;
# regression model with interaction
latenty ~ latentx@b1 latentm@b2 latentx*latentm@b3;
PARAMETERS:
xslp.mlo = b1 - b3 * sqrt(mvar);
xslp.mhi = b1 + b3 * sqrt(mvar);
mslp.xlo = b2 - b3 * sqrt(xvar);
mslp.xhi = b2 + b3 * sqrt(xvar);
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
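The PARAMETERS lines in the script above compute simple slopes by shifting the focal slope by the interaction coefficient times the moderator's standard deviation. A minimal Python sketch of that arithmetic follows. The function name and input values are hypothetical, and the moderator's mean is assumed to be zero, as it is for the latent factors in this model.

```python
import math

def simple_slopes(b_focal, b_interaction, moderator_variance):
    """Conditional slopes at one SD below and above a zero-mean moderator."""
    sd = math.sqrt(moderator_variance)
    low = b_focal - b_interaction * sd    # e.g., xslp.mlo = b1 - b3 * sqrt(mvar)
    high = b_focal + b_interaction * sd   # e.g., xslp.mhi = b1 + b3 * sqrt(mvar)
    return low, high

# Hypothetical values for b1, b3, and mvar
print(simple_slopes(0.5, 0.2, 4.0))
```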
where a binary grouping variable exerts direct effects on factor model indicators and it
interacts with the latent factor to produce group-specific factor loadings (Bauer, 2017).
The straight lines from G to the indicators introduce group differences in measurement
intercepts, and the dashed lines from G to the directed arrows reflect
example. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
DATA: data4.dat;
VARIABLES: id a1 a2 a3 yscale mscale xscale n1 d1 g
y1:y6 m1:m7 x1:x6;
ORDINAL: g;
MISSING: 999;
LATENT: latenty;
MODEL:
# structural model
latenty ~~ g;
latenty ~ 1@0;
latenty ~~ latenty@1;
# measurement model
y1 ~ latenty;
y2 ~ g latenty g*latenty;
y3 ~ g latenty g*latenty;
y4 ~ g latenty g*latenty;
y5 ~ g latenty g*latenty;
y6 ~ g latenty g*latenty;
SIMPLE:
latenty | g;
SEED: 90291;
BURN: 25000;
ITERATIONS: 10000;
OPTIONS: savelatent;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp can save multiple imputations from any model it estimates. In addition to
producing Bayesian estimates of the factor model parameters, the previous code block
saves imputations for a frequentist multiple-group analysis. Blimp lists the order of the
variables in the imputed data sets at the bottom of the output file, and all variables in
the input file appear in the output file regardless of whether they were imputed.
stacked = 'imps.dat'
In the analysis phase, a multiple-group factor analysis is fit to the imputed data. For
example, the script shown in the code block below uses the R packages mitml (Grund,
Robitzsch, & Lüdtke, 2021), lavaan (Rosseel, Jorgensen, & Rockwood, 2021), and
TBA
This example illustrates a two-factor latent growth curve model with unequally
spaced repeated measurements and a binary predictor of the random intercepts and
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex5.14.imp data3.dat
DATA: data3.dat;
VARIABLES: id y0 y1 y3 y6 d1 d2 v1:v4;
ORDINAL: d1;
MISSING: 999;
FIXED: d1;
LATENT: icept growth;
MODEL:
# structural model
icept ~ 1 d1;
growth ~ 1 d1;
icept <-> growth;
# measurement model
This section illustrates multilevel models in Blimp. In general, it is possible to mix and
match features from any examples to create complex analysis models that honor
features of the data. Following the previous chapter, the examples in this section use a
prefix and a numeric suffix (e.g., Y1, X1, N1, D1, D2, V1, V2, V3). The letter Y designates a
letters generally represent continuous variables. Additionally, the examples use a “.i”
suffix to denote level-1 variables, “.j” for level-2 variables, and “.k” for level-3
measured at level-1). Blimp determines the levels automatically, so the suffixes are
meant as a visual aid for understanding the scripts. Finally, the model equations use
“cgm” and “cwc” superscripts to indicate grand and group mean centering, respectively.
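To make the distinction between the two centering schemes concrete, the Python sketch below centers a small set of hypothetical scores at the grand mean and at the cluster means. It uses ordinary arithmetic means purely for illustration; it is not Blimp's procedure, and the function names are assumptions.

```python
from collections import defaultdict

def grand_mean_center(values):
    """Subtract the overall mean from every score ("cgm")."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]

def group_mean_center(values, cluster_ids):
    """Subtract each cluster's own mean from its members' scores ("cwc")."""
    by_cluster = defaultdict(list)
    for value, cluster in zip(values, cluster_ids):
        by_cluster[cluster].append(value)
    means = {c: sum(vs) / len(vs) for c, vs in by_cluster.items()}
    return [v - means[c] for v, c in zip(values, cluster_ids)]

# Hypothetical level-1 scores nested in two clusters
scores = [1.0, 3.0, 10.0, 20.0]
clusters = ["a", "a", "b", "b"]
print(grand_mean_center(scores))
print(group_mean_center(scores, clusters))
```

Group mean centering removes all between-cluster differences from the predictor, whereas grand mean centering only shifts the scale.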
This example illustrates a two-level regression model with random intercepts. The
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.1.imp data7.dat
DATA: data7.dat;
VARIABLES: level1id level2id n1.i d1.i d2.i n2.i x1.i x2.i
x3.i x4.i y.i d3.j x5.j x6.j;
CLUSTERID: level2id;
ORDINAL: d2.i d3.j;
MISSING: 999;
FIXED: x4.i d3.j;
CENTER: grandmean = x1.i x4.i d2.i x5.j;
MODEL: y.i ~ x1.i x4.i d2.i x5.j d3.j;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
an approach to getting frequentist inference for the analysis from Example 6.1. The
known to induce bias (Enders et al., 2020; Grund, Lüdtke, & Robitzsch, 2016).
6.3). Clicking the links below downloads the Blimp scripts and data for this example,
and the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
DATA: data7.dat;
VARIABLES: level1id level2id n1.i d1.i d2.i n2.i x1.i x2.i
x3.i x4.i y.i d3.j x5.j x6.j;
CLUSTERID: level2id;
ORDINAL: d2.i d3.j;
MISSING: 999;
FCS: y.i x1.i x4.i d2.i x5.j d3.j;
SEED: 90291;
BURN: 25000;
ITERATIONS: 10000;
CHAINS: 20;
NIMPS: 20;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
stacked = 'imps.dat'
imp# level1id level2id n1.i d1.i d2.i n2.i x1.i x2.i x3.i
x4.i y.i d3.j x5.j x6.j
The imputed data sets can be analyzed in other software packages. The script below
uses the R packages lme4 (Bates et al., 2021) and mitml (Grund et al., 2021) to fit the
multilevel regression model to the filled-in data sets. The resulting estimates are
# center predictors
imps$x1.i.cgm <- imps$x1.i - mean(imps$x1.i)
imps$x4.i.cgm <- imps$x4.i - mean(imps$x4.i)
imps$d2.i.cgm <- imps$d2.i - mean(imps$d2.i)
imps$x5.j.cgm <- imps$x5.j - mean(imps$x5.j)
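After the model is fit to each filled-in data set, the estimates are combined across imputations. Packages such as mitml handle this step; the Python sketch below only illustrates the standard pooling arithmetic (Rubin's rules) for a single parameter. The function name and the input values are hypothetical, and this is not mitml's implementation.

```python
import math
import statistics

def pool_with_rubins_rules(estimates, squared_standard_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(squared_standard_errors):
        raise ValueError("need at least two paired estimates and sampling variances")
    pooled_estimate = statistics.fmean(estimates)        # average point estimate
    within = statistics.fmean(squared_standard_errors)   # average sampling variance
    between = statistics.variance(estimates)             # variance of estimates across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, math.sqrt(total)

# Hypothetical slope estimates and squared standard errors from three imputations
print(pool_with_rubins_rules([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```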
This example illustrates a two-level regression model with random intercepts and
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL: y.i ~ x1.i x2.i x7.j d1.j | x1.i;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
Blimp can save multiple imputations from any model it estimates. Model-based
multiple imputations can be saved for a frequentist analysis by adding the SAVE and
❖ CENTER command grand mean centers predictors in the Bayesian output, but
saved imputations are on the original metric
❖ NIMPS command specifies 20 imputed data sets
❖ Setting CHAINS equal to NIMPS saves one data set from the final iteration of each
MCMC chain (avoids autocorrelated imputations)
❖ savelatent keyword on the OPTIONS line saves the latent group means of the
level-1 predictors and the analysis model’s random intercept and random slope
residuals
❖ Imputations are stacked in a single file with an index variable added in the first
column
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL: y.i ~ x1.i x2.i x7.j d1.j | x1.i;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
CHAINS: 20;
NIMPS: 20;
OPTIONS: savelatent;
SAVE: stacked = imps.dat;
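Because the stacked file places the imputation index in the first column, separating the data sets for analysis is a simple grouping operation. The Python sketch below shows one way to do it. It assumes a whitespace-delimited, purely numeric file with no header row, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict

def split_stacked(text):
    """Group rows of a stacked imputation file by the index in the first column."""
    data_sets = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        imputation = int(float(fields[0]))
        data_sets[imputation].append([float(v) for v in fields[1:]])
    return dict(data_sets)

# Hypothetical stacked rows: imputation index, id, and one variable
sample = "1 10 0.5\n1 11 0.7\n2 10 0.6\n2 11 0.9\n"
for imputation, rows in split_stacked(sample).items():
    print(imputation, rows)
```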
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
whether they were imputed. The savelatent keyword also saves the latent group
means of any level-1 predictors, and these can be used to center variables prior to
analyzing the imputations. This example uses X1’s latent group means, which are
stacked = 'imps.dat'
imp# level1id level2id x1.i x2.i y.i x3.i x4.i d1.j n1.j
x5.j x6.j x7.j x8.j x9.j y.i[level2id] y.i$x1.i[level2id]
x1.i.mean[level2id] x2.i.mean[level2id]
The imputed data sets can be analyzed in other software packages. The script below
uses the R packages lme4 (Bates et al., 2021) and mitml (Grund et al., 2021) to fit the
multilevel regression model to the filled-in data sets. The resulting estimates are
This example illustrates how to examine the influence of different prior distributions on
the level-2 covariance matrix of the random effects. The analysis model is the
example. Blimp offers three “off-the-shelf” inverse Wishart priors for the covariance
matrix, and it is also possible to use a so-called separation strategy that applies
distinct priors to variances and the intercept-slope correlation. Clicking the links below
downloads the Blimp scripts and data for this example, and the full set of User Guide
Ex6.4d.separation.imp data8.dat
Considering the inverse Wishart options, the default prior2 setting is less informative
because it subtracts the number of dimensions plus 1 from the degrees of freedom,
and it adds nothing to the sum of squares and cross-products; prior1 is more
freedom, and it adds an identity matrix to the sum of squares and cross-products;
prior3 adds zero degrees of freedom and adds zero to the sums of squares. The code
block below shows the default specification, the syntax highlights for which are as
follows.
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
groupmean = x1.i;
grandmean = x2.i;
MODEL: y.i ~ x1.i x2.i | x1.i;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
OPTIONS: prior2;
Similarly, the code block below shows the specification for the more informative
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
groupmean = x1.i;
grandmean = x2.i;
MODEL: y.i ~ x1.i x2.i | x1.i;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
OPTIONS: prior1;
Comparing the magnitude of the point estimates provides a gauge of the prior
distribution’s impact. The output table for the default prior2 specification is shown
immediately below, and the second output table shows the results from the more
# prior2
---------------------------------------------
Variances:
L2 : Var(Intercept) 0.613 0.083 0.478 0.804
L2 : Cov(x1.i,Intercept) 0.016 0.016 -0.014 0.048
L2 : Var(x1.i) 0.020 0.006 0.010 0.034
Residual Var. 0.358 0.011 0.337 0.381
...
---------------------------------------------
# prior1
---------------------------------------------
Variances:
L2 : Var(Intercept) 0.591 0.078 0.464 0.771
L2 : Cov(x1.i,Intercept) 0.017 0.018 -0.017 0.053
L2 : Var(x1.i) 0.039 0.007 0.028 0.056
Residual Var. 0.354 0.011 0.333 0.377
...
---------------------------------------------
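As a rough check on the two tables above, the gap between the random slope variance estimates can be expressed in posterior standard deviation units. The Python sketch below uses the prior1 posterior standard deviation (0.007) as the yardstick, which is an assumption about how the comparison is scaled; the function name is hypothetical.

```python
def difference_in_sd_units(estimate_a, estimate_b, posterior_sd):
    """Absolute difference between two estimates, scaled by a posterior SD."""
    if posterior_sd <= 0:
        raise ValueError("posterior_sd must be positive")
    return abs(estimate_a - estimate_b) / posterior_sd

# Var(x1.i) under prior1 (0.039) versus prior2 (0.020)
print(round(difference_in_sd_units(0.039, 0.020, 0.007), 1))  # prints 2.7
```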
The default prior2’s random slope variance is roughly half as large as that of the more
informative prior (0.020 vs. 0.039), and the two estimates differed by about 2.7
posterior standard deviation units (a very large difference). As a proportion of the total
variance, the R2 effect sizes attributable to the random slopes (Rights & Sterba, 2019)
The separation strategy (Barnard, McCulloch, & Meng, 2000; Liu, Zhang, & Grimm,
2016) assigns distinct priors to the diagonal and off-diagonal elements of the
the random slopes as a level-2 latent variable that correlates with the analysis model’s
random intercepts. This strategy assigns separate inverse gamma priors to the random
intercept and slope variances, and it assigns a beta prior distribution to their
correlation (Merkle & Rosseel, 2018). Computer simulation studies suggest that the
although the correlation estimate may be attenuated when the number of level-2 units
is small (Keller & Enders, 2021). The unique syntax highlights for the code block are as
follows.
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
MISSING: 999;
LATENT: level2id = beta1j;
CENTER:
groupmean = x1.i;
grandmean = x2.i;
MODEL:
y.i ~ x1.i x2.i x1.i*beta1j@1;
beta1j ~ 1@0;
y.i[level2id] ~~ beta1j;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
The random effect parameter estimates no longer appear in the same table when
employing the separation strategy because the random slope is a separate latent
variable with its own equation. The analysis model table shows the random intercept
variance, and the level-2 latent variable’s (random slope) variance and correlation
# separation strategy
---------------------------------------------
Variances:
Residual Var. 0.017 0.005 0.009 0.029
---------------------------------------------
...
---------------------------------------------
Variances:
L2 : Var(Intercept) 0.603 0.080 0.471 0.784
Residual Var. 0.359 0.011 0.337 0.382
...
---------------------------------------------
...
Correlations:
---------------------------------------------
beta1j <-> y.i[level2id] 0.001 0.030 -0.059 0.068
---------------------------------------------
This example illustrates how to inspect the level-1 and level-2 residuals (random
effects) from a two-level regression model with random intercepts and random slopes.
The analysis model, shown below, is the same as the one from Example 6.4.
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j
n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
groupmean = x1.i;
grandmean = x2.i;
MODEL: y.i ~ x1.i x2.i | x1.i;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
CHAINS: 20;
NIMPS: 20;
OPTIONS: savelatent saveresidual;
SAVE: stacked = imps.dat;
Blimp lists the order of the variables in the imputed data sets at the bottom of the
output file, and all variables in the input file appear in the output file regardless of
whether they were imputed. The latent group means, random effects, and level-1
residuals are appended to the end of the file. Latent group means are designated by
appending the level-2 identifier in square brackets to the end of a predictor variable’s
model’s random intercepts are denoted by appending the level-2 identifier in square
slope residuals are indicated by joining the outcome and random predictor variables
y.i.residual).
stacked = 'imps.dat'
imp# level1id level2id x1.i x2.i y.i x3.i x4.i d1.j n1.j
x5.j x6.j x7.j x8.j x9.j y.i[level2id]
y.i$x1.i[level2id] x1.i.mean[level2id]
x2.i.mean[level2id] y.i.residual
The imputed data sets can be analyzed in other software packages. The script below
uses the R package rockchalk (Johnson, 2019) to compute excess skewness and
# qq plots
qqnorm(imps$y.ranicept); qqline(imps$y.ranicept)
qqnorm(imps$x1.ranslope); qqline(imps$x1.ranslope)
This example illustrates a two-level regression model with random intercepts and
slopes and heterogeneous within-cluster variances. The analysis model below is the
same one as Example 6.3, but the variance of the within-cluster residuals differs across
clusters.
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.6.imp data8.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
6.3).
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y.i x3.i x4.i d1.j n1.j
x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL: y.i ~ x1.i x2.i x7.j d1.j | x1.i;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
OPTIONS: hev;
The output shows the pooled (mean) residual variance across all clusters, the
heterogeneity index (theta) described in Kasim and Raudenbush (1998), and the ratio of
the largest to smallest group-specific variance, as shown below. All other output is the
same. Note that Kasim and Raudenbush (1998) characterize heterogeneity indices equal
---------------------------------------------
Variances:
L2 : Var(Intercept) 0.648 0.088 0.507 0.851
L2 : Cov(x1.i,Intercept) 0.026 0.014 -0.000 0.056
L2 : Var(x1.i) 0.014 0.006 0.006 0.028
Mean Residual Var. 0.247 0.017 0.215 0.283
Heterogeneity Index 0.207 0.036 0.149 0.288
Largest-Smallest Ratio 24.842 10.660 14.293 51.465
...
---------------------------------------------
This example illustrates a two-level path model where the random intercepts and
random slopes from one model predict a level-2 outcome in another model. The
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
6.3).
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y1.i x3.i x4.i d1.j n1.j
x5.j x6.j x7.j x8.j y2.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
RANDOMEFFECT:
ranicepts = y1.i | 1 [level2id];
ranslopes = y1.i | x1.i [level2id];
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL:
y1.i ~ x1.i x2.i x7.j d1.j | x1.i;
y2.j ~ ranicepts ranslopes x7.j;
SEED: 90291;
BURN: 25000;
ITERATIONS: 25000;
A slightly different way to set up the model is to define the random intercepts and
slopes as level-2 latent variables that predict the outcomes from both models. The
random effect parameter estimates no longer appear in the same table as the other
model parameters because they are distinct latent variables with their own mean
(fixed effect) and variance (random effect). This parameterization converges more
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y1.i x3.i x4.i d1.j n1.j
x5.j x6.j x7.j x8.j y2.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
LATENT:
level2id = ranicept;
level2id = ranslope;
FIXED: d1.j;
CENTER:
groupmean = x1.i;
grandmean = x2.i x7.j d1.j;
MODEL:
ranicept ~ 1;
ranslope ~ 1;
ranicept ~~ ranslope;
y1.i ~ 1@0 ranicept@1 x1.i*ranslope@1 x2.i x7.j d1.j | 1@0;
y2.j ~ ranicept ranslope x7.j;
SEED: 90291;
BURN: 25000;
ITERATIONS: 25000;
This example illustrates a two-level regression model that includes within- and
between-cluster slopes for a level-1 predictor and a latent contextual effect (Lüdtke et
al., 2008).
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.8.imp data8.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data8.dat;
VARIABLES: level1id level2id x1.i x2.i y x3.i x4.i
d1.j n1.j x5.j x6.j x7.j x8.j x9.j;
CLUSTERID: level2id;
ORDINAL: d1.j;
MISSING: 999;
FIXED: d1.j;
CENTER:
groupmean = x2.i;
grandmean = x2.i.mean x7.j d1.j;
MODEL:
y ~ x2.i@betaw x2.i.mean@betab x7.j d1.j | x2.i;
PARAMETERS:
x2.contextual = betab - betaw;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.9.imp data1.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data1.dat;
VARIABLES: level1id level2id d1.i o1.i y.i x1.i d2.i x2.j x3.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
groupmean = x1.i;
grandmean = x2.j;
MODEL:
y.i ~ x1.i x2.j x1.i*x2.j | x1.i;
SIMPLE:
x1.i | x2.j;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
This example illustrates a two-level path model that features an indirect effect of two
level-1 predictors, both of which are within-cluster centered at their latent group
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.10.imp data1.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data1.dat;
VARIABLES: level1id level2id d1.i y.i m.i x1.i d2.i x2.j x3.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
groupmean = x1.i m.i;
MODEL:
m.i ~ x1.i@alpha1 | x1.i;
y.i ~ m.i@beta1 x1.i | m.i x1.i;
PARAMETERS:
indirect = alpha1 * beta1;
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
This example illustrates a two-level path model that features an indirect effect of two
level-1 predictors, both of which are within-cluster centered at their latent group
means, and a level-2 predictor moderating the direct pathways. The regression models
are as follows.
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.11.imp data1.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data1.dat;
VARIABLES: level1id level2id d1.i y.i m.i x1.i d2.i x2.j x3.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
This example illustrates a two-level path model that features a within-cluster indirect
involving a pair of latent group means. The regression models are as follows.
The ellipses in the between-cluster model represent latent group means (i.e., random
intercepts). Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
Ex6.12.imp data1.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
❖ MODEL command specifies latent group means as a level-2 predictor with the
.mean suffix on a level-1 predictor
❖ MODEL command labels within- and between-cluster slopes
❖ PARAMETERS command uses labeled quantities to compute the product of
coefficients estimator
❖ Unspecified associations for predictor variables
DATA: data1.dat;
VARIABLES: level1id level2id d1.i y.i m.i x1.i d2.i x2.j x3.j;
CLUSTERID: level2id;
MISSING: 999;
CENTER:
grandmean = x1.i.mean m.i.mean;
groupmean = x1.i m.i;
MODEL:
m.i ~ x1.i@alpha1 x1.i.mean@alpha2 | x1.i;
y.i ~ m.i@beta1 m.i.mean@beta2 x1.i x1.i.mean | m.i;
PARAMETERS:
indirect.w = alpha1 * beta1;
indirect.b = alpha2 * beta2;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
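The product-of-coefficients computation in the PARAMETERS command can be sketched outside Blimp. The Python fragment below is only an illustration: it assumes the product is formed draw by draw from paired MCMC iterations and then summarized, and the draws shown are hypothetical values rather than Blimp output. The function name is also hypothetical.

```python
from statistics import median


def product_of_coefficients(alpha_draws, beta_draws):
    """Multiply paired draws of two slopes, giving one product per iteration."""
    if len(alpha_draws) != len(beta_draws):
        raise ValueError("draw vectors must have the same length")
    return [a * b for a, b in zip(alpha_draws, beta_draws)]


# Hypothetical draws (four iterations only, for illustration)
alpha1 = [0.50, 0.40, 0.60, 0.50]  # within-cluster slope: m.i regressed on x1.i
beta1 = [0.20, 0.30, 0.10, 0.20]   # within-cluster slope: y.i regressed on m.i

indirect_w = product_of_coefficients(alpha1, beta1)
indirect_w_median = median(indirect_w)
```

The same function applies to the between-cluster pair (alpha2 and beta2). Summarizing the vector of products, rather than multiplying two point estimates, retains the asymmetry of the product's distribution.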
This feature is currently available, and the User Guide will be updated soon with an
This example illustrates a two-level linear growth model that includes a cross-level
binary moderator. The regression model, which is the two-level version of the latent
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.14.imp data9.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data9.dat;
VARIABLES: level2id y.i time.i n1.i v1.i v2.i d1.j d2.j;
CLUSTERID: level2id;
NOMINAL: d1.j;
MISSING: 999;
FIXED: time.i d1.j;
MODEL: y.i ~ time.i d1.j time.i*d1.j | time.i;
SIMPLE: time.i | d1.j;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
This example illustrates a three-level linear growth model that includes a cross-level
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data10.dat;
VARIABLES: level1id level2id level3id y.i time.i x1.i
n1.j d1.j d2.j n2.j x2.j d3.k x3.k x4.k;
NOMINAL: d3.k;
CLUSTERID: level2id level3id;
MISSING: 999;
FIXED: time.i d3.k;
CENTER: grandmean = x2.j;
MODEL:
y.i ~ time.i x2.j d3.k time.i*d3.k | time.i;
SIMPLE:
time.i | d3.k;
SEED: 90291;
BURN: 15000;
ITERATIONS: 10000;
By default, Blimp estimates random intercepts and random slopes (when specified) at
all levels of the data hierarchy. For example, the previous analysis produces a 2 x 2
covariance matrix of random effects at level-2 and level-3. In some situations, it may
be desirable or necessary to override Blimp’s default behavior and fix certain variance
components to zero (or alternatively, select which variances get estimated). This is
achieved by listing the desired random effects on the right side of the vertical pipe and
illustrate, the following code block specifies a three-level model with random
intercepts at both levels and a random coefficient for the temporal predictor at the
DATA: data10.dat;
VARIABLES: level1id level2id level3id y.i time.i x1.i
n1.j d1.j d2.j n2.j x2.j d3.k x3.k x4.k;
NOMINAL: d3.k;
CLUSTERID: level2id level3id;
MISSING: 999;
FIXED: time.i d3.k;
CENTER: grandmean = x2.j;
MODEL:
y.i ~ time.i x2.j d3.k time.i*d3.k |
1[level2id] 1[level3id] time.i[level2id];
SEED: 90291;
BURN: 15000;
ITERATIONS: 10000;
measurement model for the within-cluster scores at level-1 and the between-cluster
latent group means at level-2. The model also features predictor variables at each
The ellipses in the between-cluster model are latent group means (i.e., random
intercepts) that load on a level-2 latent variable. Clicking the links below downloads
the Blimp scripts and data for this example, and the full set of User Guide examples is
Ex6.16.imp data11.dat
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
❖ MODEL command fixes the within- and between-cluster loading of the first
indicator to 1
❖ Default specification fixes the latent means equal to 0
❖ Longer burn-in period for estimating latent variables
DATA: data11.dat;
VARIABLES: level2id y1.i:y4.i x1.i x2.i x3.j;
MISSING: 999;
CLUSTERID: level2id;
LATENT:
latenty.l1;
level2id = latenty.l2;
CENTER:
grandmean = x1.i x2.i x3.j;
MODEL:
# structural model
latenty.l1 ~ x1.i x2.i;
latenty.l2 ~ x3.j;
# measurement model
y1.i ~ latenty.l1@1 latenty.l2@1;
y2.i ~ latenty.l1 latenty.l2;
y3.i ~ latenty.l1 latenty.l2;
y4.i ~ latenty.l1 latenty.l2;
SEED: 90291;
BURN: 10000;
ITERATIONS: 10000;
This feature is currently under development and will be added in a future update.
This example illustrates a two-level regression model from a partially nested design.
The example below considers a level-2 binary predictor (e.g., a treatment assignment
in level-2 units but observations in group D = 0 are not nested (i.e., are singleton
clusters). The regression model below features an interaction between the binary
indicator and the random intercept, such that the random effect term drops from the
equation if D = 0.
More generally, the variable D does not need to have a fixed effect. Outside of an
an observation that shares cluster membership with other observations). The following
Blimp estimates this model by defining a level-2 latent variable that interacts with D.
Clicking the links below downloads the Blimp scripts and data for this example, and
the full set of User Guide examples is available from a pull-down menu in the
graphical interface.
Ex6.18.imp data19.dat
DATA: data19.dat;
VARIABLES: level2id y.i d.j;
MISSING: 999;
CLUSTERID: level2id;
LATENT: level2id = ranicepts;
MODEL:
ranicepts ~ 1@0;
y.i ~ d.j ranicepts*d.j@1 | 1@0;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
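The way the interaction switches the random intercept on and off can be sketched as follows. This Python fragment is an illustration under stated assumptions, not part of Blimp: the function name and all numeric values are hypothetical, and the expression simply mirrors the MODEL line, where the product of the latent variable and d.j has its coefficient fixed to 1.

```python
def partially_nested_mean(b0, b1, d, u_j):
    """Conditional mean of y.i given the binary indicator d and random intercept u_j.

    The random intercept enters only through the product d * u_j,
    so it drops from the equation when d equals 0.
    """
    return b0 + b1 * d + d * u_j


# Hypothetical coefficient values and a hypothetical cluster effect
nested_case = partially_nested_mean(b0=10.0, b1=2.0, d=1, u_j=0.5)
singleton_case = partially_nested_mean(b0=10.0, b1=2.0, d=0, u_j=0.5)
```

For the D = 0 case the result equals the intercept alone, regardless of the value passed for the random effect.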
modeling features. Clicking the links below downloads the Blimp scripts and data for
this example, and the full set of User Guide examples is available from a pull-down
The input data set is in stacked (i.e., “person-period”) format with each row
representing a time interval nested within an individual. The data also include a set of
time indicators that dummy code each measurement interval. The example below
illustrates a model with six intervals and thus six dummy codes. The outcome variable
is an event indicator that equals 0 if the event did not happen in the interval and a 1 if
the event did happen in the interval. Figure 11.5 from Singer and Willett (2003)
The basic model is a logistic regression with the binary event indicator regressed on
Note that the model omits the usual regression intercept. The syntax highlights are
shown below. Adding the NIMPS and SAVE commands generates imputed data sets,
and adding the savepredicted keyword to the OPTIONS command saves predicted
DATA: data20.dat;
VARIABLES: level2id time.i t1.i t2.i t3.i t4.i t5.i t6.i
y.i d.j x.j;
ORDINAL: y.i t1.i t2.i t3.i t4.i t5.i t6.i;
CLUSTERID: level2id;
MISSING: 999;
FIXED: t1.i t2.i t3.i t4.i t5.i t6.i;
MODEL:
logit(y.i) ~ 1@0 t1.i@alpha1 t2.i@alpha2 t3.i@alpha3
t4.i@alpha4 t5.i@alpha5 t6.i@alpha6 | 1@0;
PARAMETERS:
hazard.1 = exp(alpha1) / (1 + exp(alpha1));
hazard.2 = exp(alpha2) / (1 + exp(alpha2));
hazard.3 = exp(alpha3) / (1 + exp(alpha3));
hazard.4 = exp(alpha4) / (1 + exp(alpha4));
hazard.5 = exp(alpha5) / (1 + exp(alpha5));
hazard.6 = exp(alpha6) / (1 + exp(alpha6));
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
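The hazard expressions in the PARAMETERS command apply the inverse logit transformation to each interval's coefficient. A minimal Python sketch of the same arithmetic is given below; the alpha values are hypothetical, not estimates from data20.dat, and the function name is chosen only for illustration.

```python
import math


def logit_to_hazard(alpha):
    """Inverse logit: exp(alpha) / (1 + exp(alpha))."""
    return math.exp(alpha) / (1.0 + math.exp(alpha))


# Hypothetical logit-scale coefficients for three intervals
alphas = [-2.0, -1.0, 0.0]
hazards = [logit_to_hazard(a) for a in alphas]
```

A coefficient of 0 on the logit scale corresponds to a hazard of .50, and increasingly negative coefficients correspond to hazards that approach 0.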
The next example expands the model by incorporating a person-level dummy code
As before, the model omits the usual regression intercept and includes a set of six
dummy codes that index the intervals. The code block below is identical to the
previous example, but it defines the binary predictor as ordinal and grand mean
DATA: data20.dat;
VARIABLES: level2id time.i t1.i t2.i t3.i t4.i t5.i t6.i
y.i d.j x.j;
ORDINAL: y.i t1.i t2.i t3.i t4.i t5.i t6.i d.j;
CLUSTERID: level2id;
MISSING: 999;
FIXED: t1.i t2.i t3.i t4.i t5.i t6.i d.j x.j;
CENTER: grandmean = x.j;
MODEL:
logit(y.i) ~ 1@0 t1.i@alpha1 t2.i@alpha2 t3.i@alpha3
t4.i@alpha4 t5.i@alpha5 t6.i@alpha6 d.j x.j | 1@0;
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
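The stacked person-period layout used by the two discrete-time survival scripts can be sketched in Python. This is an illustration only: the function and argument names are hypothetical, and it assumes the usual convention that a person contributes one row per interval up to and including the interval where the event or censoring occurs.

```python
def person_period_rows(person_id, last_interval, event_occurred, n_intervals=6):
    """Build stacked rows: id, interval, dummy codes t1..tn, event indicator."""
    rows = []
    for interval in range(1, last_interval + 1):
        # One dummy code per interval, equal to 1 only for the current interval
        dummies = [1 if k == interval else 0 for k in range(1, n_intervals + 1)]
        # The event indicator is 1 only in the interval where the event happens
        y = 1 if (event_occurred and interval == last_interval) else 0
        rows.append([person_id, interval] + dummies + [y])
    return rows


# Hypothetical person who experiences the event in the third interval
rows = person_period_rows(person_id=1, last_interval=3, event_occurred=True)
```

The column order follows the VARIABLES line (identifier, time, six dummy codes, event indicator), without the person-level predictors.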
This section illustrates missing not at random analysis models in Blimp. Following the
previous chapters, the examples in this section use a generic notation system where
variable names usually consist of an alphanumeric prefix and a numeric suffix (e.g., Y1,
X1, N1, D1, D2, V1, V2, V3). The letter Y designates a dependent variable, a D prefix
continuous variables. For examples involving multilevel models, the examples use a
“.i” suffix to denote level-1 variables and “.j” for level-2 variables (e.g., d1.j is a
determines the levels automatically, so the suffixes are meant as a visual aid for
understanding the scripts. Finally, the model equations use “cgm” and “cwc”
superscripts to indicate grand and group mean centering, respectively. The following
This example illustrates a selection model for a missing not at random process where
an incomplete outcome variable predicts its own missingness. The focal analysis model
The most basic selection model is one where the outcome alone predicts its
missingness indicator (MY = 0 if Y is observed and 1 if it’s missing); Gomer and Yuan
(2021) refer to this as a focused missing not at random process. The following
equation is a probit model where the missingness indicator’s latent response variable
For identification, the residual variance is fixed at 1, and the threshold parameter is
The analysis model also incorporates three auxiliary variables using the sequential
specification from Example 4.7. Clicking the links below downloads the Blimp scripts
and data for this example, and the full set of User Guide examples is available from a
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
DATA: data3.dat;
VARIABLES: id x1 x2 x3 y d1 d2 v1:v4;
MISSING: 999;
ORDINAL: d1 d2;
FIXED: d1 d2;
CENTER: x1;
MODEL:
# focal analysis model
y ~ d1 d2 x1;
# auxiliary variable models
x2 x3 ~ y d1 d2 x1;
# selection model
y.missing ~ y;
SEED: 90291;
BURN: 1000;
ITERATIONS: 10000;
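To see how the selection equation ties the probability of missingness to the outcome, the probit model can be sketched in Python. The fragment below is an illustration under stated assumptions: the latent response has a residual variance of 1, the threshold is taken to be 0 so that the probability equals the standard normal cumulative distribution function of the linear predictor, and the gamma0 and gamma1 values are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_missing(y, gamma0, gamma1):
    """P(missing | y) for a probit selection model (threshold assumed to be 0)."""
    return std_normal_cdf(gamma0 + gamma1 * y)


# Hypothetical selection model coefficients
p_low = prob_missing(y=1.0, gamma0=-1.0, gamma1=0.5)
p_mid = prob_missing(y=2.0, gamma0=-1.0, gamma1=0.5)
p_high = prob_missing(y=3.0, gamma0=-1.0, gamma1=0.5)
```

With a positive gamma1, higher outcome scores are more likely to be missing. A slope of 0 would remove the dependence of missingness on the outcome.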
A more complex selection model features the outcome predicting its missingness
indicator along with other variables, in this case D1; Gomer and Yuan (2021) refer to
this as a diffuse missing not at random process. The following equation is a probit
model where the missingness indicator’s latent response variable is regressed the
Caution is warranted when including too many predictors from the analysis model in
predictors in a stepwise fashion using fit indices such as the DIC and WAIC is often a
good strategy. The code block for the analysis is shown below.
DATA: data3.dat;
VARIABLES: id x1 x2 x3 y d1 d2 v1:v4;
MISSING: 999;
ORDINAL: d1 d2;
FIXED: d1 d2;
CENTER: x1;
MODEL:
# focal analysis model
y ~ d1 d2 x1;
# auxiliary variable models
x2 x3 ~ y d1 d2 x1;
# selection model
y.missing ~ y d1;
SEED: 90291;
BURN: 2500;
ITERATIONS: 10000;
This example illustrates a pattern mixture model for a missing not at random process
where regression model parameters differ between cases with and without dependent
variable scores. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
assignment indicator), and D2 and X1 are covariates. The most basic pattern mixture
model is one where the intercept (outcome variable mean) differs between people with
and without Y values; Gomer and Yuan (2021) characterize this as a focused missing
not at random process. The fitted model features a binary missing data indicator (MY =
pattern-specific intercepts, where the weights are the group proportions. The marginal
where p(obs) and p(mis) are the proportions of completers and dropouts, respectively.
must be fixed to a value during estimation, and the magnitude and sign of the
coefficient control the strength and direction of the missing not at random process.
Enders (2022, Section 9.7) illustrates a strategy that uses off-the-shelf effect size
benchmarks to determine this parameter. For example, if a researcher felt that the
unseen Y scores have a higher mean than the observed data, then the inestimable
effect size and the dependent variable’s standard deviation (or residual standard
deviation).
A positive value of d sets the mean of the unseen scores to a higher value than the
observed data, and a negative value specifies a lower mean. The code block below
sets the effect size equal to +0.20 and uses the residual standard deviation to estimate
the spread of Y (Little, 2009, p. 428). This setting corresponds to a sensitivity analysis
where persons with incomplete data are hypothesized to have a mean difference
roughly equal to Cohen’s (1988) small effect size benchmark. The syntax highlights are
shown below, and adding the NIMPS and SAVE commands generates model-based
multiple imputations for a frequentist analysis that no longer requires the missing data
indicator.
❖ PARAMETERS command passes the value of the residual standard deviation into
the formula that determines the intercept mean difference
❖ PARAMETERS command uses labeled quantities to compute missing data group
proportions, pattern-specific intercept coefficients, and a marginal intercept
estimate that averages over the missing data patterns
DATA: data3.dat;
VARIABLES: id x1 x2 x3 y d1 d2 v1:v4;
MISSING: 999;
ORDINAL: d1 d2 m.y;
CENTER: x1 d2;
TRANSFORM:
m.y = ismissing(y);
MODEL:
# sequential specification for predictors
m.y ~ 1@ymissmean;
x1 d1 d2 ~ m.y;
# focal analysis model
y ~ 1@b0obs m.y@b0diff d1 d2 x1 ;
# label residual variance
y ~~ y@resvar;
# auxiliary variable models
x2 x3 ~ y d1 d2 x1 ;
PARAMETERS:
# set b0diff equal to +.20 residual std. dev. units
cohensd = .20;
b0diff = cohensd * sqrt(resvar);
# missingness group proportions
p.obs = 1 - phi(-ymissmean);
p.mis = phi(-ymissmean);
# compute weighted average intercept
b0.obs = b0obs;
b0.mis = b0obs + b0diff;
b0 = (b0.obs * p.obs) + (b0.mis * p.mis);
SEED: 90291;
BURN: 2000;
ITERATIONS: 10000;
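The arithmetic in the PARAMETERS command can be traced with a short Python sketch. It mirrors the script's expressions line by line and assumes that Blimp's phi() function is the standard normal cumulative distribution function. All input values are hypothetical and chosen only to make the steps easy to follow.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values standing in for quantities labeled in the MODEL command
ymissmean = 1.0   # intercept of the missing data indicator's probit model
b0obs = 50.0      # intercept for cases with observed y
resvar = 100.0    # residual variance of y
cohensd = 0.20    # effect size chosen by the analyst

# Intercept difference in residual standard deviation units
b0diff = cohensd * math.sqrt(resvar)

# Missingness group proportions, written as in the script
p_obs = 1 - phi(-ymissmean)
p_mis = phi(-ymissmean)

# Pattern-specific intercepts and their weighted average
b0_obs = b0obs
b0_mis = b0obs + b0diff
b0 = (b0_obs * p_obs) + (b0_mis * p_mis)
```

Because the two proportions sum to 1, the marginal intercept always lies between the two pattern-specific intercepts.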
A more complex pattern mixture model is one where people with missing outcome
scores have different intercepts and slopes than people with data; Gomer and Yuan
(2021) characterize this as a diffuse missing not at random process. The fitted model
features the missing data indicator and its interaction with the focal predictor, D1.
The marginal slope that averages over the missing data patterns is a weighted average
Importantly, both the intercept and slope difference for the incomplete cases (the
dashed lines pointing from MY to Y and MY to D1’s slope) are inestimable because
people in the MY = 1 group have no data on Y. As such, these parameters must be fixed
to a value during estimation, and their magnitude and sign control the strength and
direction of the missing not at random process. The same effect size-based strategy
can be applied to the slope difference. In this example, the focal predictor D1 is binary
(e.g., intervention vs. control), in which case β1(obs) is the group mean difference for
people with data on Y, and β1(diff) is the additional group mean difference for persons
in which case d can be viewed as the additional change in the dependent variable (in
standard deviation units) for every one standard deviation increase in the predictor.
Setting d to a positive value means that the missing data group’s slope is more
To illustrate, suppose that Y is scaled such that high scores reflect a negative outcome
assignment dummy code (D1 = 0 indicates the control group, and D1 = 1 is the
intervention group). Further, consider a missing not at random process where control
group participants with the highest Y scores (e.g., most acute symptoms) leave the
study to seek treatment elsewhere, whereas intervention group participants with the
lowest Y scores (e.g., mildest symptoms) leave the study because they no longer feel
treatment is necessary. This scenario requires a positive value of d for the inestimable
intercept difference and a negative value of d for the slope difference. The code block
below sets both effect sizes equal to 0.20 (they need not be the same) and uses the
DATA: data3.dat;
VARIABLES: id x1 x2 x3 y d1 d2 v1:v4;
MISSING: 999;
ORDINAL: d1 d2 m.y;
TRANSFORM:
m.y = ismissing(y);
CENTER: x1 d2;
MODEL:
# sequential specification for predictors
m.y ~ 1@ymissmean;
x1 d1 d2 ~ m.y;
# focal analysis model
y ~ 1@b0obs m.y@b0diff d1@b1obs d1*m.y@b1diff d2 x1;
# label residual variance
y ~~ y@resvar;
# auxiliary variable models
x2 x3 ~ y x1 d1 d2;
PARAMETERS:
# set b0diff equal to +.20 residual std. dev. units
# set b1diff equal to -.20 residual std. dev. units
cohensd = .20;
b0diff = cohensd * sqrt(resvar);
b1diff = - cohensd * sqrt(resvar);
# missingness group proportions
p.obs = 1 - phi(-ymissmean);
p.mis = phi(-ymissmean);
# compute weighted average intercept and slope
b0.obs = b0obs;
b0.mis = b0obs + b0diff;
b1.obs = b1obs;
b1.mis = b1obs + b1diff;
b0 = (b0.obs * p.obs) + (b0.mis * p.mis);
b1 = (b1.obs * p.obs) + (b1.mis * p.mis);
practical heuristic for specifying inestimable coefficients, but it is still incumbent on the
researcher to choose values that are reasonable for a given application. As mentioned
previously, the magnitude of the missing data group’s difference parameters dictates
the strength of the missing not at random process. It is incorrect to view “small” values
salient in many situations. For example, consider a randomized intervention where the
true effect size is d = 0.20 (i.e., a small effect size). Setting the missing data group’s
coefficient difference to d = .20 means that the moderating impact of missing data is
just as large as the intervention effect itself. A medium effect size threshold is probably
an upper bound for most practical applications, and much smaller values of d could be
realistic. A sensitivity analysis strategy would examine the changes in the focal model
This feature is currently available, and the User Guide will be updated soon with an
missingness. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
Ex7.4.imp data15.dat
Centering the time scores at their grand mean defines β0 as an outcome mean that
by providing a mechanism to center the dependent variable in the selection model. The
factored regression specification also requires a model linking Y and YLAG, and a
convenient choice is the regression of YLAG on Y, with the slope coefficient fixed to 1,
as follows.
This parameterization defines α0 as the mean difference between Y and YLAG. Finally,
the selection part of the model regresses the binary dropout indicator (i.e., a
discrete-time survival indicator) on the outcome, lagged outcome, and a set of dummy
parameters from the previous equations to center Y and YLAG at their model-predicted
probabilities, which is consistent with the latent curve version of the model. Because
probability of missingness at the initial measurement. The syntax highlights are shown
below, and adding the NIMPS and SAVE commands generates model-based multiple
DATA: data15.dat;
VARIABLES: y dropout time level2id;
ORDINAL: dropout;
NOMINAL: timenom;
CLUSTERID: level2id;
MISSING: 999;
TRANSFORM:
timenom = time;
ylag = lag1(y, time, level2id);
This feature is currently available, and the User Guide will be updated soon with an
growth model where random intercepts and slopes predict missingness. The shared
parameter model is designed for a missing not at random process where the
progression of the outcome scores over time rather than occasion-specific realizations
rapid decline are more likely to quit the study). Clicking the links below downloads the
Blimp scripts and data for this example, and the full set of User Guide examples is
The selection part of the model regresses the binary dropout (i.e., discrete-time
survival) indicator on the random intercepts and slopes as well as a set of dummy
follows.
measurement. The syntax highlights are shown below, and adding the NIMPS and
❖ MODEL command estimates the mean and variance of each between-cluster latent
variable (the latter specification happens by default)
❖ MODEL command specifies correlation between random intercepts and random
slopes (level-2 latent variables)
❖ MODEL command fixes the coefficient for the random intercept latent variable to 1
❖ MODEL command fixes the interaction coefficient for the product of the random
slope predictor and the level-2 latent slope variable to 1
❖ PARAMETERS command specifies a constraint on a coefficient
DATA: data17.dat;
VARIABLES: level2id time.i y.i dropout.i;
ORDINAL: dropout.i;
NOMINAL: timenom.i;
CLUSTERID: level2id;
MISSING: 999;
TRANSFORM:
timenom.i = time.i;
LATENT:
level2id = icept;
level2id = slope;
FIXED: time.i timenom.i;
MODEL:
icept ~ 1;
slope ~ 1;
icept ~~ slope;
y.i ~ 1@0 icept@1 time.i*slope@1 | 1@0;
dropout.i ~ 1@gamma0 timenom.i icept slope | 1@0;
PARAMETERS:
gamma0 = -3;
SEED: 90291;
BURN: 30000;
ITERATIONS: 10000;
When implementing the previous specification, the random effect parameter estimates
do not appear in the same table as the other growth model parameters because they
are distinct latent variables with their own mean (fixed effect) and variance (random
effect). A slightly different way to set up the model is to define the level-2 residuals as
This parameterization converges more quickly in this example. The new syntax
DATA: data17.dat;
VARIABLES: level2id time.i y.i dropout.i;
ORDINAL: dropout.i;
NOMINAL: timenom.i;
CLUSTERID: level2id;
MISSING: 999;
TRANSFORM:
timenom.i = time.i;
RANDOMEFFECT:
icept = y.i | 1 [level2id];
slope = y.i | time.i [level2id];
FIXED: time.i timenom.i;
MODEL:
y.i ~ time.i | time.i;
dropout.i ~ 1@gamma0 timenom.i icept slope | 1@0;
PARAMETERS:
gamma0 = -3;
SEED: 90291;
BURN: 20000;
ITERATIONS: 10000;
This example illustrates the random coefficient pattern mixture model from Hedeker
and Gibbons (1997). The model is designed for a missing not at random process where
growth model parameters differ between cases who complete the study versus those
who drop out. Clicking the links below downloads the Blimp scripts and data for this
example, and the full set of User Guide examples is available from a pull-down menu
Ex7.7.imp data18.dat
effect involving a level-2 dummy code D1 (e.g., a treatment assignment indicator) and
completers and dropouts, M = 0 and 1, respectively. The fitted model features the
where the “obs” subscript denotes the completer group’s (M = 0) parameters, and the
“diff” subscript denotes coefficient differences for the dropout group (M = 1). Following
Example 7.2, the overall population-level estimates (i.e., the marginal estimates that
pattern-specific coefficients, where the weights are the group proportions. The
marginal estimates for this example are shown below, where p(obs) and p(mis) are the
The syntax highlights are shown below, and adding the NIMPS and SAVE commands
generates model-based multiple imputations for a frequentist analysis (see Example 6.3).
DATA: data18.dat;
VARIABLES: level2id n1.j d1.j y.i time.i n2.j m.j d3.i;
ORDINAL: d1.j m.j;
CLUSTERID: level2id;
MISSING: 999;
FIXED: time.i;
MODEL:
m.j ~ 1@ymissmean;
d1.j ~ m.j;
y.i ~ 1@beta0.obs time.i@beta1.obs d1.j@beta2.obs
(time.i*d1.j)@beta3.obs m.j@beta0.dif (m.j*time.i)@beta1.dif
(m.j*d1.j)@beta2.dif (m.j*time.i*d1.j)@beta3.dif | time.i;
PARAMETERS:
# missingness group proportions
p.obs = 1 - phi(-ymissmean);
p.mis = phi(-ymissmean);
# population-average estimates
beta0 = p.obs * beta0.obs + p.mis * (beta0.obs + beta0.dif);
beta1 = p.obs * beta1.obs + p.mis * (beta1.obs + beta1.dif);
beta2 = p.obs * beta2.obs + p.mis * (beta2.obs + beta2.dif);
beta3 = p.obs * beta3.obs + p.mis * (beta3.obs + beta3.dif);
SEED: 90291;
BURN: 5000;
ITERATIONS: 10000;
8 References
Alacam, E., Du, H., Enders, C. K., & Keller, B. T. (2021). A model-based approach to
669-679.
Arnold, B. C., Castillo, E., & Sarabia, J. M. (2001). Conditionally specified distributions:
Asparouhov, T., & Muthén, B. (2021). Expanding the Bayesian Structural Equation,
Asparouhov, T., & Muthén, B. (2021). Advances in Bayesian Model Fit Evaluation for
Barnard, J., McCulloch, R., & Meng, X.-L. (2000). Modeling covariance matrices in terms
Bartlett, J. W., Seaman, S. R., White, I. R., & Carpenter, J. R. (2015). Multiple imputation
doi:10.1177/0962280214521348
Bates, D., Maechler, M., Bolker, B., Walker, S., Christensen, R. H. B., Singmann, H., . . .
Bauer, D. J. (2017). A more general model for testing measurement invariance and
Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. West
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Eekhout, I., Enders, C. K., Twisk, J. W. R., de Boer, M. R., de Vet, H. C. W., & Heymans,
doi:10.1080/10705511.2014.937670
Enders, C. K. (2022). Applied Missing Data (2nd ed.). New York: Guilford Press.
Enders, C. K., Du, H., & Keller, B. T. (2020). A model-based imputation procedure for
doi:10.1037/met0000228
Enders, C. K., & Gottschall, A. C. (2011). Multiple Imputation Strategies for Multiple
Enders, C. K., & Keller, B. T. (2019). Blimp Technical Appendix: Centering Covariates in
www.appliedmissingdata.com/multilevel-imputation.html:
Enders, C. K., Keller, B. T., & Levy, R. (2018). A fully conditional specification approach
121-138. doi:10.1037/1082-989X.12.2.121
Erler, N. S., Rizopoulos, D., Jaddoe, V. W., Franco, O. H., & Lesaffre, E. M. (2019).
doi:10.1177/0962280217730851
Erler, N. S., Rizopoulos, D., Rosmalen, J., Jaddoe, V. W., Franco, O. H., & Lesaffre, E. M.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014).
Bayesian data analysis (3rd ed.). Boca Raton, FL: CRC Press.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple
Goldstein, H., Carpenter, J., Kenward, M. G., & Levin, K. A. (2009). Multilevel models
doi:10.1177/1471082x0800900301
Gomer, B., & Yuan, K.-H. (2021). Subtypes of the missing not at random missing data
Graham, J. W. (2009). Missing data analysis: making it work in the real world. Annual
doi:10.1146/annurev.psych.58.110405.085530
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are
Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Multiple imputation of missing covariate
Grund, S., Robitzsch, A., & Lüdtke, O. (2021). Package 'mitml'. Retrieved from
cran.r-project.org/web/packages/mitml/
Hamaker, E. L., & Muthén, B. (2019). The fixed versus random effects debate and how
doi:10.1016/j.stamet.2006.03.002
64-78.
Ibrahim, J. G., Chen, M. H., & Lipsitz, S. R. (2002). Bayesian methods for generalized
doi:10.2307/3315865
Ibrahim, J. G., Lipsitz, S. R., & Chen, M. H. (1999). Missing covariates in generalized
linear models when the missing data mechanism is non-ignorable. Journal of the
doi:10.1111/1467-9868.00170
Johnson, V. E., & Albert, J. H. (1999). Ordinal data modeling. New York: Springer.
https://CRAN.R-project.org/package=semTools.
10-37.
Keller, B. T. (2022). Model-based missing data handling for manifest and latent
variable interactions.
Kim, S., Belin, T. R., & Sugar, C. A. (2018). Multiple imputation with non-additively
Kim, S., Sugar, C. A., & Belin, T. R. (2015). Evaluating model-based imputation methods
Levy, R., & Enders, C. (2021). Full conditional distributions for Bayesian multilevel
publication, 1-25.
Lipsitz, S. R., & Ibrahim, J. G. (1996). A conditional model for incomplete covariates in
10.1093/biomet/83.4.916
Liu, J. C., Gelman, A., Hill, J., Su, Y. S., & Kropko, J. (2014). On the stationary distribution
Liu, H. Y., Zhang, Z. Y., & Grimm, K. J. (2016). Comparison of inverse Wishart and
Newsletter, 1(3), 5.
Lüdtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B.
(2008). The Multilevel Latent Covariate Model: A New, More Reliable Approach
201-229. doi:10.1037/a0012869
Lüdtke, O., Robitzsch, A., & West, S. G. (2020a). Analysis of interactions and nonlinear
361-381.
Lüdtke, O., Robitzsch, A., & West, S. G. (2020b). Regression models involving nonlinear
Lüdtke, O., Robitzsch, A., & West, S. G. (2019). Regression models involving nonlinear
Merkle, E. C., & Rosseel, Y. (2018). blavaan: Bayesian structural equation models via
doi:10.18637/jss.v085.i04
Muthén, B., Muthén, L., & Asparouhov, T. (2016). Regression and mediation analysis
Polson, N. G., Scott, J. G., & Windle, J. (2013). Bayesian inference for logistic models
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and
https://CRAN.R-project.org/package=lavaan.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Seaman, S. R., Bartlett, J. W., & White, I. R. (2012). Multiple imputation of missing
doi:10.1186/1471-2288-12-46
van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully
219-242. doi:10.1177/0962280206074463
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006).
doi:10.1080/10629360600810434
doi:10.18637/jss.v045.i03
0049124117747303.
Yeo, I. K., & Johnson, R. A. (2000). A new family of power transformations to improve
Zhang, Q., & Wang, L. (2017). Moderation analysis with missing data in the predictors.