A Note On Entrywise Consistency For Mixed-Data Matrix Completion
A Note On Entrywise Consistency For Mixed-Data Matrix Completion
A Note On Entrywise Consistency For Mixed-Data Matrix Completion
Abstract
This note studies matrix completion for a partially observed n by p data matrix involving mixed
types of variables (e.g., continuous, binary, ordinal). A general family of non-linear factor models
is considered, under which the matrix completion problem becomes the estimation of an n by p
low-rank matrix M. For existing methods in the literature, estimation consistency is established by
√
showing kM̂ − M∗ kF / np, the scaled Frobenius norm of the difference between the estimated
and true M matrices, converges to zero in probability as n and p grow to infinity. However, this
notion of consistency does not guarantee the convergence of each individual entry and, thus, may
not be sufficient when specific data entries or the worst-case scenario is of interest. To address this
issue, we consider the notion of entrywise consistency based on kM̂ − M∗ kmax , the max norm
of the estimation error matrix. We propose refinement procedures that turn estimators, which are
consistent in the Frobenius norm sense, into entrywise estimators through a one-step refinement.
Tight probabilistic error bounds are derived for the proposed estimators. The proposed methods are
evaluated by simulation studies and real-data applications for collaborative filtering and large-scale
educational assessment.
Keywords: Matrix completion; generalized latent factor model; mixed data; entrywise consis-
tency; max norm
1. Introduction
Missing data are commonly encountered in machine learning, especially for large-scale data involv-
ing many observations and variables. Matrix completion concerns the prediction of missing entries
in a partially observed matrix, which has received wide applications, such as collaborative filtering
(Goldberg et al., 1992; Feuerverger et al., 2012), social network recovery (Jayasumana et al., 2019),
sensor localization (Biswas et al., 2006), and educational and psychological measurement (Bergner
et al., 2022; Chen et al., 2023).
Many matrix completion methods consider real-valued matrices (Candès and Recht, 2009; Candès
and Tao, 2010; Keshavan et al., 2010; Klopp, 2014; Koltchinskii et al., 2011; Negahban and Wain-
wright, 2012; Chen et al., 2020c; Xia and Yuan, 2021). Their theoretical guarantees are typically
established under a linear factor model (e.g. Bartholomew et al., 2008), which says the underlying
complete data matrix can be decomposed as the sum of a low-rank signal matrix M and a mean-
zero noise matrix. Under this statistical model, the matrix completion task becomes to estimate the
signal matrix M based on the observed data entries. However, many real applications of matrix
completion involve mixed types of variables (e.g., continuous, count, binary, ordinal), for which
the linear factor model may not be suitable. For example, in survey studies, different questionnaire
items may be of different measurement scales – some items may be binary (e.g., yes/no), some may
be ordinal (e.g., disagree/neutral/agree), while others may be count variables (e.g., the number of
times that one skipped school). Mixed data also appear in multimodal biomedical data, where dif-
ferent types of variables are collected with different technologies (e.g., gene expression, genotype,
protein activity). Methods have been developed for matrix completion with specific variable types,
such as binary (Cai and Zhou, 2013; Davenport et al., 2014; Han et al., 2020, 2023), categorical
(Bhaskar, 2016; Klopp et al., 2015), count (Cao and Xie, 2015; McRae and Davenport, 2021; Robin
et al., 2019), and mixed data (Robin et al., 2020). Non-linear factor models, which are extensions
of the linear factor model, are typically assumed in these works.
A matrix completion Pmethod is typically evaluated by a mean squared error (MSE), defined as
n Pp
kM̂ − M kF /(np) = i=1 j=1 (m̂ij − m∗ij )2 /(np), where k·kF denotes the matrix Frobenius
∗ 2
norm, n×p is the size of the data matrix, and M̂ = (m̂ij )n×p and M∗ = (m∗ij )n×p are the estimated
and true signal matrices, respectively. Probabilistic error bounds have been established for the MSE
in the literature (see Chen et al., 2020c; Chen and Li, 2022; Cai and Zhou, 2016, and references
therein). Under suitable conditions, these error bounds imply that the MSE decays to zero when
both n and p grow to infinity, which is viewed as a notion of statistical consistency for matrix
completion. However, this notion of consistency slightly differs from that in our traditional sense;
that is, the MSE converging to zero does not imply the convergence of each individual entry, which,
however, may be important in some applications which concern the prediction of individual data
entries. Entrywise results for matrix completion have been established under linear factor models
(Abbe et al., 2020; Chen et al., 2019b, 2020c; Chernozhukov et al., 2023). However, such results
are not available for non-linear factor models, and extending these entrywise results to non-linear
factor models is non-trivial.
This note considers a general matrix completion problem that allows the variables to be of
mixed types. The generalized latent factor model (GLFM; Bartholomew et al., 2008; Skrondal
and Rabe-Hesketh, 2004) is a general family of latent variable models that combine factor analysis
with generalized linear modelling. By allowing for variable-specific link functions, the GLFM is
suitable for modelling multivariate data with mixed types. Under the GLFM framework, we propose
two methods that ensure entrywise consistency under dense and sparse missingness settings. Both
methods apply to an initial estimate whose MSE converges to zero. They obtain refined estimates by
solving some estimating equations constructed based on the initial estimate. The difference between
the two methods is that one involves data splitting while the other does not. The two methods have
the same asymptotic behavior under a dense setting where the proportion of observed entries does
not decay to zero. In that case, their entrywise error rate matches the MSE of the initial estimate
up to a logarithm factor, suggesting that there is virtually no loss when performing refinement.
However, under a sparse setting where the proportion of observed entries converges to zero, the
procedure with data splitting achieves a smaller error rate than the one without data splitting, and
the error rate of the data splitting procedure matches the MSE of the initial estimate up to a logarithm
factor. To our best knowledge, the current work is the first one obtaining an entrywise consistent
estimator for counts and binary data, assuming that the counts and binary data follow the Poisson
factor and the multidimensional two-parameter logistic model, respectively. Moreover, it is also
2
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
the first one for the more general GLFM model for mixed data. Our theoretical analysis further
shows that the refined estimator based on a constrained joint maximum likelihood estimator (Chen
et al., 2020a) for the GLFM is minimax optimal in an entrywise sense under a suitable asymptotic
regime. The proposed methods are evaluated by simulation studies and real-data applications for
collaborative filtering and large-scale educational assessment.
The rest of the note is organized as follows. In Section 2, we introduce a generalized latent factor
model for matrix completion with mixed data. Section 3 introduces two methods for achieving en-
trywise consistency. Theoretical guarantees on the proposed methods are established in Section 4. A
simulation study is given in Section 5, and two real data examples are given in Section 6. Finally, we
conclude with some discussions in Section 7. Additional simulation results and theoretical results,
and proofs of the theorems are given in the appendix. The computation code used in Sections 5 and 6
can be found at https://github.com/yunxiaochen/MatrixCompletion_MixedData.
Assumption 1. The missing indicators, ωij , i ∈ [n], j ∈ [p], are jointly independent. In addition,
Ω and Y are independent.
3
C HEN AND L I
Example 1. For a continuous variable j, we may assume fj to be a normal density function, where
φj is the variance, bj (mij ) = m2ij /2 and cj (yij , φj ) = −yij
2 /(2φ ) − (log(2πφ ))/2. When all the
j j
variables follow this normal model, the data matrix follows a linear factor model.
Example 2. Consider a binary or ordinal variable j such that Yij in {0, 1, ..., kj } for some given
kj ≥ 1, where kj = 1 and kj > 1 correspond to binary and ordinal variables, respectively. We can
assume fj to follow a Binomial logistic model, for which φj = 1, bj (mij ) = kj log(1 + exp(mij ))
and cj (yij , φj ) = log(kj ! ) − log(yij ! ) − log((kj − yij )! ). This model has been considered in
Masters and Wright (1984) with psychometric applications. When all the variables are binary
and follow this logistic model, the data matrix is said to follow a multidimensional two-parameter
logistic (M2PL) item response theory model (Reckase, 2009). This model has been considered in
Davenport et al. (2014) and Cai and Zhou (2013) for the completion of binary matrices.
Example 3. A Poisson model may be assumed for count variables j, for which φj = 1, bj (mij ) =
exp(mij ) and cj (yij , φj ) = − log(yij ! ). When all the variables follow this Poisson model, the joint
model for the data matrix is known as a Poisson factor model (Wedel et al., 2003). This Poisson
model has been considered in Robin et al. (2019) and Robin et al. (2020) for count data with missing
values.
Under the GLFM, EY = (b0j (mij ))n×p , where b0j (·) denotes the derivative of the known func-
tion bj (·). Thus, matrix completion under the GLFM again boils down to estimating the signal
matrix M = ΘAT . This estimation problem will be investigated in the rest. We note that a similar
GLFM framework has been considered in Robin et al. (2020) for analyzing mixed data with missing
values. However, they focused on evaluating the estimation accuracy by the MSE, while our main
focus is the entrywise loss.
4
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
Step 4. For each j ∈ [p], obtain ãj by solving the following equation:
n
X
ωij {yij − b0j ((ãj )T θ̃i )}θ̃i = 0r . (2)
i=1
Output: M̃ = Θ̃(Ã)T , where Θ̃ = (θ̃1 , · · · , θ̃n )T ∈ Rn×r and à = (ã1 , · · · , ãp )T ∈ Rp×r
are obtained from Steps 3 and 4, respectively.
We comment on the implementation. First, the constant C2 depends on the true signal matrix
M∗ .Recall that we assume M∗ to be of rank r under the GLFM. Thus, M∗ can be decomposed
as M = U∗r D∗r (Vr∗ )T , where U∗r ∈ Rn×r and Vr∗ ∈ Rp×r are the left and right singular matrices
∗
5
C HEN AND L I
corresponding to the non-zero singular values, and D∗r ∈ Rr×r is a diagonal matrix whose diagonal
elements are the singular values σ1 (M∗ ) ≥ · · · ≥ σr (M∗ ) > 0. We require C2 to satisfy C2 ≥
kVr∗ k2→∞ . On the other hand, C2 should not be chosen too large. As will be shown in Section 4.2,
it is assumed that C2 has the same asymptotic order as kVr∗ k2→∞ ; otherwise, the error bound for
kM̃ − M∗ kmax needs additional modification. Second, we note that the projection in Step 2 is very
easy to perform. Let V = (v1 , ..., vp )T be a p × r matrix. Then proj{A∈Rp×r :kAk2→∞ ≤C2 } (V) =
(ṽ1 , ..., ṽp )T , where ṽi = vi if kvi k≤ C2 and ṽi = (C2 /kvi k)vi otherwise. Third, the algorithm
requires knowing the number of factors r. Under the GLFM and suitable conditions, this quantity
can be consistently selected based on information criteria (Chen and Li, 2022) or by identifying a
singular value gap using a SVD-based approach (Zhang et al., 2020). Finally, we provide a remark
on solving the equations in Steps 3 and 4.
Remark 1. In Steps 3 and 4, we propose to solve some estimating equations. As will be shown
in Section 4, these equations have a unique solution with probability converging to 1 under a
suitable asymptotic regime. ThesePsteps are equivalent to performing optimization to certain log-
likelihood functions. Let `(M) = i,j:ωij =1 {yij mij − bj (mij )} be a weighted log-likelihood func-
tion based on observed data (Y ◦ Ω, Ω), where the individual log-likelihood terms are weighted
by the dispersion parameters1 . Then, solving the estimating equations (1) is equivalent to solv-
ing Θ̃ ∈ arg maxΘ `(ΘÂT ), and solving the estimating equations (2) is equivalent to solving
à ∈ arg maxA `(Θ̃AT ). This is due to that the estimating equations (1) and (2) are obtained
by taking the partial derivatives of `(ΘAT )with respect to Θ and A, respectively, and that the
objective function `(ΘAT ) is convex with respect to Θ and A given the other.
We provide an informal theorem under a simplified setting to shed some light on the asymptotic
behavior of Algorithm 1. Its formal version is Theorem 5 in Section 4.2, which is established under
a more general setting. For the missing pattern Ω = (ωij )i∈[n],j∈[p] , let πij = P(ωij = 1) be the
sampling probabilities and πmin = mini∈[n],j∈[p] πij and πmax = maxi∈[n],j∈[p] πij be the minimal
and maximal sampling probabilities, respectively. The notation π for the sampling probabilities
should be distinguished from the Roman (upright font) notation π for the mathematical constant of
circumference ratio in Example 1.
Theorem 2 (An informal and simplified version of Theorem 5). Assume that limn,p→∞ P(kM̂ −
M∗ kF ≤ eM,F ) = 1 and let M̃ be obtained by Algorithm 1. Then, under suitable assumptions
on M∗ and the asymptotic regime πmin = πmax = π, r is fixed, pπ, nπ (log(np))3 , and
{(n ∧ p)π}−1/2 . (np)−1/2 eM,F π 1/2 (log(np))−2 , with probability tending to 1, we have
kM̃ − M∗ kmax . (log(np))2 π −1/2 (np)−1/2 eM,F .
We clarify that eM,F in the above theorem is a non-random number that depends on n and p. We
consider the asymptotic regime {(n ∧ p)π}−1/2 . (np)−1/2 eM,F above because {(n ∧ p)π}−1/2 is
the minimax error rate of (np)−1/2 kM̂ − M∗ kF ; see Chen and Li (2022).
Remark 3. We provide intuitions on the result of Theorem 2 under the linear factor model setting.
Using Wedin’s sine angle theorem (Wedin, 1972) and under suitable assumptions, one can show
that there exist Θ∗ = (θij
∗) ∗ ∗ ∗ ∗ ∗ T ∗
n×r and A = (aij )p×r , such that M = Θ (A ) , k − A kF .
1. The weighted likelihood is used so that the nuisance parameters φj do not involve in estimating M, which simpli-
fies the theoretical analysis. We believe that the current analysis can be extended to the unweighted log-likelihood
function for the joint estimation of M and dispersion parameters φj .
6
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
(np)−1/2 eM,F with probability tending to 1 as n and p grow to infinity, kA∗ k2→∞ . p−1/2 , and
kΘ∗ k2→∞ . p1/2 .
Then solving for θ̃i in Step 3 can be viewed as a linear regression problem with a small mea-
surement error in the covariates, where a∗j s are the true covariates and âj are the covariates with
measurement error. Under the linear factor model, bj (mij ) = m2ij /2 for all j. Thus, one can
write down the analytic form for θ̃i that solves Equation (1). From these analytic forms, one
can show that with probability tending to 1, kΘ̃ − Θ∗ k2→∞ . log(np)π −1/2 p1/2 kA∗ − ÂkF .
log(np)π −1/2 n−1/2 eM,F , which also implies that kΘ̃k2→∞ . p1/2 . Here, the log(np) term
comes from a tail bound of maxi=1,...,n,j=1,...,p |Yij − b0 (m∗ij )|. Similarly, one can obtain the an-
alytical expression for ãj that solves Equation (2), which now involves θ̃i − θi∗ , i = 1, ..., n.
From these expressions, one can show that kà − A∗ k2→∞ . log(np)p−1 kΘ̃ − Θ∗ k2→∞ .
(log(np))2 π −1/2 n−1/2 p−1 eM,F holds with probability tending to 1. Combining the above results,
it holds that, with probability tending to 1,
Theorem 4 (An informal and simplified version of Theorem 10). Assume that limn,p→∞ P(kM̂Nk · −
M∗Nk · kF ≤ eM,F ) = 1 for eM,F (k = 1, 2) and M̃ is obtained by Algorithm 2. Then, under suitable
assumptions on M∗ and the asymptotic regime πmin = πmax = π, r is fixed, pπ, nπ (log(np))3 ,
7
C HEN AND L I
Step 5. Swap N1 and N2 in Steps 1 – 4, and obtain Θ̃N1 and Ã(2) accordingly.
Output: M̃ = (m̃ij )i∈[n],j∈[p] , where (m̃ij )i∈N1 ,j∈[p] = Θ̃N1 (Ã(2) )T and
(m̃ij )i∈N2 ,j∈[p] = Θ̃N2 (Ã(1) )T .
and {(n ∧ p)π}−1/2 . (np)−1/2 eM,F (log(np))−2 , with probability tending to 1, we have
kM̃ − M∗ kmax . (log(np))2 (np)−1/2 eM,F .
As the data splitting in Algorithm 2 is random, it may be beneficial to run it multiple times
and then aggregate the resulting estimates. We describe this variation of Algorithm 2 below. For a
fixed number of random splittings, the asymptotic behavior of Algorithm 3 is the same as that of
Algorithm 2.
Our refinement methods require input from an F-consistent estimator. We give examples of F-
consistent estimators.
8
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
CJMLE. The constrained joint maximum likelihood estimator (CJMLE) solves the following op-
timization problem
(Θ̂, Â) ∈ arg max `(ΘAT ), s.t. Θ ∈ Rn×r , A ∈ Rp×r , kΘk2→∞ ≤ C, kAk2→∞ ≤ C. (3)
Θ,A
The estimate of M is then given by M̂ = Θ̂ÂT . The terminology “joint likelihood” comes from
the latent variable model literature (Chapter 6, Skrondal and Rabe-Hesketh, 2004). This literature
distinguishes the joint likelihood from the marginal likelihood, depending on whether entries of Θ
are treated as fixed parameters or random variables, where the marginal likelihood is more com-
monly adopted in the statistical inference of traditional latent variable models. This estimator was
first proposed in Chen et al. (2019a) and Chen et al. (2020b) for the estimation of high-dimensional
GLFMs, and an error bound on kM̂ − M∗ kF under a general matrix completion setting can be
found in Theorem 2 of Chen and Li (2022). The computation of (3) can be done by an alternating
maximization algorithm as given in Chen et al. (2020b). This algorithm is theoretically guaran-
teed to converge to a critical point and has good convergence performance according to numerical
experiments (Chen et al., 2020b), though (3) is a nonconvex optimization problem.
More specifically, suppose that the true signal matrix has a decomposition M∗ = Θ∗ (A∗ )T ,
such that kΘ∗ k2→∞ ≤ C and kA∗ k2→∞ ≤ C. Then, under a similar setting as in Theorems 2 and
√
4, we have limn,p→∞ P(kM̂ − M∗ kF / np ≤ κ† {(p ∧ n)π}−1/2 ) = 1, for some finite positive
constant κ† . As shown in Proposition 1 of Chen and Li (2022), {(p ∧ n)π}−1/2 is also the minimax
lower bound for estimating M in the scaled Frobenius norm, which is why this lower bound is
assumed for (np)−1/2 eM,F in Theorems 2 and 4.
NBE. The CJMLE requires solving a non-convex optimization problem for which convergence
to the global optimum is not always guaranteed. The nuclear-norm-constrained-based estimator
(NBE) is a convex approximation to CJMLE. It solves the following optimization problem
√
M̂ ∈ arg max `(M), s.t. kMkmax ≤ ρ0 , kMk∗ ≤ ρ0 rnp. (4)
M
√
The nuclear norm constraint is introduced, since {M ∈ Rn×p : kMkmax ≤ ρ0 , kMk∗ ≤ ρ0 rnp}
is a convex relaxation of {M ∈ Rn×p : kMkmax ≤ ρ0 , rank(M) ≤ r}. This estimator has been
considered in Davenport et al. (2014) for the completion of binary matrices. When the true model
follows the M2PL model and the true signal matrix M∗ satisfies kM∗ kmax ≤ ρ0 , then Theorem 1 of
Davenport et al. (2014) implies that under the same setting of Theorems 2 and 4, limn,p→∞ P(kM̂−
√
M∗ kF / np ≤ κ‡ {(p ∧ n)π}−1/4 ) = 1, where κ‡ is a finite positive constant which depends on the
true model parameters. We believe that the same rate holds for other GLFMs under the simplified
setting of Theorems 2 and 4.
Other estimators. Note that other F-consistent estimators may be available for GLFMs, such as
SVD-based methods (Chatterjee, 2015; Zhang et al., 2020), nuclear-norm-regularized estimators
(Klopp, 2014; Koltchinskii et al., 2011; Negahban and Wainwright, 2012; Robin et al., 2020; Alaya
and Klopp, 2019) and methods based on a matrix factorization norm (Cai and Zhou, 2013, 2016).
4. Theoretical Results
4.1 Assumptions and Useful Quantities
We make the following Assumptions 2 and 3 throughout Section 4.
9
C HEN AND L I
Assumption 2. b1 (x) = · · · = bp (x) = b(x) for all x ∈ R. In addition, b(x) < ∞ and b00 (x) > 0
for all x ∈ R.
We note that this assumption is made for ease of presentation. It can be relaxed to allowing
functions bj to be variable-specific, and similar theoretical results hold following a similar proof. For
each α > 0, define functions κ2 (α) = sup|x|≤α b00 (x), κ3 (α) = sup|x|≤α |b(3) (x)|, and δ2 (α) =
inf |x|≤α b00 (x). Let M∗ have the SVD M∗ = U∗r D∗r (Vr∗ )T where r is the rank of M∗ , U∗r ∈ Rn×r
and Vr∗ ∈ Rp×r are the left and right singular matrices corresponding to the top-r singular values,
respectively, and D∗r ∈ Rr×r is a diagonal matrix whose diagonal elements are the singular values
σ1 (M∗ ) ≥ · · · ≥ σr (M∗ ) > 0. In order to apply the proposed methods, we need to input C2 .
R1: φ1 = · · · = φp = φ ∼ 1;
R7: (np)−1/2 eM,F (κ∗2 )−2 (δ2∗ )3 (log(np))−2 min [r−5/2 , (κ∗3 )−1 r−7/2 ]π 1/2 .
Then, with probability converging to 1, estimating equations in steps 3 and 4 of Algorithm 1 have a
unique solution and
h i
kM̃ − M∗ kmax . (δ2∗ )−2 (κ∗2 )2 (log(np))2 r5/2 {(n ∧ p)π}−1/2 + (npπ)−1/2 eM,F . (5)
Remark 6. We comment on the asymptotic requirements R1–R7. R1 requires the dispersion pa-
rameters to be the same for different j ∈ [p]. This assumption is made for ease of presentation,
and it can be easily relaxed to allowing varying values of dispersion parameters. It further requires
that the dispersion parameter is bounded as n and p grow large. R2 requires πmax and πmin to
10
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
be of the same asymptotic order. That is, the missing pattern is not too far from the commonly
adopted uniform missingness assumption where all the πij are the same (see, e.g. Candès and Tao,
2010; Davenport et al., 2014). R3 is a standard incoherent condition that is commonly assumed for
matrix completion to avoid spiky low-rank matrices (Candès and Recht, 2009; Jain et al., 2013).
R4 requires that the non-zero singular values of M∗ are in the same asymptotic order. In addi-
tion, we restrict the analysis to the case where η ≥ −1, because otherwise kM∗ kmax 1 and the
asymptotic regime is less interesting. We note that R4 can be relaxed to a more general asymptotic
regime allowing σr (M∗ ) and σ1 (M∗ ) to have different asymptotic order, and we provide the error
analysis under a more general setting in the appendix. R5 and R6 require the expected number
of non-missing observations for each row and column to be large enough. R7 requires the ini-
tial F-consistent estimator to have a sufficiently small estimation error in scaled Frobenius norm.
In Corollary 8 below, we give sufficient conditions for R5 – R7 under the three specific GLFMs
described in Section 2.
Remark 7. Let M̂CJMLE and M̂N BE denote the constrained joint maximum likelihood estima-
tor and nuclear-norm-constrained-based estimator described in Section 3.3, respectively. Also
let M̃CJMLE and M̃N BE be the corresponding refined estimators by applying Algorithm 1. The-
orem 5 indicates that with high probability kM̃CJMLE − M∗ kmax . (log(np))2 π −1 (n ∧ p)−1/2 and
kM̃NBE − M∗ kmax . (log(np))2 π −3/4 (n ∧ p)−1/4 when r is bounded, under suitable regularity
conditions. Because M̂CJMLE is asymptotically minimax when π ∼ 1 in Frobenius norm, we also
have that M̃CJMLE is asymptotically minimax in the matrix max norm.
In the following corollary, we provide sufficient conditions for R5 - R7 under specific GLFMs
discussed earlier.
Corollary 8. Assume that limn,p→∞ P(kM̂ − M∗ kF ≤ eM,F ) = 1 for some non-random eM,F .
Then, (5) holds under one of the following specific models and asymptotic requirements.
1. Data follow a binomial factor model and the following asymptotic requirements hold: R2 –
R4 and R5B: pπ (n ∨ p)0 r(3+4η)∨7 ; R6B: nπ (n ∨ p)0 r5 ; R7B: (np)−1/2 eM,F
(n ∧ p)−0 π 1/2 r−7/2 ; R8B: k1 = · · · = kp = k ∼ 1; and R9B: ρ . log(n ∧ p)1−0 for some
0 > 0.
2. Data follow a normal factor model and the following asymptotic requirements hold: R1 – R4;
R5N: pπ (log(np))3 r(1+2η)∨5 ; R6N: nπ (log(np))2 r3 ; and R7N: (np)−1/2 eM,F
(log(np))−2 π 1/2 r−5/2 .
3. Data follow a Poisson factor model and the following asymptotic requirements hold: R2 - R4,
R5B – R7B and R10P: r1+η . (log(n ∧ p))1−0 for some 0 > 0.
In the first part of the above corollary, R1 automatically holds because the dispersion parameter
φj = 1 in the binomial model.
Remark 9. We comment on the asymptotic requirements in the above corollary. R5B, R6B, R5N
and R6N require that rank r is relatively small comparing with (n ∧ p)π, and it can grow at most of
the order {(n∧p)π}ν1 for some constant ν1 ∈ (0, 1). Conditions R5B and R6B are slightly stronger
than R5N and R6N, because κ∗3 = 0 for the normal model while κ∗3 ∼ 1 for the binomial model.
Conditions R7B and R7N require the scaled Frobenius norm of the initial estimator to be small.
11
C HEN AND L I
Many F-consistent estimators, including CJMLE and NBE, have the error rate (np)−1/2 eM,F ∼
((n ∧ p)π)−ν2 for some ν2 ∈ (0, 1). For these estimators, R7B and R7N require that r .
((n ∧ p)π)ν3 π 1/2 for some ν3 ∈ (0, 1). Condition R8B requires the kj s to be the same for different
j ∈ [p] and are bounded. This condition can be easily relaxed to a more general setting with
varying but bounded kj s. Condition R9B requires that ρ grows much slower than n and p. Similar
assumptions are made for 1-bit matrix completion (Davenport et al., 2014; Cai and Zhou, 2013).
For Poisson factor models, R10P can be achieved either by an arbitrary r with η = −1 or by
r . (log(n ∧ p))(1−0 )/(1+η) with η > −1.
In particular, if we further assume that r ∼ 1, then, the asymptotic regime requirements R5, R6, and
R7’ can be simplified as pπ (log(np))3 , nπ (log(np))2 and (np)−1/2 eM,F (log(np))−2 ,
and we have that with probability converging to 1, kM̃ − M∗ kmax . (log(np))2 [{(n ∧ p)π}−1/2 +
(np)−1/2 eM,F ].
Remark 11. There are two main differences between Theorem 5 and Theorem 10. First, the asymp-
totic requirement R7 has an extra factor π 1/2 when compared with R7’. Second, the error rate
(5) has an extra π −1/2 factor when compared with (6). Thus, when π 1, Algorithm 1 requires
stronger regularity conditions and has a larger error rate. Additional results under a more general
asymptotic regime are provided in the appendix.
The following corollary give sufficient conditions for R7’ to hold under specific GLFMs.
(k)
Corollary 12. Assume that limn,p→∞ P(kM̂Nk · − M∗Nk · kF ≤ eM,F ) = 1 for some non-random
eM,F (k = 1, 2). Then, (6) holds under one of the following specific models and asymptotic re-
quirements.
1. Data follow a binomial factor model and the following asymptotic requirements hold: R2 -
R4, R5B, R6B, R8B, R9B, and R7’B: (np)−1/2 eM,F (n ∧ p)−0 r−7/2 for some 0 > 0.
2. Data follow a normal factor model and the following asymptotic requirements hold: R1 - R4,
R5N, R6N, and R7’N: (np)−1/2 eM,F (log(np))−2 r−5/2 .
3. Data follow a Poisson factor model and that asymptotic requirements R2 - R4, R5B,
R6B,R7’B, and R10P hold.
Remark 9 still applies to Corollary 12, except that now we have a better rate when π is close to
zero.
12
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
Setting n p r π Setting n p r π
1 400 200 3 0.6 4 400 200 3 0.2
2 800 400 3 0.6 5 800 400 3 0.2
3 1600 800 3 0.6 6 1600 800 3 0.2
Table 2: Simulation settings. All the variables are ordinal (with kj = 5), for which the Binomial
model is assumed.
5. Simulation Study
We evaluate the proposed methods via a simulation study. Eight estimation procedures are consid-
ered as listed in Table 1. For Algorithm 3, five data splittings are performed. These procedures are
applied under 24 simulation settings, where n, p, r, πmax = πmin = π, and variable types are varied.
Settings 1-6 are listed in Table 2, where all the variables follow Binomial distribution with kj = 5.
The rest of the settings and additional details on data generation can be found in the appendix. For
each simulation setting, 100 simulations are conducted.
The procedures are evaluated under two loss functions, the scaled Frobenius norm kM̂ −
√
∗
M kF / np and the max norm kM̂ − M∗ kmax . The results for Settings 1-6 are given in Fig-
ures 1 and 2, and those for the other settings show similar patterns and are given in the appendix.
First, for each procedure and given r and π, both the scaled Frobenius norm and the max norm
decay as n and p grow simultaneously. Second, comparing the two figures, we see that the error
rates are larger under Settings 4-6 than those under Settings 1-3 given the same n, p, and r, as
the proportion of missing entries is higher under Settings 4-6. Third, Procedure 1 (i.e., NBE with
no refinement) has larger error rates than its refined versions (Procedures 2-4), suggesting that the
refinement procedures reduce the error of the initial NBE. Fourth, we see that Procedures 5 and
6 perform similarly, which is expected as they are asymptotically equivalent, as discussed in Re-
mark 7. Fifth, comparing Procedures 2 and 6, we see that the refined NBE and the refined CJMLE
have very similar performance. Similar patterns are observed when comparing Procedures 3 and
7 and when comparing Procedures 4 and 8. At first glance, it may seem a little counter-intuitive.
According to Theorems 5 and 10, the error in the max norm of a refined estimator is upper bounded
by the error in the scaled Frobenius norm of its initial estimator, and thus, we would expect the
CJMLE-based refinements to have smaller errors in the max norm than the NBE-based refinements.
The pattern under the current settings may be explained by the SVD steps in Algorithms 1, 2, and
3 that project the initial estimate to the space of rank-r matrices. Under these settings, the initial
NBE after projection tends to approximate the CJMLE. We note that this is not always the case
under other settings. Under settings 23 and 24 (see their results in the appendix), the CJMLE tends
to outperform the projected NBE, and thus, the CJMLE-based refinements tend to outperform the
13
C HEN AND L I
NBE-based refinements. Finally, comparing within Procedures 2-4 and comparing within Proce-
dures 6-8, we see that Algorithm 1 leads to better empirical performance regardless of the value of
π, even though Algorithm 2 has a faster theoretical convergence speed when π approaches 0. We
conjecture that for CJMLE and NBE, the resulting  in Step 2 of Algorithm 1 does not have a high
dependence with any rows of Ω when ωij s are uniformly sampled, and thus, the upper bound in (5)
may be improved in this case. We also observe that Algorithm 3 outperforms Algorithm 2 through
aggregating results from multiple runs Algorithm 2. By running Algorithm 2 five times, Algorithm
3 has a similar performance as Algorithm 1.
0.5
0.5
0.4
0.4
0.4
Scaled Frobenius norm
0.3
0.3
0.2
0.2
0.2
0.1
0.1
0.1
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
3.0
3.0
2.5
2.5
2.5
2.0
2.0
2.0
Max norm
Max norm
Max norm
1.5
1.5
1.5
1.0
1.0
1.0
0.5
0.5
0.5
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
Figure 1: Results from Simulation Settings 1-3. The panels on the first row show the results based
on the scaled Frobenius norm, and those on the second row show the results based on the max norm.
In each panel, the box plots show the results of the eight procedures in Table 1, each constructed
from 100 independent simulations.
14
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
0.7
0.7
0.7
0.6
0.6
0.6
Scaled Frobenius norm
0.5
0.5
0.4
0.4
0.4
0.3
0.3
0.3
0.2
0.2
0.2
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
5
4
4
Max norm
Max norm
Max norm
3
3
2
2
1
1
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
Figure 2: Results from Simulation Settings 4-6. The plots can be interpreted similarly as those in
Figure 1.
Procedure Index
Rank 1 2 3 4 5 6 7 8
1 -48928 -49247 -49397 -49253 -49256 -49266 -49266 -49163
2 -53201 -49505 -49767 -48875 -48437 -48493 -48654 -48341
3 -56091 -49284 -49754 -48570 -49022 -49217 -48837 -48207
4 -56235 -49633 -50037 -48611 -51192 -51986 -49174 -48271
Table 3: Test-set log-likelihoods for the MovieLens data. The eight procedures are listed in Table 1.
M. A larger log-likelihood function value implies a higher prediction accuracy. The results are
given in Table 3. The refinement methods improve the test-set log-likelihood of the NBE when
r = 2, 3, 4 but not when r = 1, likely due to the rank-one model being too restrictive for the current
data. Turning to the results from the CJMLE and its refinements, we see that Procedures 5 and 6 tend
to perform similarly. We also see that Procedure 8, which is a refinement of CJMLE by Algorithm
3, tends to improve the test-set log-likelihood of CJMLE under all values of r. Procedure 7 also
performs fine, despite its relatively high variance brought by performing data splitting only once in
Algorithm 2. The good performance of Procedures 7 and 8 is likely due to that the distribution of the
data missingness indicators ωij is far from a uniform distribution. Instead, their distribution likely
depends on the true signal matrix (i.e., people may be more likely to have watched movies that they
like), which may lead to dependence between the initial estimate  and some rows of Ω when data
splitting is not performed. Such dependence leads to a larger estimation error. The largest test-set
log-likelihood is given by Procedure 8 (i.e., CJMLE refined by Algorithm 3) when r = 3.
15
C HEN AND L I
Procedure Index
Rank 1 2 3 4 5 6 7 8
1 -67205 -67938 -67958 -67921 -67587 -67516 -68204 -68140
2 -71620 -68556 -68733 -67749 -63250 -63313 -64914 -64842
3 -75816 -70092 -70067 -69151 -65476 -65370 -68611 -67693
4 -77632 -72365 -72238 -71640 -72320 -72648 -79466 -75989
Table 4: Test-set log-likelihoods for the PISA data. The eight procedures are listed in Table 1.
7. Discussions
This note concerns matrix completion for mixed data under a GLFM framework. It proposes entry-
wise consistent methods for estimating GLFMs based on a partially observed data matrix. Proba-
16
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
bilistic error bounds are established for the matrix max norm under sensible asymptotic regimes (see
Section 4), and they are extended under a more general asymptotic regime in the appendix. These
error bounds imply the entrywise consistency and, further, characterize the asymptotic behaviors
of the proposed methods. With these error bounds, optimal results are established under suitable
asymptotic regimes. The proposed procedures are applied to two real data examples, one on movie
recommendation and the other on large-scale educational assessment. For the movie recommenda-
tion example, the best predictive model is a rank-three model obtained by refining the CJMLE with
Algorithm 3. For the educational assessment example, a rank-two model given by the CJMLE turns
out to be the most predictive one.
The current work can be extended in several directions. First, some popular factor models,
such as the probit model for binary data considered in Davenport et al. (2014), are not exponential
family GLFMs. We believe that our refinement procedures and their theory can be extended to
many other models beyond the exponential family GLFM. This is because the theoretical properties
of these procedures mainly rely on the convexity of the loss function with respect to M, which still
holds under many other non-linear factor models. Second, the optimal rate for estimating GLFMs
is worth future investigation. We currently do not know whether our upper bounds are minimax
optimal when the dimension r diverges. Sharp lower bounds need to be developed to answer this
question.
Acknowledgments
Appendix
This appendix provides additional theoretical results, proof of the theorems, and additional sim-
ulation results.
kM̃ − M∗ kmax ≤ max(kΘ̃N1 (Ã(1) )T − M∗N1 · kmax , kΘ̃N2 (Ã(2) )T − M∗N2 · kmax ).
We will provide detailed analysis for kΘ̃N1 (Ã(1) )T − M∗N1 · kmax . The analysis of kΘ̃N2 (Ã(2) )T −
M∗N2 · kmax is similar and is thus omitted. For the ease of presentation, we drop the superscript (1) in
Â(1) when the context is clear. Recall that M∗ has the SVD M∗ = U∗r D∗r (Vr∗ )T where U∗r ∈ Rn×r ,
Vr∗ ∈ Rp×r denote the left and right singular matrices, and D∗r = diag(σ1 (M∗ ), · · · , σr (M∗ )).
The rest of the section is organized as follows. In Section A.1, we obtain an error bound for kÂ−
A∗ kF where A∗ = Vr∗ P̂ for a carefully chosen orthogonal matrix P̂. In Section A.2, we provide
non-asymptotic and non-probabilistic bounds for solutions to the non-linear estimation equations
used in Step 3 and 4 in the proposed Algorithm 2. In Section A.3, we obtain non-asymptotic
17
C HEN AND L I
probabilistic bounds for terms involved in Section A.2. In Section A.4, we put together results
in Sections A.1 – A.3 and obtain asymptotic error bounds for kΘ̃N2 − Θ∗ k2→∞ (Lemma 38),
kà − A∗ k2→∞ (Lemma 39), and kΘ̃N1 (Ã(1) )T − M∗N1 · kmax (Lemma 40) where Θ∗ = U∗r D∗r P̂.
Finally, we provide additional theoretical results for Algorithm 2 in Section A.5 and the proof of
Theorem 10 in Section A.6.
Throughout the analysis, for real number operators, we calculate multiplication and division
before the max and min operators (‘∨’ and ‘∧0 ) unless otherwise specified. For example, u(xy ∨
z/w) = u max(xy, z/w) for real numbers x, y, u, w, z. For two events A and B, we say ‘event A
has probability at least 1 − on event B’, if P(Ac ∩ B) ≤ . Note that P(A) ≥ 1 − − P(B c ) in
this case.
Proof [Proof of Lemma 13] According to Weyl’s inequality and the assumption that kM̂N1 · −
M∗N1 · k2 ≤ 2−1 ψr , σr (M̂N1 · ) ≥ σr (M∗N1 · ) − kM̂N1 · − M∗N1 · k2 ≥ 2−1 σr (M∗N1 · ) ≥ 2−1 ψr . Thus
the gaps of singular value satisfies
h i n o
min min {σi (M̂N1 · )−σj (M∗N1 · )}, min σi (M̂N1 · ) = min σr (M̂N1 · ), σr (M∗N1 · ) ≥ 2−1 ψr .
1≤i≤r,j>r 1≤i≤r
(8)
∗
Let Vr,N ∈ R p×r be the right singular value matrix corresponding to the top-r singular values of
1·
M∗N1 · and
P† = arg minkV̂r − Vr,N ∗
1·
PkF , (9)
P∈Or
where Or denotes the set of all r × r orthogonal matrices. According to the above equations and
Wedin’s sine angle theorem (Wedin, 1972),
∗
2kM̂N1 · − M∗N1 · kF 4kM̂N1 · − M∗N1 · kF
kV̂r − Vr,N 1·
P† kF = inf kV̂r − Vr,N
∗
1·
PkF ≤ ≤ .
P∈Or σr (M̂N1 · ) ψr
(10)
On the other hand, since σr (M∗N1 · ) ≥ ψr > 0, the column space of (M∗N1 · )T is the same as
∗
the columns space of Vr,N and that of Vr∗ . This implies that there exists an orthogonal matrix
1·
P̄ ∈ R r×r such that Vr,N1 · = Vr∗ P̄, which further implies that for the orthogonal matrix
∗
P̂ = P̄P† , (11)
we have kV̂r − Vr∗ P̂kF ≤ 4ψr−1 kM̂N1 · − M∗N1 · kF . According to Algorithm 2, Â is the projection
of V̂r to the set {A ∈ Rp×r : kAk2→∞ ≤ C2 } and kVr∗ P̂k2→∞ = kVr∗ k2→∞ ≤ C2 . Thus,
k − Vr∗ P̂kF ≤ k − V̂r kF +kV̂r − Vr∗ P̂kF ≤ 2kV̂r − Vr∗ P̂kF ≤ 8ψr−1 kM̂N1 · − M∗N1 · kF . (12)
18
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
The next lemma provides a non-probabilistic bound for the solution to the partial score equation
S1,i (θi , A) = 0r .
Lemma 15. Let Θ∗ ∈ Rn×r and A∗ ∈ Rp×r be such that M∗ = Θ∗ (A∗ )T and Z =
(zij ) with zij = yij − b0 (m∗ij ) and diag(Ωi· ) := diag(ωi1 , · · · , ωip ). If kΘ∗ k2→∞ ≤ C1 ,
kA∗ k2→∞ , kAk2→∞ ≤ C2 and there exists ξ > 0 such that
2σr−1 (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (C2 (C1 + ξ))}
(15)
≤ξ ≤ 2−1 {γ1,i (A)κ3 (C2 (C1 + ξ)}−1 σK (I1,i (A)),
where we define Zi· = (zij )j∈[p] ∈ R1×p ,
p
X
B1,i (A) := ωij b00 (m∗ij )aj (aj − a∗j )T θi∗ ∈ Rr , (16)
j=1
p
X
I1,i (A) := ωij b00 (m∗ij )aj (aj )T , (17)
j=1
and
X X
β1,i (A) := sup ωij ((aj − a∗j )T θi∗ )2 |aTj u| and γ1,i (A) := sup ωij |aTj u|3 , (18)
kuk=1 j kuk=1 j
then, there is θ̃i such that kθ̃i − θi∗ k≤ ξ and S1,i (θ̃i ; A) = 0.
Proof [Proof of Lemma 15] Let θ be a vector such that kθ − θi∗ k= ξ and let mij = aTj θi . Consider
the Taylor expansion of φS1,i (θ; A),
X X
φS1,i (θ; A) = ωij (yij − b0 (m∗ij ))aj − ωij (b0 (mij ) − b0 (m∗ij ))aj
j j
X X
=AT diag(Ωi· )ZTi· − ωij b00 (m∗ij )(mij − m∗ij )aj − 2−1 ωij b(3) (m̃ij )(mij − m∗ij )2 aj ,
j j
(19)
19
C HEN AND L I
for some m̃ij between m∗ij and mij . Plugging mij − m∗ij = aTj (θ − θi∗ ) + (aj − a∗j )T θi∗ into the
above display, we obtain
X
φS1,i (θ; A) =AT diag(Ωi· )ZTi· − ωij b00 (m∗ij )aj aTj (θ − θi∗ )
j
X X . (20)
00
− ωij b (m∗ij )aj (aj − a∗j )T θi∗ − 2−1 ωij b(3) (m̃ij )(mi − m∗ij )2 aj
j j
Recall that kθ − θi∗ k= ξ. Using inequalities about matrix products and singular values, we have the
following upper bounds for the first three terms on the right-hand side of the above display.
|(θ − θi∗ )T AT diag(Ωi· )ZTi· |≤ ξkAT diag(Ωi· )ZTi· k= ξkZi· diag(Ωi· )Ak, (22)
X
− (θ − θi∗ )T ωij b00 (m∗ij )aj aTj (θ − θi∗ ) ≤ −ξ 2 σr (I1,i (A)), (23)
j
where σr (I1,i (A)) denotes the r-th largest singular value of I1,i (A), and
X
|(θ − θi∗ )T ωij b00 (m∗ij )aj (aj − a∗j )T θi∗ |= k(θ − θi∗ )T B1,i k≤ ξkB1,i k. (24)
j
Now we analyze the last term 2−1 (θ − θi∗ )T j ωij b(3) (m̃ij )(mi − m∗ij )2 aj . Note that |m̃ij |≤
P
|mij ∗ |∨|mij |≤ (C1 + ξ)C2 and mij − m∗ij = aTj (θ − θi∗ ) + (aj − a∗j )T θi∗ , we have
X
2−1 (θ − θi∗ )T b(3) (m̃ij )(mi − m∗ij )2 aj
j
X
≤2−1 κ3 ((C1 + ξ)C2 )ξ sup ωij ((aj − a∗j )T θi∗ + ξaTj u)2 |aTj u|
kuk=1 j (25)
X X
≤κ3 ((C1 + ξ)C2 ){ξ sup ωij ((aj − a∗j )T θi∗ )2 |aTj u|+ξ 3 sup ωij |aTj u|3 }
kuk=1 j kuk=1 j
Combining the analysis with (21), (22), (23), and (24), we obtain
20
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
Now, we view the right-hand side of the above inequality as a cubic function in ξ. For any cubic
function f (x) = −ax2 + bx3 + cx with a, b, c > 0, it is easy to verify that if 2c/a ≤ x ≤ a/(2b),
then f (x) ≤ 0. Applying this result, we can see that supkθ−θi∗ k=ξ (θ − θi∗ )T S1,i (θ; A) ≤ 0, if the
following inequalities hold:
−1
2σK (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 ((C1 + ξ)C2 )}
(27)
≤ξ ≤ 2−1 {γ1,i κ3 ((C1 + ξ)C2 )}−1 σr (I1,i (A)).
According to Result 6.3.4 in Ortega and Rheinboldt (2000), supkθ−θi∗ k=ξ (θ − θi∗ )T S1,i (θ; A) ≤ 0
implies that there is a solution S1,i (θ̃; A) = 0 satisfying kθ̃ − θi∗ k≤ ξ.
Next, we simplify the result of Lemma 15 to obtain a more user-friendly version in the next lemma.
Lemma 16. Let Θ∗ ∈ Rn×r and A∗ ∈ Rp×r be such that M∗ = Θ∗ (A∗ )T and Z = (zij ) with
zij = yij − b0 (m∗ij ). If kA∗ k2→∞ ≤ C2 and kAk2→∞ ≤ C2 , and
kθ̃i − θi∗ k≤ 2σr−1 (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (3C1 C2 )}. (29)
2σr−1 (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (C2 (C1 + ξ))}
(31)
≤2σr−1 (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (3C1 C2 )}.
2σr−1 (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (C2 (C1 + ξ))} ≤ ξ. (32)
On the other hand, according to the assumption that kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (3C1 C2 ) ≤
−1
2−2 γ1,i (κ3 (3C1 C2 ))−1 σr2 (I1,i (A)), we further have
21
C HEN AND L I
Equations (32) and (33) together imply (15). By Lemma 15, there is θ̃i such
that kθ̃i − θi∗ k≤ ξ and S1,i (θ̃; A) = 0. We complete the proof by noting that
ξ = 2σr−1 (I1,i (A)){kZi· diag(Ωi· )Ak+kB1,i (A)k+β1,i (A)κ3 (3C1 C2 )} ≤ 2σr−1 (I1,i ) ·
2−1 σr (I1,i (A))C1 = C1 .
By symmetry, we also have the following non-probabilistic and non-asymptotic analysis for Ã.
For each j ∈ [p], the estimating equation for aj based on ΘN2 and ΩN2 · is defined as
X
S2,j (aj ; ΘN2 ) := φ−1 ωij {yij − b0 (aTj θi )}θi . (34)
i∈N2
Let X
B2,j (ΘN2 ) = ωij b00 (m∗ij )θi (θi − θi∗ )T a∗j ∈ Rr , (35)
i∈N2
X
I2,j (ΘN2 ) = ωij b00 (m∗ij )θi (θi )T , (36)
i∈N2
and
X X
β2,j (ΘN2 ) = sup ωij ((θi − θi∗ )T a∗j )2 |θjT u| and γ2,j (ΘN2 ) = sup ωij |θiT u|3 , (37)
kuk=1 i∈N kuk=1 i∈N
2 2
Lemma 17. Let Θ∗N2 and A∗ be such that M∗N2 · = Θ∗N2 (A∗ )T and Z = (zij ) with zij = yij −
b0 (m∗ij ) and diag(ΩN2 ,j ) := diag((ωij )i∈N2 ). If kΘN2 k, kΘ∗N2 k2→∞ ≤ C1 , kA∗ k2→∞ ≤ C2 and
where ZN2 ,j = (zij )i∈N2 , then, there is ã such that S2,j (ã; ΘN2 ) = 0r , and
kãj −a∗j k≤ 2σr−1 (I2,j (ΘN2 ))){kZTN2 ,j diag(ΩN2 ,j )ΘN2 k+kB2,j (ΘN2 )k+β2,j (ΘN2 )κ3 (3C1 C2 )}.
(39)
∗
Moreover, ãj satisfies that kãj − aj k≤ C2 .
Proof [Proof of Lemma 17] The lemma follows similar proof as that of Lemma 15 and Lemma 16
with (A, A∗ , C1 , C2 ) replaced by (ΘN2 , Θ∗N2 , C2 , C1 ). We omit the details.
22
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
P (40)
where pmax = maxi∈[n] j ωij denotes the maximum number of observations in each row.
Proof [Proof of Lemma 18] We first verify that under the generalized latent factor model,
Zi· diag(Ωi· )·k is sub-exponential given ΩN2 · = (ωij )i∈N2 ,j∈[p] and Â. To see this, consider
the moment generating function
E[exp(λZi· diag(Ωi· )·k )|ΩN2 · , Â]
Y
= E[λZij âjk ωij |ΩN2 · , Â]
j∈[p]
(41)
h X i
= exp φ−1 ωij {b(m∗ij + λâjk φ) − b(m∗ij ) − λâjk φb0 (m∗ij )}
j
X
−1 2
= exp[2 λ φ ωij b00 (m̃ij )(âjk )2 ]
j
for some m̃ij between m∗ij and m∗ij + λâjk φ. Note that here we used the independence between Â
and {zij ωij }i∈N2 in the first and second equations.
Because |m∗ij |≤ ρ and |âjk |≤ C2 , for |λ|≤ (ρ + 1)/(φC2 ), m̃ij ≤ ρ + λφC2 ≤ 2ρ + 1.
Thus, E[exp(λZi· diag(Ωi· )·k )|ΩN2 · , Â] ≤ exp{λ2 φ j ωij (âjk )2 κ2 (2ρ + 1)/2} for |λ|≤ (ρ +
P
1)/(φC2 ). This implies that ZP i· diag(Ωi· )·k is sub-exponential (conditional on (ΩN2 · , Â)) with
parameters νik = φκ2 (2ρ + 1) j ωij (âjk )2 ≤ C22 φκ2 (2ρ + 1)pmax and α = φC2 /(ρ + 1).
2
Applying tail probability bound for sub-exponential random variables to Zi· diag(Ωi· )·k , we
have
2 2
P(|Zi· diag(Ωi· )·k |≥ t|ΩN2 · , Â) ≤ 2(e−t /(2νik ) ∨ e−t/(2α) ) (42)
for all positive t. This implies
P(kZi· diag(Ωi· )Âk≥ t|ΩN2 · , Â)
X √
≤ P(|Zi· diag(Ωi· )·k |≥ t/ r|ΩN2 · , Â)
(43)
k∈[r]
2 /(2r max 2 ) 1/2 α)
≤r · 2(e−t k νik
∨ e−t/(2r ).
Combining results for different i with a union bound, we have
2 2 1/2
P maxkZi· diag(Ωi· )Âk≥ t|ΩN1 · , Â ≤ 2rn · (e−t /(2r maxk νik ) ∨ e−t/(2r α) ). (44)
i∈N2
For t = {8(log(nr)r maxk∈[r] νik 2 )1/2 } ∨ 8r 1/2 α log(nr) and n ≥ 2, the right-hand side of the
23
C HEN AND L I
Lemma 19 (Upper bound for kB1,i (Â)k with data splitting). Let A∗ = Vr∗ P̂ and Θ∗ = U∗r D∗r P̂.
If  is independent with {ωij }j∈[p] for i ∈ N2 , kÂk2→∞ , kVr∗ k2→∞ ≤ C2 and kUr D∗r k2→∞ ≤ C1 ,
then, for n ≥ 4 with probability at least 1 − 1/(nr),
Proof [Proof of Lemma 19] First, by the assumptions and P̂ is orthogonal, kΘ∗ k2→∞ =
kU∗r D∗r k2→∞ ≤ C1 and kA∗ k2→∞ = kVr∗ k2→∞ ≤ C2 . Let
Then,
p
X X X
B1,i (Â) = ωij b00 (m∗ij )âj (âj − a∗j )T θi∗ = Sj + πij b00 (m∗ij )âj (âj − a∗j )T θi∗ . (48)
j=1 j∈[p] j∈[p]
Note that Sj are independent mean zero random vectors for j ∈ [p] (conditional on Â) and
This
P allow us rto apply the matrix Bernstein inequality (Equation (6.1.5) in Tropp (2015)) to
j∈[p] Sj ∈ R , and obtain
3t2 3t2
X 3t 3t
P k Sj k≥ t|Â ≤ (r + 1) · e− 8ν ∨ e− 8L ≤ 2r · e− 8ν ∨ e− 8L (50)
j∈[p]
n P o
T T
P
for t > 0 where ν = max j∈[p] E{Sj Sj |Â} , j∈[p] E{Sj Sj |Â} and L =
2 2
4κ∗2 C1 C22 ≥ kSj k for all j. Thus, for any 0 < < r
X
P k Sj k≥ {8/3 · log(2r/)}1/2 ν 1/2 ∨ {(8/3 · log(2r/))L}|Â ≤ . (51)
j∈[p]
E{Sj STj |Â} = πij (1 − πij ) · {b00 (m∗ij )}2 âj (âj − a∗j )T θi∗ (θi∗ )T (âj − a∗j )âTj , (52)
and
E{STj Sj |Â} = πij (1 − πij ) · {b00 (m∗ij )}2 (θi∗ )T (âj − a∗j )âTj âj (âj − a∗j )T θi∗ , (53)
we have
n o
max kE{STj Sj |Â}k2 , kE{Sj STj |Â}k2 ≤ πmax (κ2 (ρ))2 C12 C22 kâj − a∗j k2 (54)
24
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
which implies
n X X o
ν = max E{Sj STj |Â} , E{STj Sj |Â} ≤ πmax (κ2 (ρ))2 C12 C22 k − A∗ k2F .
2 2
j∈[p] j∈[p]
(55)
Combine the above inequality with (51), we have that with probability at least 1 − ,
X
k Sj k≤ {8/3 · log(2r/)}1/2 πmax
1/2
κ2 (ρ)C1 C2 k − A∗ kF +{(8/3 · log(2r/))} · 4κ2 (ρ)C1 C22
j∈[p]
(56)
for any 0 < < r. Simplifying this inequality, we get that with probability at least 1 − ,
X
k 1/2
Sj k≤ {16 · log(r/)} · (πmax κ2 (ρ)C1 C2 k − A∗ kF +κ2 (ρ)C1 C22 ) (57)
j∈[p]
X
k πij b00 (m∗ij )âj (âj − a∗j )T θi∗ k
j∈[p]
X
≤C1 k πij b00 (m∗ij )âj (âj − a∗j )T k2
(58)
j∈[p]
=C1 kÂT diag(πi1 b00 (m∗i1 ), · · · , πip b00 (m∗ip ))(Â − A∗ )k2
≤C1 kÂk2 πmax κ∗2 k − A∗ kF
Combine the above inequality with (48) and (57), we have
Remark 20. The first term κ∗2 πmax C1 kÂk2 k − A∗ kF in the upper bound is the leading term in
the error analysis. To obtain this error bound, we need {ωij }j∈[p] to be independent with Â. In
contrast, if {ωij }j∈[p] are dependent with Â, then the the leading term in the error analysis may be
√
larger (at the order 1/ πmax in the worst case).
Lemma 21 (Upper bound for β1,i (Â) with data splitting). If kU∗r D∗r k2→∞ ≤ C1 ,
kÂk2→∞ , kVr∗ k2→∞ ≤ C2 , and  is independent with {ωij }i∈N2 ,j∈[p] , then, with probability at
least 1 − 1/n,
25
C HEN AND L I
Conditional on Â, (ωij − πij )kâj − a∗j k2 are independent, mean-zero, bounded by 4C22 , and has the
variance πij (1−πij )kâj −a∗j k4 ≤ 4πij C22 kâj −a∗j k2 . By Bernstein’s inequality for bounded random
variables (Theorem 2.10 in Boucheron et al. (2013) with c = 4C22 /3 and v = 4πij C22 k − A∗ k2F ),
for t > 0
X
P (ωij − πij )kâj − a∗ k2 ≥ (8πij C22 k − A∗ k2F t)1/2 + 4/3 · C22 t| ≤ e−t . (62)
j∈[p]
Let t = 2 log(n) in the above inequality and note that πij ≤ πmax and 4/3 < 2, we have that with
probability at least 1 − 1/n2 ,
X
(ωij − πij )kâj − a∗ k2 ≤ 4πmax
1/2
C2 (log(n))1/2 k − A∗ kF +4C22 log(n). (63)
j∈[p]
We complete the proof by combining the above inequality with (61) and applying a union bound
for i ∈ N2 .
Remark 22. Similar to Remark 20, the above analysis also requires the independence of {ωij }j∈[p]
and  in order to obtain the leading term C12 C2 πmax k − A∗ k2F .
Lemma 23 (Upper bound for pmax ). Recall pmax = maxi∈[n] pi . If pπmax ≥ 6 log n, then
E(ωij − pij )2 =
P P P
Because j j V ar(ωij ) ≤ j πij ≤ pπmax , the above inequality implies,
n (pπmax )2 /2 o 3
P(pi − E(pi ) ≥ pπmax ) ≤ exp − = exp ( − pπmax ), (67)
(pπmax ) + (pπmax )/3 8
26
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
where the last inequality is due to the assumption that pπmax ≥ 6 log n > 16/3 log n.
Lemma 24 (Upper bound of γ1,i (Â)). If kÂk2→∞ ≤ C2 and pπmax > 6 log n, then with probability
at least 1 − 1/n,
γ1,i (Â) ≤ 2pπmax C23 . (70)
Proof [Proof of Lemma 24] The lemma follows by Lemma 23 and the following inequality
p
X
γ1,i (Â) = sup ωij |âTj u|3 ≤ pmax C23 . (71)
kuk=1 i=1
The next three lemmas together give a lower bound for σr (I1,i (Â))
Lemma 25. If kdiag(Ωi· )(Â − A∗ )k2 ≤ 2−1 σr (diag(Ωi· )A∗ ) and kM∗ kmax ≤ ρ, then
This implies σr (I1,i (Â)) ≥ δ2 (ρ)σr2 (diag(Ωi· )Â). By Weyl’s inequality, σr (diag(Ωi· )Â) ≥
σr (diag(Ωi· )A∗ )−kdiag(Ωi· )(Â−A∗ )k2 . Thus, if kdiag(Ωi· )(Â−A∗ )k2 ≤ 2−1 σr (diag(Ωi· )A∗ ),
then σr (diag(Ωi· )Â) ≥ 2−1 σr (diag(Ωi· )A∗ ), and thus,
σr (I1,i (Â)) ≥ δ2 (ρ)σr2 (diag(Ωi· )Â) ≥ 2−2 δ2 (ρ)σr2 (diag(Ωi· )A∗ ). (74)
The next two lemmas give a lower bound for σr (diag(Ωi· )A∗ ) and an upper bound for
kdiag(Ωi· )(Â − A∗ )k2 .
Lemma 26. Let A∗ = Vr∗ P̂ and let Π1,i = diag(πi1 , · · · , πip ) = E(diag(Ωi· )) and λ∗i,min =
λr ((Vr∗ )T Π1,i Vr∗ ) = λr ((A∗ )T Π1,i A∗ ), where λr (·) denotes the r-th largest eigenvalue of a sym-
metric matrix. If λ∗min := mini∈[n] λ∗i,min ≥ 16kVr∗ k22→∞ log(nr), then
P min σr2 (diag(Ωi· )A∗ ) ≤ 2−1 λ∗min ≤ 1/(nr) (75)
i∈[n]
27
C HEN AND L I
Remark 27. In the ‘moreover part’ of the above lemma, σr2 (A∗ ) = σr2 (Vr∗ P̂) = 1, so it is possible
to further simplify the statement of lemma. We keep the current form without simplification so that
similar results can be obtained by symmetry for Θ∗ = U∗r D∗r P̂, which will be useful for the analysis
later.
Proof [Proof of Lemma 26] First note that σr2 (diag(Ωi· )A∗ ) = σr2 (diag(Ωi· )Vr∗ P̂) =
σr2 (diag(Ωi· )Vr∗ ) = λr ((Vr∗ )T diag(Ωi· )Vr∗ ). Also note that for all t ∈ (0, 1)
P σr2 (diag(Ωi· )Vr∗ ) ≤ (1 − t)λ∗i,min
X X (77)
=P λr ( ωij vj∗ (vj∗ )T ) ≤ (1 − t) · λr ( πij vj∗ (vj∗ )T ) ,
j j
where vj∗ ∈ Rr denotes the j-th row of Vr∗ . Note that λr {E( j∈[p] ωij vj∗ (vj∗ )T )} = λ∗i,min ,
P
λ1 (ωij vj∗ (vj∗ )T ) ≤ kVr∗ k22→∞ , and ωij vj∗ (vj∗ )T are independent for different j. Applying Remark
5.3 in Tropp (2012) to the above probability, we obtain that for all t ∈ (0, 1),
X n o
P λr ( ωij vj∗ (vj∗ )T ) ≤ (1 − t) · λ∗i,min ≤ r exp − 2−1 kVr∗ k−22→∞ (1 − t)2 ∗
λi,min . (78)
j
Thus,
P σr2 (diag(Ωi· )A∗ ) ≤ (1 − t)λ∗i,min ≤ r exp { − 2−1 kVr∗ k−2 2 ∗
2→∞ (1 − t) λi,min }. (79)
Apply a union bound to the above inequality for different i ∈ [n], we obtain
P( min σr2 (diag(Ωi· )A∗ ) ≤ 2−1 λ∗min ) ≤ nr exp { − 8−1 kVr∗ k−2 ∗
2→∞ λmin }. (82)
i∈[n]
The right-hand side of the above inequality is no greater than (nr)−1 when λ∗min ≥
16kVr∗ k22→∞ log(nr) = 16kA∗ k22→∞ log(nr).
The ‘moreover’ part of the lemma is proved by noting that λ∗i,min = λr ( j∈[p] πij a∗j (a∗j )T ) ≥
P
Lemma 28. If kÂk2→∞ , kVr∗ k2→∞ ≤ C2 and  is independent with {ωij }i∈N2 ,j∈[p] , then with
probability at least 1 − 1/(nr),
for n ≥ 4.
28
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
X n 3t2 3t o
P k (ωij − πij )∆aj ∆Taj k2 ≥ t|Â ≤ 2r · exp − ∧ (84)
8ν 8L
j∈[p]
where ν = 4πmax C22 k∆A k2F ≥ j∈[p] kE[{ωij ∆aj ∆Taj }T ωij ∆aj ∆Taj ]k and L = 4C22 ≥ k(ωij −
P
Now we give an upper bound for t = [{8/3 · log(2r/)}1/2 ν 1/2 ] ∨ [{8/3 · log(2r/)}L] for ∈
(0, r/10)
for ∈ (0, r/10). Applying a union bound to the above result with = 1/(rn2 ), we have
X
k (ωij − πij )∆aj ∆Taj k2 ≤ 64 log(n) · {(πmax
1/2
C2 k∆A kF ) ∨ C22 } (88)
j∈[p]
Xp Xp
T
λ1 ( πij ∆aj ∆aj ) ≤ πmax λ1 ( ∆aj ∆Taj ) = πmax k∆A k22 ≤ πmax k∆A k2F . (89)
j=1 j=1
Combining the above two inequalities and note that kdiag(Ωi· )(Â − A∗ )k22 =
λ1 ( j∈[p] ωij ∆aj ∆Taj ), we obtain that with probability at least 1 − 1/(nr),
P
for n ≥ 4.
29
C HEN AND L I
Lemma 31 (Upper bound for kZTN2 ,j diag(ΩN2 ,j )Θ̃N2 k). Assume that nπmax ≥ 6 log(p). With
probability at least 1 − 3/p − P(kΘ̃N2 k2→∞ > 2C1 ),
maxkZTN2 ,j diag(ΩN2 ,j )Θ̃N2 k
j∈[p]
≤16{φ1/2 (κ∗2 )1/2 C1 log1/2 (pr)r1/2 (nπmax )1/2 ∨ r1/2 φC1 /(ρ + 1) log(pr)} (92)
30
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
Combine the above display with Lemma 29 and Lemma 30, we have that with probability at least
1 − 3/p,
≤16{φ1/2 (κ∗2 )1/2 C1 log1/2 (pr)r1/2 (nπmax )1/2 ∨ r1/2 φC1 /(ρ + 1) log(pr)} (96)
Lemma 32 (Upper bound for kB2,j (Θ̃N2 )k). Assume that nπmax ≥ 6 log(p). With probability at
least 1 − 1/p,
maxkB2,j (Θ̃N2 )k≤ 4C1 C2 κ∗2 nπmax kΘ̃N2 − Θ∗N2 k2→∞ , (97)
j∈[p]
maxkB2,j (Θ̃N2 )k≤ 4C1 C2 κ∗2 nπmax kΘ̃N2 − Θ∗N2 k2→∞ (99)
j∈[p]
Lemma 33. Assume that nπmax ≥ 6 log(p). With probability at least 1 − 1/p,
Lemma 34. Assume that nπmax ≥ 6 log(p). With probability at least 1 − 1/p,
31
C HEN AND L I
Lemma 35. Assume that P(kΘ̃N2 − Θ∗N2 k2→∞ ≤ eΘ,2→∞ ) ≥ 1 − for some non-random
eΘ,2→∞ , nπmax ≥ 6 log(p), πmin σr2 (Θ∗N2 ) ≥ 32kΘ∗N2 k22→∞ log(p), p ≥ r, and 2e2Θ,2→∞ nπmax ≤
2−3 πmin σr2 (Θ∗N2 ). Then, with probability at least 1 − 2/p −
I2,j (Θ̃N2 ) ≥ 2−2 δ2 (ρ)πmin σr2 (Θ∗ ) ≥ 2−2 δ2 (ρ)πmin ψr2 (104)
Proof [Proof of Lemma 35] First note that
X
kdiag(ΩN2 ,j )(Θ̃N2 − Θ∗N2 )k22 = k ωij (θ̃i − θi∗ )(θ̃i − θi∗ )T k2 ≤ kΘ̃N2 − Θ∗N2 k22→∞ ·nmax
i∈N2
(105)
Combine the above inequality with Lemma 29, we have that with probability at least 1 − 1/p,
kdiag(ΩN2 ,j )(Θ̃N2 − Θ∗N2 )k22 ≤ 2kΘ̃N2 − Θ∗N2 k22→∞ ·nπmax . (106)
On the other hand, with similar argument as those in the proof of Lemma 26, we have that if
πmin σr2 (Θ∗N2 ) ≥ 32kΘ∗N2 k22→∞ log(p) and p ≥ r, then
P min σr2 (diag(ΩN2 ,j )Θ∗N2 ) ≤ 2−1 πmin σr2 (Θ∗N2 ) ≤ 1/(pr) (107)
i∈[n]
Thus, if 2e2Θ,2→∞ nπmax ≤ 2−3 πmin σr2 (Θ∗N2 ), then with probability at least 1 − − 2/p,
where the last inequality in the above display holds because Θ∗N2 = (U∗r )N2 · D∗r and as a result
σr (Θ∗ ) = σr (M∗N2 · ) ≥ ψr .
32
E NTRYWISE C ONSISTENCY FOR M IXED - DATA M ATRIX C OMPLETION
Proof First, as RG is a submatrix of R, we have σ1 (RG ) ≤ σ1 (R). In the rest of the proof, we
show that (109) holds. Let T = P UD ∈ Rn×r . Then, RG = TG VT and σr2 (RG ) = λr (RG RTG ) =
λr (TG TTG ) = λr (TTG TG ) = λr ( i∈[n] gi ti tTi ) where ti = TTi· indicates the i-th row of the matrix
T.
Note
P that for each i, gi ti tTi is positive semi-definite, and λ1 (gi ti tTi ) ≤ kti k2 ≤ kTk22→∞ . Also,
λr (E( i∈[n] gi ti ti )) = 2−1 λr (TT T) = 2−1 σr2 (R). Applying the weak Chernoff bounds for
T
matrices (inequalities on page 61 of Tropp (2015) under equations (5.1.7) with t = 1/2), we obtain
−3 σ 2 (R)/kTk2
X
P(λr ( gi ti tTi ) ≤ 2−2 σr2 (R)) ≤ re−2 r 2→∞ . (110)
i∈[n]
Lemma 37 (Asymptotic bounds for ψ_1 and ψ_r). Recall that ψ_1 = σ_1(M^*_{N_1·}) ∨ σ_1(M^*_{N_2·}) and ψ_r = σ_r(M^*_{N_1·}) ∧ σ_r(M^*_{N_2·}). If σ_r^2(M^*)/σ_1^2(M^*) ≫ ‖U_r^*‖_{2→∞}^2 log(r), then, with probability converging to 1, σ_r(M^*) ≲ ψ_r ≤ ψ_1 ≤ σ_1(M^*).

Proof [Proof of Lemma 37] This lemma is a direct application of Lemma 36 with R, U, and G replaced by M^*, U_r^*, and N_1 (or N_2). We omit the details.
Lemma 38 (Asymptotic analysis for Θ̃_{N_2}). Let A^* = V_r^* P̂ and Θ^* = U_r^* D_r^* P̂, where P̂ is defined in (11). Assume that lim_{n,p→∞} P(‖Â − A^*‖_F ≤ e_{A,F}) = 1. Assume the following asymptotic regime holds:
1. φ ≲ 1;
4. pπ_min ≫ (δ_2^*)^{-4}(κ_2^*)^2 (log(n))^2 · max{r^{1∨(1+2η_1)∨(1−2η_2)} (π_max/π_min), (κ_3^*)^2 (π_max/π_min)^3 r^{5∨(3+2η_1)∨(3+4η_1)}};
5. e_{A,F} ≪ (κ_2^*)^{-1}(δ_2^*)^2 min{r^{−(η_1−η_2)} (π_min/π_max), (κ_3^*)^{-1} r^{−2−η_1} (π_min/π_max)^2}.
Then, with probability converging to 1, there is Θ̃_{N_2} = (θ̃_i^T)_{i∈N_2} ∈ R^{|N_2|×r} such that S_{1,i}(θ̃_i; Â) = 0 for all i ∈ N_2, and

‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≲ κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} {r (log(n))^{1/2} (pπ_max)^{−1/2} + r^{1/2+η_1} e_{A,F}}.   (111)

Moreover, Θ̃_{N_2} defined above satisfies ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≤ C_1, and θ̃_i is the unique solution to the optimization problem max_{θ_i∈R^r} Σ_{j∈[p]} ω_{ij} {y_{ij} θ_i^T â_j − b(θ_i^T â_j)} for all i ∈ N_2.
Proof [Proof of Lemma 38] First, we analyze the asymptotic regime assumption. The 4-th condition, i.e.,

pπ_min ≫ (δ_2^*)^{-4}(κ_2^*)^2 (log(n))^2 max{r^{1∨(1+2η_1)∨(1−2η_2)} (π_max/π_min), (κ_3^*)^2 (π_max/π_min)^3 r^{5∨(3+2η_1)∨(3+4η_1)}},   (112)

implies (113). Similarly, the 5-th condition, i.e.,

e_{A,F} ≪ (κ_2^*)^{-1}(δ_2^*)^2 min{r^{−(η_1−η_2)} (π_min/π_max), (κ_3^*)^{-1} r^{−2−η_1} (π_min/π_max)^2},   (114)

implies

e_{A,F} ≪ min{r^{−1−η_1}(κ_3^*)^{-1}κ_2^*, (π_min/π_max)^{1/2}, (κ_2^*)^{-1} δ_2^* r^{−(η_1−η_2)} (π_min/π_max), (κ_3^*)^{-1}(κ_2^*)^{-1}(δ_2^*)^2 r^{−2−η_1} (π_min/π_max)^2}.   (115)

Also, we have (116).
Throughout the proof, we restrict the analysis to the event {‖Â − A^*‖_F ≤ e_{A,F}} ∩ {p_max ≤ 2pπ_max} ∩ {(np)^{1/2} r^{η_2} ≲ ψ_r ≤ ψ_1 ≤ (np)^{1/2} r^{η_1}}, which has probability converging to 1 by the lemma's assumption, (113), (116), and Lemma 24. On this event, we have that, with probability at least 1 − 1/n,

max_{i∈N_2} ‖Z_{i·} diag(Ω_{i·}) Â‖ ≤ 32{φ^{1/2}(κ_2^*)^{1/2} C_2 log^{1/2}(n) r^{1/2}(pπ_max)^{1/2} ∨ r^{1/2} φ C_2/(ρ+1) log(n)},   (118)

according to Lemma 18. Under the asymptotic regime that φ ≲ 1 and C_2 ≲ (r/p)^{1/2}, the above inequality implies

max_{i∈N_2} ‖Z_{i·} diag(Ω_{i·}) Â‖ ≲ (κ_2^*)^{1/2} r log^{1/2}(n) π_max^{1/2} + r p^{−1/2} log(n).   (119)

Note that κ_2^* ≳ 1. According to (113), pπ_min ≫ r(log n)^2, which implies r p^{−1/2} log(n) ≪ (κ_2^*)^{1/2} r log^{1/2}(n) π_max^{1/2}. Thus, the above display implies

max_{i∈N_2} ‖Z_{i·} diag(Ω_{i·}) Â‖ ≲ (κ_2^*)^{1/2} r log^{1/2}(n) π_max^{1/2}   (120)

with probability converging to 1. Next, according to Lemma 19, with probability converging to 1, we have

max_{i∈N_2} ‖B_{1,i}(Â)‖ ≤ κ_2^* π_max C_1 ‖Â‖_2 ‖Â − A^*‖_F + 64 log(n) · (π_max^{1/2} κ_2^* C_1 C_2 ‖Â − A^*‖_F + κ_2^* C_1 C_2^2 log(n)).   (121)

According to (117), C_1 C_2^2 ≲ r^{3/2+η_1} p^{−1/2}. Also, note that ‖Â‖_2 ≤ 1. Thus, the above display implies that, with probability converging to 1,

max_{i∈N_2} ‖B_{1,i}(Â)‖ ≲ κ_2^* {π_max r^{1/2+η_1} p^{1/2} e_{A,F} + r^{1+η_1} π_max^{1/2} log(n) e_{A,F} + r^{3/2+η_1} p^{−1/2} log(n)}.   (122)

According to (113), pπ_min ≫ r(log n)^2, which implies π_max^{1/2} r^{1+η_1} log(n) ≪ π_max r^{1/2+η_1} p^{1/2}. Thus, (122) implies that, with probability converging to 1,

max_{i∈N_2} ‖B_{1,i}(Â)‖ ≲ κ_2^* (π_max r^{1/2+η_1} p^{1/2} e_{A,F} + r^{3/2+η_1} p^{−1/2} log(n)).   (123)

According to (113), pπ_min ≫ r^{1+2η_1} log(n), which implies r^{3/2+η_1} p^{−1/2} log(n) ≲ r log^{1/2}(n) π_max^{1/2}. This, together with (120) and (123), gives

max_{i∈N_2} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖} ≲ κ_2^* {r log^{1/2}(n) π_max^{1/2} + π_max r^{1/2+η_1} p^{1/2} e_{A,F}}.   (124)
Note that C_1^2 C_2 ≲ r^{3/2+2η_1} p^{1/2}. Thus, the above display implies

max_{i∈N_2} β_{1,i}(Â) κ_3^* ≲ κ_3^* r^{3/2+2η_1} p^{1/2} {π_max e_{A,F}^2 + π_max^{1/2} r^{1/2} p^{−1/2} (log(n))^{1/2} e_{A,F} + r p^{−1} log(n)}.   (126)

First, according to (115), e_{A,F} ≲ r^{−1−η_1}(κ_3^*)^{-1} κ_2^*, which implies κ_3^* r^{3/2+2η_1} p^{1/2} π_max e_{A,F}^2 ≲ κ_2^* π_max r^{1/2+η_1} p^{1/2} e_{A,F}. Second, according to (113), pπ_min ≫ (κ_3^*)^2 (κ_2^*)^{-2} r^{3+2η_1} log(n), which implies κ_3^* r^{3/2+2η_1} p^{1/2} · π_max^{1/2} r^{1/2} p^{−1/2} (log(n))^{1/2} e_{A,F} ≲ κ_2^* π_max r^{1/2+η_1} p^{1/2} e_{A,F}. Third, according to (113), pπ_min ≫ (κ_3^*)^2 (κ_2^*)^{-2} r^{3+4η_1} log(n), which implies κ_3^* r^{3/2+2η_1} p^{1/2} · r p^{−1} log(n) ≪ κ_2^* r log^{1/2}(n) π_max^{1/2}. Thus, (126) implies that, with probability converging to one,

max_{i∈N_2} β_{1,i}(Â) κ_3^* ≲ κ_2^* {r log^{1/2}(n) π_max^{1/2} + π_max r^{1/2+η_1} p^{1/2} e_{A,F}}.   (127)

Equations (124) and (127) together imply that, with probability converging to 1,

max_{i∈N_2} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} ≲ κ_2^* {r log^{1/2}(n) π_max^{1/2} + π_max r^{1/2+η_1} p^{1/2} e_{A,F}}.   (128)
Next, we find a lower bound for σ_r(I_{1,i}(Â)). Note that σ_r(A^*) = 1 and ‖A^*‖_{2→∞}^2 ≲ r/p by assumption. Under the asymptotic regime pπ_min ≫ r(log(n))^2, we have π_min σ_r^2(A^*) ≥ 32 ‖A^*‖_{2→∞}^2 log(n) for n large enough. According to Lemma 26, with probability at least 1 − 1/(nr),

min_{i∈N_2} σ_r^2(diag(Ω_{i·}) A^*) ≥ 2^{-1} π_min   (129)

for n and p large enough. According to Lemma 28, with probability converging to 1, the bound (130) holds. First, according to (115), e_{A,F} ≪ (π_min/π_max)^{1/2}, which implies π_max e_{A,F}^2 ≪ π_min. Second, according to (113) and (115), e_{A,F} ≪ (π_min/π_max)^{1/2} and π_min p ≫ r(log(n))^2, which imply e_{A,F} ≪ (π_min/π_max)^{1/2} (π_min p)^{1/2} r^{−1/2} (log(n))^{−1}. This further implies π_max^{1/2} (r/p)^{1/2} log(n) e_{A,F} ≪ π_min. Third, according to (113), pπ_min ≫ r(log(n))^2, which implies (r/p) log(n) ≪ π_min. Combining the analysis, we have that, with probability converging to one,

max_{i∈N_2} ‖diag(Ω_{i·})(Â − A^*)‖_2^2 ≪ π_min.   (131)

Combining the above display with (129) and using Lemma 25, we have that, with probability converging to 1,

min_{i∈N_2} σ_r(I_{1,i}(Â)) ≥ 2^{-3} δ_2^* π_min.   (132)
So far, we have obtained upper bounds for max_{i∈N_2} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} and a lower bound for σ_r(I_{1,i}(Â)). In the rest of the proof, we restrict our analysis to the event that (128) and (132) hold. To proceed, we verify the conditions of Lemma 16. According to Lemma 24, on the event p_max ≤ 2pπ_max, max_{i∈N_2} γ_{1,i}(Â) ≲ pπ_max (r/p)^{3/2}. This and (132) imply that, with probability tending to 1,

min_{i∈N_2} {(γ_{1,i}(Â))^{-1} (κ_3(3C_1C_2))^{-1} σ_r^2(I_{1,i}(Â))} ≳ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π_min^2/π_max.   (133)
According to (113), pπ_min ≫ (κ_2^*)^2 (κ_3^*)^2 (δ_2^*)^{-4} (π_max/π_min)^3 r^5 log(n), which implies κ_2^* r log^{1/2}(n) π_max^{1/2} ≪ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π_min^2/π_max. According to (115), e_{A,F} ≪ (κ_3^*)^{-1} (κ_2^*)^{-1} (δ_2^*)^2 r^{−2−η_1} (π_min/π_max)^2, which implies κ_2^* π_max r^{1/2+η_1} p^{1/2} e_{A,F} ≪ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π_min^2/π_max. Combining the analysis, we have κ_2^* r log^{1/2}(n) π_max^{1/2} + κ_2^* π_max r^{1/2+η_1} p^{1/2} e_{A,F} ≪ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π_min^2/π_max. This, together with (133), implies

max_{i∈N_2} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} ≪ min_{i∈N_2} {(γ_{1,i}(Â))^{-1} (κ_3(3C_1C_2))^{-1} σ_r^2(I_{1,i}(Â))}.   (134)
Next, according to (132) and C_1 = {‖U_r^*‖_{2→∞} ∨ (r/n)^{1/2}} · σ_1(M^*),

min_{i∈N_2} {σ_r(I_{1,i}(Â)) C_1} ≳ δ_2^* π_min (r/n)^{1/2} (np)^{1/2} r^{η_2} ≳ δ_2^* π_min r^{1/2+η_2} p^{1/2}.   (135)

According to (113), pπ_min ≫ (π_max/π_min) (κ_2^*)^2 (δ_2^*)^{-2} r^{1−2η_2} log(n), which implies κ_2^* r log^{1/2}(n) π_max^{1/2} ≪ δ_2^* π_min r^{1/2+η_2} p^{1/2}. According to (115), e_{A,F} ≪ (κ_2^*)^{-1} δ_2^* (π_min/π_max) r^{−(η_1−η_2)}, which implies κ_2^* π_max r^{1/2+η_1} p^{1/2} e_{A,F} ≪ δ_2^* π_min r^{1/2+η_2} p^{1/2}. Combining the analysis and (133), we get

max_{i∈N_2} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} ≪ min_{i∈N_2} {σ_r(I_{1,i}(Â)) C_1}.   (136)
According to (134) and (136), the conditions of Lemma 16 are satisfied. According to Lemma 16, together with (128) and (132), with probability converging to 1, there exists Θ̃_{N_2} = (θ̃_i^T)_{i∈N_2} ∈ R^{|N_2|×r} such that S_{1,i}(θ̃_i; Â) = 0 for all i ∈ N_2, the bound (111) holds, and ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≤ C_1. Moreover, θ̃_i described above is the unique solution to the optimization problem max_{θ_i∈R^r} Σ_{j∈[p]} ω_{ij} {y_{ij} θ_i^T â_j − b(θ_i^T â_j)} for all i ∈ N_2, because this optimization problem has a strictly concave objective by (132).
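To make the estimating-equation step concrete, the following sketch (not part of the paper's code) solves S_{1,i}(θ_i; Â) = 0 for a single row by Newton's method. It assumes the score is the gradient of the objective displayed above, S_{1,i}(θ; A) = Σ_{j∈[p]} ω_{ij}{y_{ij} − b'(θ^T a_j)} a_j, and it specializes to the logistic case b(x) = log(1 + e^x) purely for illustration; the function name refine_row and its defaults are hypothetical.

import numpy as np

def b_prime(x):
    # Mean function b'(x) for the illustrative logistic case b(x) = log(1 + e^x).
    return 1.0 / (1.0 + np.exp(-x))

def b_second(x):
    # Variance function b''(x) = e^x / (1 + e^x)^2 for the logistic case.
    p = b_prime(x)
    return p * (1.0 - p)

def refine_row(y_i, A_hat, omega_i, n_iter=50, tol=1e-10):
    """Solve sum_j omega_ij * (y_ij - b'(theta^T a_j)) * a_j = 0 for one row i
    by Newton's method; this is the gradient of the concave objective
    sum_j omega_ij * {y_ij * theta^T a_j - b(theta^T a_j)}.
    Assumes enough observed entries that the Hessian is nonsingular."""
    obs = omega_i.astype(bool)
    A_obs, y_obs = A_hat[obs], y_i[obs]
    theta = np.zeros(A_hat.shape[1])
    for _ in range(n_iter):
        eta = A_obs @ theta
        score = A_obs.T @ (y_obs - b_prime(eta))              # gradient of the objective
        hessian = A_obs.T @ (b_second(eta)[:, None] * A_obs)  # minus the Hessian (PSD)
        step = np.linalg.solve(hessian, score)                # Newton direction
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta

Because the objective is strictly concave on the event considered here, the root found this way is the unique maximizer θ̃_i, and Newton's method converges rapidly in its neighborhood.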
Lemma 39 (Asymptotic analysis for Ã). Assume that lim_{n,p→∞} P(‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≤ e_{Θ,2→∞}) = 1. Assume the following asymptotic regime holds:
1. φ ≲ 1;
4. nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 · max{(π_max/π_min) r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^3 r^{5+8η_1−8η_2}};   (138)
5. e_{Θ,2→∞} ≤ C_1 and
e_{Θ,2→∞} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} · min{(π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}, (κ_3^*)^{-1} (π_min/π_max)^2 r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}}.   (139)
Then, with probability converging to 1, there is Ã = (ã_j^T)_{j∈[p]} ∈ R^{p×r} such that S_{2,j}(ã_j; Θ̃_{N_2}) = 0 for all j ∈ [p], ‖Ã − A^*‖_{2→∞} ≤ C_2, and

‖Ã − A^*‖_{2→∞} ≲ κ_2^* (δ_2^*)^{-1} (π_max/π_min) r^{−2η_2} log(np) p^{−1/2} {r^{1+η_1} (nπ_max)^{−1/2} + r^{(1+η_1)∨0} p^{−1/2} e_{Θ,2→∞}}.   (140)

Moreover, ã_j defined above is the unique solution to the optimization problem max_{a_j∈R^r} Σ_{i∈N_2} ω_{ij} {y_{ij} θ̃_i^T a_j − b(θ̃_i^T a_j)} for all j ∈ [p].
Proof [Proof of Lemma 39] First, the 4-th condition on the asymptotic regime, i.e.,

nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 max{(π_max/π_min) r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^3 r^{5+8η_1−8η_2}},   (141)

implies (142) and n ≫ r^{1+2(η_1−η_2)} log(r); the latter ensures that the conditions of Lemma 37 hold, and thus (np)^{1/2} r^{η_2} ≲ ψ_r ≤ ψ_1 ≲ (np)^{1/2} r^{η_1} with probability converging to 1.
Similarly, the 5-th condition, i.e.,

e_{Θ,2→∞} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} · min{(π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}, (κ_3^*)^{-1} (π_min/π_max)^2 r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}},   (143)

implies

e_{Θ,2→∞} ≪ min{ p^{1/2} r^{1/2+η_2} (≲ C_1), κ_2^* (κ_3^*)^{-1} r^{−1/2} p^{1/2} log(np), (π_min/π_max)^{1/2} p^{1/2} r^{η_2}, (κ_2^*)^{-1} (κ_3^*)^{-1} (δ_2^*)^2 (π_min/π_max)^2 p^{1/2} r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)} (log(np))^{-1}, (κ_2^*)^{-1} δ_2^* (π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)} (log(np))^{-1} p^{1/2} },   (144)

where we used η_2 > −1/2 − η_1 + 2η_2, which holds because η_1 − η_2 ≥ 0.
Throughout the proof, we restrict the analysis to the event ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≤ e_{Θ,2→∞} ≤ C_1, which has probability converging to 1 as n, p → ∞, according to the assumption of the lemma and (144). This also implies that ‖Θ̃_{N_2}‖_{2→∞} ≤ 2C_1 with probability converging to 1. According to Lemma 31 and under the asymptotic regime nπ_max ≫ log(p), with probability converging to 1,

max_{j∈[p]} ‖Z_{N_2,j}^T diag(Ω_{N_2,j}) Θ̃_{N_2}‖
≤ 16{φ^{1/2}(κ_2^*)^{1/2} C_1 log^{1/2}(pr) r^{1/2}(nπ_max)^{1/2} ∨ r^{1/2} φ C_1/(ρ+1) log(pr)} + 16 ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} · nπ_max log(np) {(κ_2^* φ)^{1/2} ∨ 1}   (145)
≲ (κ_2^*)^{1/2} r^{1/2+η_1} p^{1/2} log^{1/2}(p) r^{1/2} (nπ_max)^{1/2} + r^{1/2} r^{1/2+η_1} p^{1/2} log(p) + e_{Θ,2→∞} nπ_max log(np) (κ_2^*)^{1/2}
≲ (κ_2^*)^{1/2} r^{1+η_1} p^{1/2} n^{1/2} π_max^{1/2} log^{1/2}(p) + e_{Θ,2→∞} nπ_max log(n ∨ p) (κ_2^*)^{1/2},

where we used r^{1/2} p^{1/2} r^{1/2+η_1} log(p) ≲ p^{1/2} r^{1+η_1} log^{1/2}(p) (nπ_max)^{1/2} under the asymptotic regime nπ_max ≫ log(p) for the last inequality.

According to Lemma 32, with probability converging to 1,

max_{j∈[p]} ‖B_{2,j}(Θ̃_{N_2})‖ ≤ 4 C_1 C_2 κ_2^* nπ_max ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≲ κ_2^* r^{1+η_1} nπ_max e_{Θ,2→∞},   (146)

max_{j∈[p]} β_{2,j}(Θ̃_{N_2}) ≤ 4 C_1 C_2^2 ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞}^2 nπ_max ≲ r^{3/2+η_1} p^{−1/2} e_{Θ,2→∞}^2 nπ_max.   (147)
Under the asymptotic regime that e_{Θ,2→∞} ≲ κ_2^* (κ_3^*)^{-1} r^{−1/2} p^{1/2} log(np), we have r^{3/2+η_1} p^{−1/2} e_{Θ,2→∞}^2 nπ_max κ_3^* ≲ κ_2^* r^{1+η_1} log(np) nπ_max e_{Θ,2→∞}. Thus, the above inequality implies

max_{j∈[p]} {‖Z_{N_2,j}^T diag(Ω_{N_2,j}) Θ̃_{N_2}‖ + ‖B_{2,j}(Θ̃_{N_2})‖ + β_{2,j}(Θ̃_{N_2}) κ_3^*} ≲ κ_2^* {r^{1+η_1} p^{1/2} n^{1/2} π_max^{1/2} log^{1/2}(np) + r^{(1+η_1)∨0} log(np) nπ_max e_{Θ,2→∞}}.   (149)

Next, we derive a lower bound for σ_r(I_{2,j}(Θ̃_{N_2})). Under the asymptotic regime nπ_min ≫ r^{1+2η_1−2η_2} log(p) and e_{Θ,2→∞} ≪ (π_min/π_max)^{1/2} p^{1/2} r^{η_2}, we have nπ_max ≫ log(p), π_min (np) r^{2η_2} ≫ r^{1+2η_1} p log(p), and e_{Θ,2→∞}^2 nπ_max ≪ π_min (np) r^{2η_2}. Note that σ_r^2(Θ^*_{N_2}) ≥ σ_r^2(M^*_{N_2·}) ≥ ψ_r^2 ≳ (np) r^{2η_2} and ‖Θ^*_{N_2}‖_{2→∞} ≲ (r/n)^{1/2} ψ_1 ≲ r^{1/2+η_1} p^{1/2}. Thus, under the same asymptotic regime, the conditions of Lemma 35 hold. Therefore, with probability converging to 1,

σ_r(I_{2,j}(Θ̃_{N_2})) ≥ 2^{-2} δ_2^* π_min ψ_r^2 ≳ δ_2^* π_min (np) r^{2η_2}.   (150)

Note that, by an argument parallel to (133),

min_{j∈[p]} {(γ_{2,j}(Θ̃_{N_2}))^{-1} (κ_3(3C_1C_2))^{-1} σ_r^2(I_{2,j}(Θ̃_{N_2}))} ≳ (κ_3^*)^{-1} (δ_2^*)^2 (π_min^2/π_max) p^{1/2} n r^{−3/2−3η_1+4η_2}.   (151)
Under the asymptotic regime nπ_min ≫ (κ_2^*)^2 (κ_3^*)^2 (δ_2^*)^{-4} (π_max/π_min)^3 r^{5+8η_1−8η_2} (log(np))^2, we have κ_2^* r^{1+η_1} log(np) p^{1/2} n^{1/2} π_max^{1/2} ≪ (κ_3^*)^{-1} (δ_2^*)^2 (π_min^2/π_max) p^{1/2} n r^{−3/2−3η_1+4η_2}. Under the asymptotic regime e_{Θ,2→∞} ≪ (κ_2^*)^{-1} (κ_3^*)^{-1} (δ_2^*)^2 (π_min/π_max)^2 p^{1/2} r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)} (log(np))^{-1}, we have κ_2^* r^{(1+η_1)∨0} log(np) · nπ_max e_{Θ,2→∞} ≪ (κ_3^*)^{-1} (δ_2^*)^2 (π_min^2/π_max) p^{1/2} n · r^{−3/2−3η_1+4η_2}. Combining the analysis, we have κ_2^* r^{1+η_1} p^{1/2} n^{1/2} π_max^{1/2} log^{1/2}(np) + κ_2^* r^{(1+η_1)∨0} log(np) nπ_max e_{Θ,2→∞} ≪ (κ_3^*)^{-1} (δ_2^*)^2 (π_min^2/π_max) p^{1/2} n r^{−3/2−3η_1+4η_2}. This further implies

‖Z_{N_2,j}^T diag(Ω_{N_2,j}) Θ̃_{N_2}‖ + ‖B_{2,j}(Θ̃_{N_2})‖ + β_{2,j}(Θ̃_{N_2}) κ_3^* ≪ 2^{-2} (γ_{2,j}(Θ̃_{N_2}))^{-1} (κ_3^*)^{-1} σ_r^2(I_{2,j}(Θ̃_{N_2}))   (152)

for all j. According to (150), σ_r(I_{2,j}(Θ̃_{N_2})) C_2 ≳ δ_2^* π_min (np) r^{2η_2} (r/p)^{1/2} ≳ δ_2^* π_min n p^{1/2} r^{1/2+2η_2}. According to (142), nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-2} (π_max/π_min) r^{1+2η_1−4η_2} log^2(np), which implies κ_2^* r^{1+η_1} log(np) p^{1/2} n^{1/2} π_max^{1/2} ≪ δ_2^* π_min n p^{1/2} r^{1/2+2η_2}. According to (144), e_{Θ,2→∞} ≪ (κ_2^*)^{-1} δ_2^* (π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)} (log(np))^{-1} p^{1/2}, which implies κ_2^* r^{(1+η_1)∨0} log(np) · nπ_max e_{Θ,2→∞} ≪ δ_2^* π_min n p^{1/2} r^{1/2+2η_2}. Combining the analysis, we obtain

‖Z_{N_2,j}^T diag(Ω_{N_2,j}) Θ̃_{N_2}‖ + ‖B_{2,j}(Θ̃_{N_2})‖ + β_{2,j}(Θ̃_{N_2}) κ_3^* ≪ σ_r(I_{2,j}(Θ̃_{N_2})) C_2   (153)

for all j.
The inequalities (152) and (153) verify the conditions of Lemma 17 (with C_1 replaced by 2C_1). According to Lemma 17, and combining (149) and (150), with probability converging to 1,

‖Ã − A^*‖_{2→∞} ≤ max_{j∈[p]} σ_r^{-1}(I_{2,j}(Θ̃_{N_2})) {‖Z_{N_2,j}^T diag(Ω_{N_2,j}) Θ̃_{N_2}‖ + ‖B_{2,j}(Θ̃_{N_2})‖ + β_{2,j}(Θ̃_{N_2}) κ_3^*}
≲ κ_2^* (δ_2^*)^{-1} π_min^{-1} (np)^{-1} r^{−2η_2} {r^{1+η_1} p^{1/2} n^{1/2} π_max^{1/2} log(np) + r^{(1+η_1)∨0} log(np) nπ_max e_{Θ,2→∞}}
≲ κ_2^* (δ_2^*)^{-1} (π_max/π_min) r^{−2η_2} log(np) p^{−1/2} {r^{1+η_1} (nπ_max)^{−1/2} + r^{(1+η_1)∨0} p^{−1/2} e_{Θ,2→∞}}.   (154)
Lemma 40 (Asymptotic analysis for M̃_{N_2·} = Θ̃_{N_2} Ã^T). Assume that lim_{n,p→∞} P(‖M̂_{N_1·} − M^*_{N_1·}‖_F ≤ e_{M,F}) = 1, and the following asymptotic regime holds:
1. φ ≲ 1;
3. (np)^{1/2} r^{η_2} ≲ σ_r(M^*) ≤ σ_1(M^*) ≲ (np)^{1/2} r^{η_1} for some constants η_1 and η_2;
4. pπ_min ≫ (κ_2^*)^4 (δ_2^*)^{-6} (log(np))^3 · max[(π_max/π_min)^3 r^{(1+2η_1)∨(3+2η_1−4η_2)∨(1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^5 r^{(3+2η_1)∨(3+4η_1)∨{7+8(η_1−η_2)}∨(5+6η_1−8η_2)}];   (155)
5. nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 max{(π_max/π_min) r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^3 r^{5+8η_1−8η_2}};   (156)
6. (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-1} (π_min/π_max)^3 min[r^{(−η_1+η_2)∧(−1−2η_1+3η_2)∧(−η_1+3η_2)}, (κ_3^*)^{-1} r^{(−2−η_1)∨{−3−5(η_1−η_2)}∧(−2−4η_1+5η_2)}].   (157)
Proof [Proof of Lemma 40] First, we analyze the asymptotic regime assumption. The 4-th condition of the asymptotic regime, i.e.,

pπ_min ≫ (κ_2^*)^4 (δ_2^*)^{-6} (log(np))^3 · max[(π_max/π_min)^3 r^{(1+2η_1)∨(3+2η_1−4η_2)∨(1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^5 r^{(3+2η_1)∨(3+4η_1)∨{7+8(η_1−η_2)}∨(5+6η_1−8η_2)}],   (159)

implies

pπ_min ≫ max{ (δ_2^*)^{-4} (κ_2^*)^2 (log(n))^2 max{r^{1∨(1+2η_1)∨(1−2η_2)} (π_max/π_min), (κ_3^*)^2 (π_max/π_min)^3 r^{5∨(3+2η_1)∨(3+4η_1)}}, (κ_2^*)^4 (δ_2^*)^{-6} (π_max/π_min)^3 (log(np))^3 r^{(3+2η_1−4η_2)∨(1−4η_2)}, (κ_3^*)^2 (κ_2^*)^4 (δ_2^*)^{-6} (π_max/π_min)^5 r^{{7+8(η_1−η_2)}∨(5+6η_1−8η_2)} (log(np))^3 },   (160)

where we used the facts that 1 ≤ (1+2η_1)∨(1−2η_2), 3 + 2η_1 − 4η_2 > 2 − 2η_2, and 7 + 8(η_1 − η_2) > 5.
The 6-th condition of the asymptotic regime, i.e.,

(np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-1} (π_min/π_max)^3 min[r^{(−η_1+η_2)∧(−1−2η_1+3η_2)∧(−η_1+3η_2)}, (κ_3^*)^{-1} r^{(−2−η_1)∨{−3−5(η_1−η_2)}∧(−2−4η_1+5η_2)}],   (161)

implies

(np)^{−1/2} e_{M,F} ≪ min{ r^{η_2}, r^{η_2} (κ_2^*)^{-1} (δ_2^*)^2 min{r^{−(η_1−η_2)} (π_min/π_max), (κ_3^*)^{-1} r^{−2−η_1} (π_min/π_max)^2}, (κ_2^*)^{-2} (δ_2^*)^3 (π_min/π_max)^2 (log(np))^{-1} r^{(−1−2η_1+3η_2)∧(−η_1+3η_2)}, (κ_2^*)^{-2} (δ_2^*)^3 (π_min/π_max)^3 (log(np))^{-1} (κ_3^*)^{-1} r^{{−3−5(η_1−η_2)}∧(−2−4η_1+5η_2)} },   (162)

where we used the facts that η_2 ≥ −1 − 2η_1 + 3η_2 and η_2 − (η_1 − η_2) ≥ −1 − 2η_1 + 3η_2.
According to (162), e_{M,F} ≪ (np)^{1/2} r^{η_2} ≲ ψ_r, which implies that the conditions for Lemma 14 hold. Thus, with probability converging to 1, ‖Â − A^*‖_F ≤ e_{A,F}, where e_{A,F} = 8 ψ_r^{−1} e_{M,F}. Note that e_{A,F} ≲ r^{−η_2} (np)^{−1/2} e_{M,F}. According to (162), e_{M,F} ≪ (np)^{1/2} r^{η_2} (κ_2^*)^{-1} (δ_2^*)^2 min{r^{−(η_1−η_2)} (π_min/π_max), (κ_3^*)^{-1} r^{−2−η_1} (π_min/π_max)^2}, which implies e_{A,F} ≪ (κ_2^*)^{-1} (δ_2^*)^2 min{r^{−(η_1−η_2)} (π_min/π_max), (κ_3^*)^{-1} r^{−2−η_1} (π_min/π_max)^2}. According to (160), pπ_min ≫ (δ_2^*)^{-4} (κ_2^*)^2 (log(n))^2 max{r^{1∨(1+2η_1)∨(1−2η_2)} (π_max/π_min), (κ_3^*)^2 (π_max/π_min)^3 r^{5∨(3+2η_1)∨(3+4η_1)}}. Thus, the asymptotic regime of Lemma 38 is satisfied.

According to Lemma 38, ‖Θ̃_{N_2} − Θ^*_{N_2}‖_{2→∞} ≤ e_{Θ_{N_2},2→∞} with probability converging to 1, for e_{Θ_{N_2},2→∞} satisfying

e_{Θ_{N_2},2→∞} ∼ κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} {r (log(n))^{1/2} (pπ_max)^{−1/2} + r^{1/2+η_1} e_{A,F}}   (163)
≲ κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} {r (log(n))^{1/2} (pπ_max)^{−1/2} + r^{1/2+η_1} · r^{−η_2} (np)^{−1/2} e_{M,F}}.
Next, we verify that the asymptotic regime of Lemma 39 is satisfied. We first verify the conditions about e_{Θ,2→∞}. According to (160), pπ_min ≫ (κ_2^*)^4 (δ_2^*)^{-6} (π_max/π_min)^3 (log(np))^3 r^{(3+2η_1−4η_2)∨(1−4η_2)}, which implies

κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} · r (log(n))^{1/2} (pπ_max)^{−1/2} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} (π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}.   (164)

According to (160), pπ_min ≫ (κ_3^*)^2 (κ_2^*)^4 (δ_2^*)^{-6} (π_max/π_min)^5 r^{{7+8(η_1−η_2)}∨(5+6η_1−8η_2)} (log(np))^3, which implies

κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} · r (log(n))^{1/2} (pπ_max)^{−1/2} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} (κ_3^*)^{-1} (π_min/π_max)^2 r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}.   (165)

According to (162), (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (π_min/π_max)^2 (log(np))^{-1} r^{(−1−2η_1+3η_2)∧(−η_1+3η_2)}, which implies

κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} · r^{1/2+η_1} · r^{−η_2} (np)^{−1/2} e_{M,F} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} (π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}.   (166)

According to (162), (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (π_min/π_max)^3 (log(np))^{-1} (κ_3^*)^{-1} r^{{−3−5(η_1−η_2)}∧(−2−4η_1+5η_2)}, which implies

κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} · r^{1/2+η_1} · r^{−η_2} (np)^{−1/2} e_{M,F} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} (κ_3^*)^{-1} (π_min/π_max)^2 r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}.   (167)
Combining (164)–(167), we obtain

κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} {r (log(n))^{1/2} (pπ_max)^{−1/2} + r^{1/2+η_1−η_2} (np)^{−1/2} e_{M,F}}
≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} · min{(π_min/π_max) r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}, (κ_3^*)^{-1} (π_min/π_max)^2 r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}},   (168)

which implies that e_{Θ_{N_2},2→∞} satisfies the 5-th condition of the asymptotic regime of Lemma 39. On the other hand, according to the lemma's assumption,

nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 max{(π_max/π_min) r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^3 r^{5+8η_1−8η_2}}.   (169)

Thus, the other requirements for the asymptotic regime in Lemma 39 are also satisfied.
According to Lemma 39, we have ‖Ã − A^*‖_{2→∞} ≤ e_{A,2→∞} with probability converging to 1, where

e_{A,2→∞} ∼ κ_2^* (δ_2^*)^{-1} (π_max/π_min) r^{−2η_2} log(np) {r^{1+η_1} p^{−1/2} (nπ_max)^{−1/2} + r^{(1+η_1)∨0} p^{−1/2} e_{Θ,2→∞}}.   (170)

Plugging (163) into (170), we obtain

e_{A,2→∞}
≲ κ_2^* (δ_2^*)^{-1} (π_max/π_min) r^{−2η_2} log(np) p^{−1/2} [ r^{1+η_1} (nπ_max)^{−1/2} + r^{(1+η_1)∨0} p^{−1/2} · κ_2^* (δ_2^*)^{-1} (π_max/π_min) p^{1/2} {r (log(n))^{1/2} (pπ_max)^{−1/2} + r^{1/2+η_1} · r^{−η_2} (np)^{−1/2} e_{M,F}} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^{3/2} (π_max/π_min)^2 p^{−1/2} [ r^{(2+η_1−2η_2)∨(1−2η_2)} {(p∧n) π_max}^{−1/2} + r^{(3/2+2η_1−3η_2)∨(1/2+η_1−3η_2)} (np)^{−1/2} e_{M,F} ].   (171)
Now, we combine the above analysis to find an upper bound for ‖M̃_{N_2·} − M^*_{N_2·}‖_max. Recall that M̃_{N_2·} = Θ̃_{N_2} Ã^T. Thus, for P̂ ∈ O^{r×r} defined in (11), and Θ^*_{N_2} = (U_r^*)_{N_2·} D_r^* P̂, A^* = V_r^* P̂, we have

M̃_{N_2·} − M^*_{N_2·} = Θ̃_{N_2} Ã^T − (U_r^*)_{N_2·} D_r^* (V_r^*)^T = Θ̃_{N_2} Ã^T − (U_r^*)_{N_2·} D_r^* P̂ (V_r^* P̂)^T = Θ̃_{N_2} Ã^T − Θ^*_{N_2} (A^*)^T = (Θ̃_{N_2} − Θ^*_{N_2})(A^*)^T + Θ̃_{N_2} (Ã − A^*)^T.   (172)
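Because the max norm of a product is controlled row-by-row, via the elementary inequality |(XY^T)_{ij}| ≤ ‖X_{i·}‖ ‖Y_{j·}‖, the decomposition (172) yields the following bound, the counterpart of (238) in the no-data-splitting analysis:
\[
\|\tilde{M}_{N_2\cdot} - M^{*}_{N_2\cdot}\|_{\max} \le \|\tilde{\Theta}_{N_2} - \Theta^{*}_{N_2}\|_{2\to\infty}\,\|A^{*}\|_{2\to\infty} + \|\tilde{\Theta}_{N_2}\|_{2\to\infty}\,\|\tilde{A} - A^{*}\|_{2\to\infty}.
\]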
5. nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 max{(π_max/π_min) r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^3 r^{5+8η_1−8η_2}};   (176)
6. (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-1} (π_min/π_max)^3 min[r^{(−η_1+η_2)∧(−1−2η_1+3η_2)∧(−η_1+3η_2)}, (κ_3^*)^{-1} r^{(−2−η_1)∨{−3−5(η_1−η_2)}∧(−2−4η_1+5η_2)}].   (177)
Then, with probability converging to 1, the estimating equations in steps 3 and 4 of Algorithm 2 have a unique solution, and

‖M̃ − M^*‖_max ≲ (δ_2^*)^{-2} (κ_2^*)^2 (π_max/π_min)^2 log^{3/2}(np) [ r^{(5/2+2η_1−2η_2)∨(3/2+η_1−2η_2)} {(p∧n) π_max}^{−1/2} + r^{(2+3η_1−3η_2)∨(1+2η_1−3η_2)} (np)^{−1/2} e_{M,F} ].   (178)
Proof [Proof of Lemma 41] Recall that M̃ = (m̃_{ij})_{i∈[n],j∈[p]}, where (m̃_{ij})_{i∈N_1,j∈[p]} = Θ̃_{N_1}^{(2)} (Ã^{(2)})^T and (m̃_{ij})_{i∈N_2,j∈[p]} = Θ̃_{N_2}^{(1)} (Ã^{(1)})^T. The error rate for (m̃_{ij})_{i∈N_2,j∈[p]} = Θ̃_{N_2}^{(1)} (Ã^{(1)})^T is obtained by Lemma 40, and the error rate for (m̃_{ij})_{i∈N_1,j∈[p]} is obtained by swapping (Â^{(1)}, Θ̃_{N_2}^{(1)}, Ã^{(1)}, N_1) with (Â^{(2)}, Θ̃_{N_1}^{(2)}, Ã^{(2)}, N_2) in the proof of Lemma 40.

The uniqueness of the solution to the estimating equations in steps 3 and 4 of Algorithm 2 follows from the uniqueness properties in Lemmas 38 and 39.
(np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-1} min[r^{0∧(−1+η)∧(2η)}, (κ_3^*)^{-1} r^{(−2−η)∧(−3)∧(−2+η)}],   (181)

and is implied by (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-1} min[r^{−2}, (κ_3^*)^{-1} r^{−3}], which is in turn implied by the asymptotic requirement R7'.

Thus, under R1–R6 and R7', the conditions of Lemma 41 are satisfied, and, with probability converging to 1,

‖M̃ − M^*‖_max
≲ (δ_2^*)^{-2} (κ_2^*)^2 log^{3/2}(np) [ r^{(5/2+2η_1−2η_2)∨(3/2+η_1−2η_2)} {(p∧n)π}^{−1/2} + r^{(2+3η_1−3η_2)∨(1+2η_1−3η_2)} (np)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 log^{3/2}(np) [ r^{5/2∨(3/2−η)} {(p∧n)π}^{−1/2} + r^{2∨(1−η)} (np)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 log^{3/2}(np) [ r^{5/2} {(p∧n)π}^{−1/2} + r^2 (np)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 log^2(np) r^{5/2} [ {(p∧n)π}^{−1/2} + (np)^{−1/2} e_{M,F} ].   (182)

Here, A^* = V_r^* P̂ and Θ^* = U_r^* D_r^* P̂, for P̂ ∈ O^{r×r} defined in (183). With similar derivations to those for Lemma 14, we have the following lemma.
The rest of the section is organized as follows. In Section B.1, we obtain non-asymptotic probabilistic bounds for the terms involved in the estimating equations in Steps 3 and 4 of Algorithm 1. In Section B.2, we obtain asymptotic error bounds for ‖Θ̃ − Θ^*‖_{2→∞} (Lemma 48). In Section B.3, we provide an error bound for ‖M̃ − M^*‖_max (Lemma 49) under a general setting. Finally, the proof of Theorem 5 is given in Section B.4.
For the first term on the right-hand side of the above inequality, we follow a proof similar to that of Lemma 18 (with Â replaced by A^*) and obtain that, with probability at least 1 − (nr)^{−1},

max_{i∈[n]} ‖Z_{i·} diag(Ω_{i·}) A^*‖ ≤ 8{φ^{1/2} (κ_2(2ρ+1))^{1/2} C_2 log^{1/2}(nr) r^{1/2} p_max^{1/2} ∨ r^{1/2} φ C_2/(ρ+1) log(nr)}.   (189)

For the second term on the right-hand side of equation (188), we apply Lemma 30 and obtain that, with probability at least 1 − (np)^{−1},

‖Z‖_max p_max^{1/2} ‖Â − A^*‖_F ≤ 8 log(np) {(φ κ_2^*)^{1/2} ∨ 1} · p_max^{1/2} ‖Â − A^*‖_F.   (190)

The proof is completed by combining the above two inequalities.
Lemma 44 (Upper bound for ‖B_{1,i}(Â)‖ without data splitting). Let A^* = V_r^* P̂ and Θ^* = U_r^* D_r^* P̂. Assume ‖Â‖_{2→∞}, ‖V_r^*‖_{2→∞} ≤ C_2 and ‖U_r^* D_r^*‖_{2→∞} ≤ C_1; here Â may be dependent on Ω_{i·}. Then,

‖B_{1,i}(Â)‖ ≤ C_1 C_2 κ_2^* p_max^{1/2} ‖Â − A^*‖_F.   (191)

Proof [Proof of Lemma 44] First, by the assumptions and the orthogonality of P̂, ‖Θ^*‖_{2→∞} = ‖U_r^* D_r^*‖_{2→∞} ≤ C_1 and ‖A^*‖_{2→∞} = ‖V_r^*‖_{2→∞} ≤ C_2. Recall that

‖B_{1,i}(Â)‖ = ‖Σ_{j=1}^p ω_{ij} b''(m^*_{ij}) â_j (â_j − a_j^*)^T θ_i^*‖ ≤ C_1 C_2 Σ_{j=1}^p ω_{ij} b''(m^*_{ij}) ‖â_j − a_j^*‖ ≤ C_1 C_2 κ_2^* Σ_{j=1}^p ω_{ij} ‖â_j − a_j^*‖.   (192)
Lemma 45 (Bound for β_{1,i}(Â), without data splitting). If ‖U_r^* D_r^*‖_{2→∞} ≤ C_1 and ‖Â‖_{2→∞}, ‖V_r^*‖_{2→∞} ≤ C_2, then,

Lemma 46 (Bound for γ_{1,i}(Â), without data splitting). If ‖Â‖_{2→∞} ≤ C_2, then, with probability at least 1 − 1/n,

max_{i∈[n]} γ_{1,i}(Â) ≤ 2 p π_max C_2^3.   (196)

Proof [Proof of Lemma 46] The proof of this lemma is the same as that of Lemma 24, which does not require the independence between Â and Ω_{i·}.
Lemma 47.

max_{i∈[n]} ‖diag(Ω_{i·})(Â − A^*)‖_2^2 ≤ ‖Â − A^*‖_F^2.   (197)

Proof

max_{i∈[n]} ‖diag(Ω_{i·})(Â − A^*)‖_2^2 ≤ max_{i∈[n]} ‖diag(Ω_{i·})‖_2^2 ‖Â − A^*‖_F^2 = ‖Â − A^*‖_F^2.   (198)
1. φ ∼ 1, π_min ∼ π_max ∼ π;
3. (np)^{1/2} r^{η_2} ≲ σ_r(M^*) ≤ σ_1(M^*) ≲ (np)^{1/2} r^{η_1}, where η_1 and η_2 are constants;
5. e_{A,F} ≪ (κ_2^*)^{-1} (δ_2^*)^2 (log(np))^{-1} min{r^{0∧(−1/2−η_1+η_2)∧(1/2+η_2)}, (κ_3^*)^{-1} r^{(−5/2−η_1)∧(−3/2)}} π^{1/2}.
Then, with probability converging to 1, there is Θ̃ = (θ̃_i^T)_{i∈[n]} such that S_{1,i}(θ̃_i; Â) = 0 for all i ∈ [n], ‖Θ̃ − Θ^*‖_{2→∞} ≤ C_1, and

‖Θ̃ − Θ^*‖_{2→∞} ≲ κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} e_{A,F}}.   (199)

Moreover, θ̃_i is the unique solution to the optimization problem max_{θ_i∈R^r} Σ_{j∈[p]} ω_{ij} {y_{ij} θ_i^T â_j − b(θ_i^T â_j)} for all i ∈ [n].
Proof [Proof of Lemma 48] First, we provide an analysis of the asymptotic regime. Note that κ_2^* ≥ κ_2(0) ≳ 1 and δ_2^* ≤ δ_2(0) ≲ 1. Then, the 4-th condition on the asymptotic regime, i.e.,

e_{A,F} ≪ (κ_2^*)^{-1} (δ_2^*)^2 (log(np))^{-1} min{r^{0∧(−1/2−η_1+η_2)∧(1/2+η_2)}, (κ_3^*)^{-1} r^{(−5/2−η_1)∧(−3/2)}} π^{1/2},   (202)

implies

e_{A,F} ≪ min{ (κ_3^*)^{-1} r^{−1/2−η_1} π^{1/2}, π^{1/2}, (κ_2^*)^{-1} (κ_3^*)^{-1} (δ_2^*)^2 (log(np))^{-1} r^{(−5/2−η_1)∧(−3/2)} π^{1/2}, δ_2^* (κ_2^*)^{-1} (log(np))^{-1} r^{(−1/2−η_1+η_2)∧(1/2+η_2)} π^{1/2} }.   (203)
With probability converging to 1,

max_{i∈[n]} ‖Z_{i·} diag(Ω_{i·}) Â‖ ≤ 16{φ^{1/2} (κ_2(2ρ+1))^{1/2} C_2 log^{1/2}(nr) r^{1/2} (pπ_max)^{1/2} ∨ r^{1/2} φ C_2/(ρ+1) log(nr)} + 8 log(np) {(φ κ_2^*)^{1/2} ∨ 1} p_max^{1/2} e_{A,F},   (204)

according to Lemma 43. Under the asymptotic regime that φ ≲ 1, π_min ∼ π_max ∼ π, and C_2 ≲ (r/p)^{1/2}, the above inequality implies

max_{i∈[n]} ‖Z_{i·} diag(Ω_{i·}) Â‖ ≲ (κ_2^*)^{1/2} r log^{1/2}(n) π^{1/2} + r p^{−1/2} log(n) + (κ_2^*)^{1/2} log(np) p^{1/2} π^{1/2} e_{A,F}.   (205)

According to (201), pπ ≫ r(log n)^2, which implies r p^{−1/2} log(n) ≪ (κ_2^*)^{1/2} r log^{1/2}(n) π^{1/2}. Thus, the above display implies

max_{i∈[n]} ‖Z_{i·} diag(Ω_{i·}) Â‖ ≲ (κ_2^*)^{1/2} r log^{1/2}(n) π^{1/2} + (κ_2^*)^{1/2} log(np) p^{1/2} π^{1/2} e_{A,F}.   (206)
Note that C_1 C_2 ≲ r^{1+η_1}. Thus, the above display implies that, with probability converging to one,

max_{i∈[n]} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖} ≲ κ_2^* {r log^{1/2}(n) π^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} π^{1/2} e_{A,F}}.   (209)
Note that C_1^2 C_2 ≲ r^{3/2+2η_1} p^{1/2}. Thus, the above display implies κ_3^* r^{3/2+2η_1} p^{1/2} e_{A,F}^2 ≲ κ_2^* log(np) r^{(1+η_1)∨0} p^{1/2} π^{1/2} e_{A,F}.

Next, we find a lower bound for σ_r(I_{1,i}(Â)). With similar derivations to those for (129), we have (211). According to (203), e_{A,F} ≪ π^{1/2}. Thus, the above two inequalities and Lemma 25 together imply that, with probability converging to 1,

min_{i∈[n]} σ_r(I_{1,i}(Â)) ≥ 2^{-3} δ_2^* π.   (212)

Next, we verify the conditions of Lemma 16. According to Lemma 46, on the event p_max ≤ 2pπ_max, max_{i∈[n]} γ_{1,i}(Â) ≲ pπ (r/p)^{3/2}. Following similar arguments to those for (133), we have, with probability tending to 1,

min_{i∈[n]} {(γ_{1,i}(Â))^{-1} (κ_3(3C_1C_2))^{-1} σ_r^2(I_{1,i}(Â))} ≳ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π.   (216)

Under the asymptotic regime pπ ≫ (κ_2^*)^2 (κ_3^*)^2 (δ_2^*)^{-4} r^5 log(n), we have κ_2^* π^{1/2} r (log(n))^{1/2} ≪ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π. Under the asymptotic regime e_{A,F} ≪ (κ_2^*)^{-1} (κ_3^*)^{-1} (δ_2^*)^2 (log(np))^{-1} r^{(−5/2−η_1)∧(−3/2)} π^{1/2}, we have κ_2^* log(np) r^{(1+η_1)∨0} p^{1/2} π^{1/2} e_{A,F} ≪ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π. Combining the analysis, we have κ_2^* {log^{1/2}(n) r π^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} π^{1/2} e_{A,F}} ≪ (κ_3^*)^{-1} (δ_2^*)^2 p^{1/2} r^{−3/2} π. This, together with (216), implies, with probability tending to 1,

max_{i∈[n]} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} ≪ min_{i∈[n]} {(γ_{1,i}(Â))^{-1} (κ_3(3C_1C_2))^{-1} σ_r^2(I_{1,i}(Â))}.   (217)
According to (201), pπ ≫ (κ_2^*)^2 (δ_2^*)^{-2} log(n) r^{1−2η_2}, which implies κ_2^* log^{1/2}(n) r π^{1/2} ≪ δ_2^* π r^{1/2+η_2} p^{1/2}. According to (203), e_{A,F} ≪ δ_2^* (κ_2^*)^{-1} (log(np))^{-1} r^{(−1/2−η_1+η_2)∧(1/2+η_2)} π^{1/2}, which implies κ_2^* log(np) r^{(1+η_1)∨0} p^{1/2} π^{1/2} e_{A,F} ≪ δ_2^* π r^{1/2+η_2} p^{1/2}. Combining the analysis with (201) and (212), we obtain, with probability tending to 1,

max_{i∈[n]} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} ≪ min_{i∈[n]} σ_r(I_{1,i}(Â)) C_1.   (219)

Thus, the conditions of Lemma 16 are satisfied. According to Lemma 16 with A replaced by Â, and according to (212) and (215), we have ‖Θ̃ − Θ^*‖_{2→∞} ≤ C_1 and

‖Θ̃ − Θ^*‖_{2→∞} ≤ max_{i∈[n]} [ (σ_r(I_{1,i}(Â)))^{-1} {‖Z_{i·} diag(Ω_{i·}) Â‖ + ‖B_{1,i}(Â)‖ + β_{1,i}(Â) κ_3^*} ]   (220)
≲ (δ_2^* π)^{-1} κ_2^* {r log^{1/2}(n) π^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} π^{1/2} e_{A,F}}
= κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} e_{A,F}}

with probability converging to 1. Moreover, by (215), the objective of the optimization problem max_{θ_i∈R^r} Σ_{j∈[p]} ω_{ij} {y_{ij} θ_i^T â_j − b(θ_i^T â_j)} is strictly concave. Thus, θ̃_i is the unique solution to this optimization problem.
5. nπ ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 max{r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 r^{5+8η_1−8η_2}};
6. (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-2} π^{1/2} · min[r^{(1/2+2η_2)∧(−3/2−2η_1+3η_2)∧(−1/2−η_1+3η_2)∧(1/2+3η_2)}, (κ_3^*)^{-1} r^{(−7/2−5η_1+5η_2)∧(−5/2−4η_1+5η_2)∧(−3/2−3η_1+5η_2)}].   (221)
Then, with probability converging to 1, the estimating equations in steps 3 and 4 of Algorithm 1 have a unique solution, and

‖M̃ − M^*‖_max ≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 [ r^{(5/2+2η_1−2η_2)∨(3/2+η_1−2η_2)} {(n∧p)π}^{−1/2} + r^{(5/2+3η_1−3η_2)∨(3/2+2η_1−3η_2)∨(1/2+η_1−3η_2)} (npπ)^{−1/2} e_{M,F} ].   (222)
Proof First, we analyze the asymptotic regime assumption. The 4-th condition of the asymptotic regime, i.e.,

pπ ≫ (κ_2^*)^4 (δ_2^*)^{-6} (log(np))^3 · max[r^{(1+2η_1)∨(3+2η_1−4η_2)∨(1−4η_2)}, (κ_3^*)^2 r^{{7+8(η_1−η_2)}∨(5+6η_1−8η_2)}],   (223)

implies

pπ ≫ max{ (δ_2^*)^{-4} (κ_2^*)^2 log^2(n) max{r^{1∨(1−2η_2)}, (κ_3^*)^2 r^5}, (δ_2^*)^{-6} (κ_2^*)^4 (log(np))^3 r^{(3+2η_1−4η_2)∨(1−4η_2)}, (δ_2^*)^{-6} (κ_2^*)^4 (κ_3^*)^2 (log(np))^3 r^{{7+8(η_1−η_2)}∨(5+6η_1−8η_2)} },   (224)

where we used the facts that 7 + 8(η_1 − η_2) ≥ 7 > 5, (1+2η_1) ∨ (2−2η_2) ≥ 1, and 2 − 2η_2 < 3 + 2η_1 − 4η_2.

The 6-th condition of the asymptotic regime, i.e.,

(np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-2} π^{1/2} · min[r^{(1/2+2η_2)∧(−3/2−2η_1+3η_2)∧(−1/2−η_1+3η_2)∧(1/2+3η_2)}, (κ_3^*)^{-1} r^{(−7/2−5η_1+5η_2)∧(−5/2−4η_1+5η_2)∧(−3/2−3η_1+5η_2)}],   (225)

implies

(np)^{−1/2} e_{M,F} ≪ min{ r^{η_2} (κ_2^*)^{-1} (δ_2^*)^2 (log(np))^{-1} r^{0∧(−1/2−η_1+η_2)∧(1/2+η_2)} π^{1/2}, r^{η_2} (κ_2^*)^{-1} (δ_2^*)^2 (log(np))^{-1} (κ_3^*)^{-1} r^{(−5/2−η_1)∧(−3/2)} π^{1/2}, (δ_2^*)^3 (κ_2^*)^{-2} (log(np))^{-2} r^{(−3/2−2η_1+3η_2)∧(−1/2−η_1+3η_2)∧(1/2+3η_2)} π^{1/2}, (δ_2^*)^3 (κ_2^*)^{-2} (κ_3^*)^{-1} (log(np))^{-2} r^{(−7/2−5η_1+5η_2)∧(−5/2−4η_1+5η_2)∧(−3/2−3η_1+5η_2)} π^{1/2} },   (226)

where we used the facts that η_2 ≥ −1/2 − η_1 + 2η_2, −1/2 − η_1 + 2η_2 > −3/2 − 2η_1 + 3η_2, −5/2 − η_1 + η_2 > −7/2 − 5η_1 + 5η_2, and −3/2 + η_2 > −5/2 − 4η_1 + 5η_2.

According to (226),

e_{M,F} ≪ (np)^{1/2} r^{η_2} (κ_2^*)^{-1} (δ_2^*)^2 (log(np))^{-1} min{r^{0∧(−1/2−η_1+η_2)∧(1/2+η_2)}, (κ_3^*)^{-1} r^{(−5/2−η_1)∧(−3/2)}} π^{1/2},   (227)
which implies

e_{A,F} ≪ (κ_2^*)^{-1} (δ_2^*)^2 (log(np))^{-1} min{r^{0∧(−1/2−η_1+η_2)∧(1/2+η_2)}, (κ_3^*)^{-1} r^{(−5/2−η_1)∧(−3/2)}} π^{1/2}.   (228)

Also, according to the lemma's assumption, pπ ≫ (δ_2^*)^{-4} (κ_2^*)^2 log^2(n) max{r^{1∨(1−2η_2)}, (κ_3^*)^2 r^5}. Thus, the conditions of Lemma 48 are satisfied. According to Lemma 48, ‖Θ̃ − Θ^*‖_{2→∞} ≤ e_{Θ,2→∞} with probability converging to 1, for e_{Θ,2→∞} satisfying

e_{Θ,2→∞} ∼ κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} e_{A,F}}   (229)
≲ κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1)∨0} p^{1/2} · r^{−η_2} (np)^{−1/2} e_{M,F}}
∼ κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1−η_2)∨(−η_2)} n^{−1/2} e_{M,F}}.
Note that the proof of Lemma 39 does not require the independence between Θ̃_{N_2} and the missing pattern Ω. Thus, following similar arguments, Lemma 39 still applies with Θ̃_{N_2} replaced by Θ̃ and N_2 replaced by [n]. Next, we verify that the asymptotic regime of Lemma 39 is satisfied.

According to (224), pπ ≫ (δ_2^*)^{-6} (κ_2^*)^4 (log(np))^3 r^{(3+2η_1−4η_2)∨(1−4η_2)}, which implies

κ_2^* (δ_2^*)^{-1} π^{−1/2} r (log(n))^{1/2} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}.   (230)

According to (224), pπ ≫ (δ_2^*)^{-6} (κ_2^*)^4 (κ_3^*)^2 (log(np))^3 r^{{7+8(η_1−η_2)}∨(5+6η_1−8η_2)}, which implies

κ_2^* (δ_2^*)^{-1} π^{−1/2} r (log(n))^{1/2} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} (κ_3^*)^{-1} r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}.   (231)

According to (226), (np)^{−1/2} e_{M,F} ≪ (δ_2^*)^3 (κ_2^*)^{-2} (log(np))^{-2} r^{(−3/2−2η_1+3η_2)∧(−1/2−η_1+3η_2)∧(1/2+3η_2)} π^{1/2}, which implies

κ_2^* (δ_2^*)^{-1} π^{−1/2} log(np) r^{(1+η_1−η_2)∨(−η_2)} n^{−1/2} e_{M,F} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}.   (232)

According to (226), (np)^{−1/2} e_{M,F} ≪ (δ_2^*)^3 (κ_2^*)^{-2} (κ_3^*)^{-1} (log(np))^{-2} r^{(−7/2−5η_1+5η_2)∧(−5/2−4η_1+5η_2)∧(−3/2−3η_1+5η_2)} π^{1/2}, which implies

κ_2^* (δ_2^*)^{-1} π^{−1/2} log(np) r^{(1+η_1−η_2)∨(−η_2)} n^{−1/2} e_{M,F} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} (κ_3^*)^{-1} r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}.   (233)

Combining (230)–(233) with (229), we obtain

e_{Θ,2→∞} ≪ (δ_2^*)^2 (κ_2^*)^{-1} p^{1/2} (log(np))^{-1} · min{r^{(−1/2−η_1+2η_2)∧(1/2+2η_2)}, (κ_3^*)^{-1} r^{(−5/2−4η_1+4η_2)∧(−3/2−3η_1+4η_2)}},   (234)
which implies that e_{Θ,2→∞} satisfies the 5-th condition of the asymptotic regime of Lemma 39. On the other hand, according to the lemma's assumption,

nπ_min ≫ (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 max{(π_max/π_min) r^{(1+2η_1−2η_2)∨(1+2η_1−4η_2)}, (κ_3^*)^2 (π_max/π_min)^3 r^{5+8η_1−8η_2}}.   (235)

Thus, the other requirements for the asymptotic regime in Lemma 39 are also satisfied.

According to Lemma 39, we have ‖Ã − A^*‖_{2→∞} ≤ e_{A,2→∞} with probability converging to 1, where

e_{A,2→∞} ∼ κ_2^* (δ_2^*)^{-1} r^{−2η_2} log(np) {p^{−1/2} r^{1+η_1} (nπ)^{−1/2} + r^{(1+η_1)∨0} p^{−1/2} e_{Θ,2→∞}}.   (236)

Plugging (229) into (236), we obtain

e_{A,2→∞}
≲ κ_2^* (δ_2^*)^{-1} r^{−2η_2} log(np) p^{−1/2} [ r^{1+η_1} (nπ)^{−1/2} + r^{(1+η_1)∨0} · κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1−η_2)∨(−η_2)} n^{−1/2} e_{M,F}} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 p^{−1/2} [ r^{(2+η_1−2η_2)∨(1−2η_2)} {(n∧p)π}^{−1/2} + r^{(2+2η_1−3η_2)∨(1+η_1−3η_2)∨(−3η_2)} (npπ)^{−1/2} e_{M,F} ].   (237)
Next, we derive an asymptotic upper bound for ‖M̃ − M^*‖_max. Recall that M̃ = Θ̃ Ã^T. Thus, for P̂ ∈ O^{r×r} defined in (183), and Θ^* = U_r^* D_r^* P̂, A^* = V_r^* P̂, we have M̃ − M^* = Θ̃ Ã^T − Θ^* (A^*)^T = (Θ̃ − Θ^*)(A^*)^T + Θ̃ (Ã − A^*)^T. Thus,

‖M̃ − M^*‖_max ≤ ‖Θ̃ − Θ^*‖_{2→∞} ‖A^*‖_{2→∞} + ‖Ã − A^*‖_{2→∞} ‖Θ̃‖_{2→∞}.   (238)

According to Lemma 48 and the assumption ‖A^*‖_{2→∞} ≤ C_2 ≲ (r/p)^{1/2}, with probability converging to 1, the above display is further bounded by (239). Combining the above inequality with (229) and (237), we obtain, with probability tending to 1,

‖M̃ − M^*‖_max
≲ r^{1/2} p^{−1/2} · κ_2^* (δ_2^*)^{-1} π^{−1/2} {r (log(n))^{1/2} + log(np) r^{(1+η_1−η_2)∨(−η_2)} n^{−1/2} e_{M,F}}
+ r^{1/2+η_1} p^{1/2} · (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 p^{−1/2} [ r^{(2+η_1−2η_2)∨(1−2η_2)} {(n∧p)π}^{−1/2} + r^{(2+2η_1−3η_2)∨(1+η_1−3η_2)∨(−3η_2)} (npπ)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 [ r^{(5/2+2η_1−2η_2)∨(3/2+η_1−2η_2)} {(n∧p)π}^{−1/2} + r^{(3/2+η_1−η_2)∨(1/2−η_2)∨(5/2+3η_1−3η_2)∨(3/2+2η_1−3η_2)∨(1/2+η_1−3η_2)} (npπ)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 [ r^{(5/2+2η_1−2η_2)∨(3/2+η_1−2η_2)} {(n∧p)π}^{−1/2} + r^{(5/2+3η_1−3η_2)∨(3/2+2η_1−3η_2)∨(1/2+η_1−3η_2)} (npπ)^{−1/2} e_{M,F} ],   (240)

where we used the facts that 3/2 + η_1 − η_2 < 5/2 + 3η_1 − 3η_2 and 1/2 − η_2 < 3/2 + 2η_1 − 3η_2 in the last inequality. This completes the proof.
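As a quick sanity check of the two exponent comparisons used in the last step (both rely on the standing assumption η_1 ≥ η_2):
\[
(5/2 + 3\eta_1 - 3\eta_2) - (3/2 + \eta_1 - \eta_2) = 1 + 2(\eta_1 - \eta_2) \ge 1 > 0,
\qquad
(3/2 + 2\eta_1 - 3\eta_2) - (1/2 - \eta_2) = 1 + 2(\eta_1 - \eta_2) \ge 1 > 0.
\]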
and is implied by R7: (np)^{−1/2} e_{M,F} ≪ (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-2} π^{1/2} min[r^{−5/2}, (κ_3^*)^{-1} r^{−7/2}] for η ≥ −1.
Thus, under R1–R7, the conditions of Lemma 49 are satisfied, and thus, with probability converging to 1,

‖M̃ − M^*‖_max
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 [ r^{(5/2+2η_1−2η_2)∨(3/2+η_1−2η_2)} {(n∧p)π}^{−1/2} + r^{(5/2+3η_1−3η_2)∨(3/2+2η_1−3η_2)∨(1/2+η_1−3η_2)} (npπ)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 [ r^{5/2∨(3/2−η)} {(n∧p)π}^{−1/2} + r^{(5/2)∨(3/2−η)∨(1/2−2η_2)} (npπ)^{−1/2} e_{M,F} ]
≲ (δ_2^*)^{-2} (κ_2^*)^2 (log(np))^2 [ r^{5/2} {(n∧p)π}^{−1/2} + r^{5/2} (npπ)^{−1/2} e_{M,F} ].   (244)

The above analysis gives the error bound for M̃. The proof of the 'in particular' part of the theorem is similar to that of Theorem 10, and we skip the repetitive details.
Proof [Proof of Corollary 8] For the binomial model, b''(x) = k e^x (1+e^x)^{−2} and b^{(3)}(x) = k e^x (1+e^x)^{−2} {1 − 2(1+e^{−x})^{−1}}. Thus, κ_2(α) ≤ k, κ_3(α) ≤ k, and δ_2(α) ≥ k e^α (1+e^α)^{−2} ≳ k e^{−α}. This implies that κ_2^*, κ_3^* ≲ 1 under the asymptotic regime that k ∼ 1 (R8B). Also, δ_2^* ≳ k e^{−2(ρ+1)} ≳ e^{−2ρ} ≳ k e^{−2(log(n∧p))^{1−ε_0}} ≫ (n∧p)^{−ε_1} for any constant ε_1 > 0, where the third inequality is due to R9B. Combining the analysis above, we have (κ_2^*)^4 (δ_2^*)^{-6} (log(np))^3 ≪ (n∨p)^{6ε_1} (log(np))^3 ≪ (n∨p)^{7ε_1}. Similarly, (κ_2^*)^2 (δ_2^*)^{-4} (log(np))^2 ≪ (n∨p)^{5ε_1}, and (κ_2^*)^{-2} (δ_2^*)^3 (log(np))^{-2} ≫ (n∧p)^{−4ε_1}. Combining the above analysis with R5B–R7B, and noting that (1+2η) ∨ 5 ≤ (3+4η) ∨ 7 for η ≥ −1, we verify that R5–R7 hold with 7ε_1 < ε_0.
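For completeness, the binomial derivative formulas used above follow from the log-partition function b(x) = k log(1 + e^x), the standard natural-parameter form for a Binomial(k) response, consistent with the b'' stated above:
\[
b'(x) = \frac{k e^x}{1+e^x}, \qquad b''(x) = \frac{k e^x}{(1+e^x)^2}, \qquad b^{(3)}(x) = \frac{k e^x}{(1+e^x)^2}\Big\{1 - \frac{2}{1+e^{-x}}\Big\},
\]
so |b''(x)| and |b^{(3)}(x)| are bounded by k, giving κ_2(α) ≤ k and κ_3(α) ≤ k.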
For the normal model, b''(x) = 1 and b^{(3)}(x) = 0 for all x. Thus, κ_2^* = δ_2^* = 1 and κ_3^* = 0. Part 2 of Corollary 8 then follows by simplifying Theorem 5.

In the rest of the analysis, we focus on the Poisson model. Note that ‖M^*‖_max ≤ C_1 C_2, so we can choose ρ ≤ C_1 C_2 ≲ r^{1+η}. Under R10P, r^{1+η} ≲ (log(n∧p))^{1−ε_0}, so max(ρ, C_1 C_2) ≲ (log(n∧p))^{1−ε_0}.

For the Poisson model, b(x) = e^x, so b''(x) = b^{(3)}(x) = e^x. Thus, κ_2(α), κ_3(α) ≤ e^α and δ_2(α) ≥ e^{−α}. This implies κ_2^* ≤ e^{2ρ+1} ≲ e^{2ρ} ≲ e^{2(log(n∧p))^{1−ε_0}} ≲ (n∧p)^{ε_1} for any constant ε_1 > 0. Similarly, δ_2^* ≳ e^{−2ρ} ≳ (n∧p)^{−ε_1} and κ_3^* ≲ e^{6C_1C_2} ≲ (n∨p)^{ε_1} for any constant ε_1 > 0. The proof then follows similarly to that for the normal model.
Proof [Proof of Corollary 12] The proof of Corollary 12 is similar to that of Corollary 8, except
that R7B is replaced by R7’B to ensure R7’ holds. We omit the repetitive details.
Table 5: Simulation settings. 'Variable type = O' indicates that all variables are ordinal (with k_j = 5), and 'Variable type = O + C' indicates that half of the variables are ordinal (with k_j = 5) and half are continuous. For continuous and ordinal variables, we assume the normal and binomial models, respectively.
is given by M^* = Θ^* (A^*)^T. The missing indicators ω_{ij} are generated independently from a Bernoulli distribution with parameter π, where π = 0.6 and π = 0.2 are considered in the simulation settings. When ω_{ij} = 1 and variable j is ordinal, Y_{ij} is generated from a binomial distribution with k_j = 5 trials and success probability exp(m^*_{ij})/(1 + exp(m^*_{ij})). When ω_{ij} = 1 and variable j is continuous, Y_{ij} is generated from a normal distribution N(m^*_{ij}, 1). In the implementation, we set C_2 = 2√(r/p) in Algorithms 1, 2, and 3. We set ρ_0 = r in the NBE and C = √r in the CJMLE.
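To make the data-generating process above concrete, here is a minimal simulation sketch. The sizes and the Gaussian draws for Θ^* and A^* are hypothetical placeholders (the actual designs are those of Table 5); only the Bernoulli(π) missingness, the binomial model for ordinal variables, and the N(m^*_{ij}, 1) model for continuous variables follow the text:

import numpy as np

rng = np.random.default_rng(0)
n, p, r, pi, k = 2000, 400, 3, 0.6, 5   # illustrative sizes only; see Table 5

# Hypothetical placeholder draws for Theta* and A*.
Theta_star = rng.normal(size=(n, r))
A_star = rng.normal(size=(p, r))
M_star = Theta_star @ A_star.T                     # M* = Theta* (A*)^T

Omega = rng.binomial(1, pi, size=(n, p))           # omega_ij ~ Bernoulli(pi)
is_ordinal = np.arange(p) < p // 2                 # 'O + C': half ordinal, half continuous

prob = 1.0 / (1.0 + np.exp(-M_star))               # exp(m*_ij) / (1 + exp(m*_ij))
Y_ordinal = rng.binomial(k, prob)                  # Binomial(k_j = 5, prob) for ordinal j
Y_continuous = rng.normal(M_star, 1.0)             # N(m*_ij, 1) for continuous j
Y = np.where(is_ordinal[None, :], Y_ordinal, Y_continuous)
Y = np.where(Omega == 1, Y, np.nan)                # entries with omega_ij = 0 are missing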
[Figure 3: three columns of panels; the top row plots the scaled Frobenius norm and the bottom row the max norm, each against values 1–8 on the horizontal axis.]
Figure 3: Results from Simulation Settings 7–9. The plots can be interpreted similarly as those in Figure 1.
[Figure 4: three columns of panels; the top row plots the scaled Frobenius norm and the bottom row the max norm, each against values 1–8 on the horizontal axis.]
Figure 4: Results from Simulation Settings 10–12. The plots can be interpreted similarly as those in Figure 1.
[Figure 5: three columns of panels; the top row plots the scaled Frobenius norm and the bottom row the max norm, each against values 1–8 on the horizontal axis.]
Figure 5: Results from Simulation Settings 13–15. The plots can be interpreted similarly as those in Figure 1.
[Figure 6: three columns of panels; the top row plots the scaled Frobenius norm and the bottom row the max norm, each against values 1–8 on the horizontal axis.]
Figure 6: Results from Simulation Settings 16–18. The plots can be interpreted similarly as those in Figure 1.
[Figure 7: three columns of panels; the top row plots the scaled Frobenius norm and the bottom row the max norm, each against values 1–8 on the horizontal axis.]
Figure 7: Results from Simulation Settings 19–21. The plots can be interpreted similarly as those in Figure 1.
[Figure 8: three columns of panels; the top row plots the scaled Frobenius norm and the bottom row the max norm, each against values 1–8 on the horizontal axis.]
Figure 8: Results from Simulation Settings 22–24. The plots can be interpreted similarly as those in Figure 1.
References
Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis
of random matrices with low expected rank. Annals of Statistics, 48(3):1452, 2020.
Mokhtar Z Alaya and Olga Klopp. Collective matrix completion. Journal of Machine Learning
Research, 20:1–43, 2019.
David J Bartholomew, Fiona Steele, Irini Moustaki, and Jane I Galbraith. Analysis of multivariate
social science data. CRC Press, Boca Raton, FL, 2008.
Yoav Bergner, Peter Halpin, and Jill-Jênn Vie. Multidimensional item response theory in the style
of collaborative filtering. Psychometrika, 87(1):266–288, 2022.
Sonia A Bhaskar. Probabilistic low-rank matrix completion from quantized measurements. The
Journal of Machine Learning Research, 17(1):2131–2164, 2016.
Pratik Biswas, Tzu-Chen Lian, Ta-Chung Wang, and Yinyu Ye. Semidefinite programming based
algorithms for sensor network localization. ACM Transactions on Sensor Networks (TOSN), 2
(2):188–220, 2006.
Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymp-
totic theory of independence. Oxford University Press, Oxford, England, 2013.
Tony Cai and Wen-Xin Zhou. A max-norm constrained minimization approach to 1-bit matrix
completion. Journal of Machine Learning Research, 14(1):3619–3647, 2013.
Tony Cai and Wen-Xin Zhou. Matrix completion via max-norm constrained optimization. Elec-
tronic Journal of Statistics, 10(1):1493–1525, 2016.
Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foun-
dations of Computational Mathematics, 9(6):717–772, 2009.
Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix com-
pletion. IEEE Transactions on Information Theory, 56(5):2053–2080, 2010.
Yang Cao and Yao Xie. Poisson matrix recovery and completion. IEEE Transactions on Signal
Processing, 64(6):1609–1620, 2015.
Joshua Cape, Minh Tang, and Carey E Priebe. The two-to-infinity norm and singular subspace
geometry with applications to high-dimensional statistics. Annals of Statistics, 47(5):2405–2439,
2019.
Sourav Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of
Statistics, 43(1):177–214, 2015.
Yunxiao Chen and Xiaoou Li. Determining the number of factors in high-dimensional generalized
latent factor models. Biometrika, 109(3):769–782, 2022.
Yunxiao Chen, Xiaoou Li, and Siliang Zhang. Joint maximum likelihood estimation for high-
dimensional exploratory item factor analysis. Psychometrika, 84(1):124–146, 2019a.
Yunxiao Chen, Xiaoou Li, and Siliang Zhang. Structured latent factor analysis for large-scale data:
Identifiability, estimability, and their implications. Journal of the American Statistical Associa-
tion, 115(532):1756–1770, 2020a.
Yunxiao Chen, Xiaoou Li, and Siliang Zhang. Structured latent factor analysis for large-scale data:
Identifiability, estimability, and their implications. Journal of the American Statistical Associa-
tion, 115:1756–1770, 2020b.
Yunxiao Chen, Chengcheng Li, Jing Ouyang, and Gongjun Xu. Statistical inference for noisy
incomplete binary matrix. Journal of Machine Learning Research, 24(95):1–66, 2023.
Yuxin Chen, Jianqing Fan, Cong Ma, and Yuling Yan. Inference and uncertainty quantification
for noisy matrix completion. Proceedings of the National Academy of Sciences, 116(46):22931–
22937, 2019b.
Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, and Yuling Yan. Noisy matrix completion: Under-
standing statistical guarantees for convex relaxation via nonconvex optimization. SIAM Journal
on Optimization, 30(4):3098–3121, 2020c.
Victor Chernozhukov, Christian Hansen, Yuan Liao, and Yinchu Zhu. Inference for low-rank mod-
els. The Annals of Statistics, 51(3):1309–1330, 2023.
Mark A Davenport, Yaniv Plan, Ewout Van Den Berg, and Mary Wootters. 1-bit matrix completion.
Information and Inference: A Journal of the IMA, 3:189–223, 2014.
Andrey Feuerverger, Yu He, and Shashi Khatri. Statistical significance of the Netflix challenge.
Statistical Science, 27:202–231, 2012.
David Goldberg, David Nichols, Brian M Oki, and Douglas Terry. Using collaborative filtering to
weave an information tapestry. Communications of the ACM, 35(12):61–70, 1992.
Shelby J Haberman. When can subscores have value? Journal of Educational and Behavioral
Statistics, 33(2):204–229, 2008.
Ruijian Han, Rougang Ye, Chunxi Tan, and Kani Chen. Asymptotic theory of sparse Bradley–Terry
model. Annals of Applied Probability, 30:2491–2515, 2020.
Ruijian Han, Yiming Xu, and Kani Chen. A general pairwise comparison model for extremely
sparse networks. Journal of the American Statistical Association, 118(544):2422–2432, 2023.
F Maxwell Harper and Joseph A Konstan. The MovieLens datasets: History and context. ACM
Transactions on Interactive Intelligent Systems (TIIS), 5(4):1–19, 2015.
Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alter-
nating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of
computing, pages 665–674, 2013.
Anura P Jayasumana, Randy Paffenroth, Gunjan Mahindre, Sridhar Ramasamy, and Kelum Ga-
jamannage. Network topology mapping from partial virtual coordinates and graph geodesics.
IEEE/ACM Transactions on Networking, 27(6):2405–2417, 2019.
Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from noisy
entries. Journal of Machine Learning Research, 11:2057–2078, 2010.
Olga Klopp. Noisy low-rank matrix completion with general sampling distribution. Bernoulli, 20
(1):282–303, 2014.
Olga Klopp, Jean Lafond, Eric Moulines, and Joseph Salmon. Adaptive multinomial matrix com-
pletion. Electronic Journal of Statistics, 9(2):2950–2975, 2015.
Vladimir Koltchinskii, Karim Lounici, and Alexandre B Tsybakov. Nuclear-norm penalization and
optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302–2329,
2011.
Geofferey N Masters and Benjamin D Wright. The essential process in a family of measurement
models. Psychometrika, 49(4):529–544, 1984.
Andrew D McRae and Mark A Davenport. Low-rank matrix completion and denoising under Pois-
son noise. Information and Inference: A Journal of the IMA, 10(2):697–720, 2021.
Sahand Negahban and Martin J Wainwright. Restricted strong convexity and weighted matrix com-
pletion: Optimal bounds with noise. Journal of Machine Learning Research, 13(1):1665–1697,
2012.
OECD. PISA 2018 assessment and analytical framework. OECD Publishing, Paris, France, 2019a.
OECD. PISA 2018 technical report. OECD Publishing, Paris, France, 2019b.
James M Ortega and Werner C Rheinboldt. Iterative solution of nonlinear equations in several
variables. SIAM, Philadelphia, PA, 2000.
Mark Reckase. Multidimensional item response theory. Springer, New York, NY, 2009.
Geneviève Robin, Julie Josse, Éric Moulines, and Sylvain Sardy. Low-rank model with covariates
for count data with missing values. Journal of Multivariate Analysis, 173:416–434, 2019.
Geneviève Robin, Olga Klopp, Julie Josse, Éric Moulines, and Robert Tibshirani. Main effects and
interactions in mixed and incomplete data frames. Journal of the American Statistical Associa-
tion, 115(531):1292–1303, 2020.
Anders Skrondal and Sophia Rabe-Hesketh. Generalized latent variable modeling: Multilevel,
longitudinal, and structural equation models. CRC Press, Boca Raton, FL, 2004.
Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.
Michel Wedel, Ulf Böckenholt, and Wagner A Kamakura. Factor models for multivariate count
data. Journal of Multivariate Analysis, 87(2):356–369, 2003.
Per-Åke Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numer-
ical Mathematics, 12:99–111, 1972.
Dong Xia and Ming Yuan. Statistical inferences of linear forms for noisy matrix completion. Journal
of the Royal Statistical Society: Series B (Statistical Methodology), 83(1):58–77, 2021.
Haoran Zhang, Yunxiao Chen, and Xiaoou Li. A note on exploratory item factor analysis by singular
value decomposition. Psychometrika, 85(2):358–372, 2020.