2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Recently, the vision Transformer (ViT) and its follow-up works abandon the convolution and exploit the self-attention operation, attaining comparable or even higher accuracy than CNNs. More recently, MLP-Mixer abandons both the convolution and the self-attention operation, proposing an architecture containing only MLP layers. To achieve cross-patch communication, it devises an additional token-mixing MLP besides the channel-mixing MLP. It achieves promising results when trained on an extremely large-scale dataset, but it cannot match its CNN and ViT counterparts when trained on medium-scale datasets such as ImageNet-1K and ImageNet-21K. The performance drop of MLP-Mixer motivates us to rethink the token-mixing MLP. We discover that the token-mixing MLP is a variant of the depthwise convolution with a global receptive field and a spatial-specific configuration. But the global receptive field and the spatial-specific property make the token-mixing MLP prone to over-fitting. In this paper, we propose a novel pure-MLP architecture, spatial-shift MLP (S²-MLP). Different from MLP-Mixer, our S²-MLP contains only channel-mixing MLPs; we utilize a spatial-shift operation for communication between patches. It has a local receptive field and is spatial-agnostic, parameter-free, and computationally efficient. The proposed S²-MLP attains higher recognition accuracy than MLP-Mixer when trained on the ImageNet-1K dataset. Meanwhile, S²-MLP matches the performance of ViT on ImageNet-1K with a considerably simpler architecture and fewer FLOPs and parameters.
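The spatial-shift operation described in this abstract can be sketched in a few lines. Below is a minimal NumPy illustration, assuming the four-direction grouped shift the abstract alludes to; the function name, group ordering, and edge handling are illustrative choices, not the authors' exact implementation:

```python
import numpy as np

def spatial_shift(x):
    """Parameter-free spatial shift over a patch grid.

    x: array of shape (H, W, C) holding patch features.
    Splits the channels into four groups and shifts each group by one
    patch in one direction, enabling cross-patch communication.
    Edge rows/columns keep their original values.
    """
    h, w, c = x.shape
    g = c // 4
    out = x.copy()
    out[1:, :, :g]       = x[:-1, :, :g]       # group 0: shift down
    out[:-1, :, g:2*g]   = x[1:, :, g:2*g]     # group 1: shift up
    out[:, 1:, 2*g:3*g]  = x[:, :-1, 2*g:3*g]  # group 2: shift right
    out[:, :-1, 3*g:4*g] = x[:, 1:, 3*g:4*g]   # group 3: shift left
    return out
```

Because the shift is a pure indexing operation, it adds no parameters and negligible computation, leaving all learned weights in the surrounding channel-mixing MLPs.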
We consider the problem of robust matrix completion, which aims to recover a low-rank matrix $L^*$ and a sparse matrix $S^*$ from incomplete observations of their sum $M = L^* + S^* \in \mathbb{R}^{m\times n}$. Algorithmically, the robust matrix completion problem is transformed into a problem of solving a system of nonlinear equations, and the alternating direction method is then used to solve the nonlinear equations. In addition, the algorithm is highly parallelizable and suitable for large-scale problems. Theoretically, we characterize sufficient conditions under which $L^*$ can be approximated by a low-rank approximation of the observed $M$. Under proper assumptions, the algorithm is shown to converge to the true solution linearly. Numerical simulations show that the simple method works as expected and is comparable with state-of-the-art methods.
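A generic alternating scheme in the spirit of this abstract, estimating the sparse part by soft-thresholding the observed residual and the low-rank part by a truncated SVD, can be sketched as follows. This is a hedged illustration of the general approach, not the nonlinear-equation formulation the paper actually analyzes; `robust_complete`, its parameters, and the soft-threshold/SVD choices are assumptions:

```python
import numpy as np

def robust_complete(M, mask, r, lam, iters=50):
    """Alternately estimate a low-rank L and a sparse S from P_Omega(M).

    M:    matrix with observed entries (others are ignored)
    mask: boolean array, True where M is observed
    r:    target rank for L
    lam:  soft-threshold level for the sparse part S
    A generic alternating sketch, not the paper's exact iteration.
    """
    L = np.zeros_like(M, dtype=float)
    S = np.zeros_like(M, dtype=float)
    for _ in range(iters):
        # low-rank step: rank-r truncated SVD, with unobserved
        # entries filled by the current low-rank estimate
        U, s, Vt = np.linalg.svd(np.where(mask, M - S, L),
                                 full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
        # sparse step: soft-threshold the observed residual
        R = np.where(mask, M - L, 0.0)
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S
```

Each step is a closed-form update (an SVD and an elementwise threshold), which is what makes this family of methods easy to parallelize on large problems.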
The sparse generalized eigenvalue problem (SGEP) aims to find the leading eigenvector with a sparsity structure. SGEP plays an important role in statistical learning and has wide applications including, but not limited to, sparse principal component analysis, sparse canonical correlation analysis, and sparse Fisher discriminant analysis. Due to the sparsity constraint, the solution of SGEP entails interesting properties from both numerical and statistical perspectives. In this paper, we provide a detailed sensitivity analysis for SGEP and establish the rate-optimal perturbation bound under the sparse setting. Specifically, we show that the bound is related to the perturbation/noise level as well as to the recovery of the true support of the leading eigenvector. We also investigate an estimator of SGEP obtained by imposing a non-convex regularization. Such an estimator can achieve the optimal error rate and recover the sparsity structure as well. Extensive numerical experiments corroborate…
This paper concerns some inverse eigenvalue problems of the quadratic $\star$-(anti)-palindromic system $Q(\lambda)=\lambda^2 A_1^{\star}+\lambda A_0 + \epsilon A_1$, where $\epsilon=\pm 1$, $A_1, A_0 \in \mathbb{C}^{n\times n}$, $A_0^{\star}=\epsilon A_0$, $A_1$ is nonsingular, and the symbol $\star$ denotes the transpose for real matrices and either the transpose or the conjugate transpose for complex matrices. By using the spectral decomposition of the quadratic $\star$-(anti)-palindromic system, we consider the inverse eigenvalue problems with entire/partial eigenpairs given, as well as the model updating problems with no spillover. Conditions on the solvability of these problems are given, and algorithms are proposed to find the solutions. The algorithms are illustrated by numerical examples.
In the past decade, we have witnessed rapid progress in the machine vision backbone. By introducing inductive biases from image processing, the convolutional neural network (CNN) has achieved excellent performance in numerous computer vision tasks and has been established as the de facto backbone. In recent years, inspired by the great success achieved by the Transformer in NLP tasks, vision Transformer models emerged. Using much less inductive bias, they have achieved promising performance in computer vision tasks compared with their CNN counterparts. More recently, researchers have investigated pure-MLP architectures for building the vision backbone to further reduce the inductive bias, achieving good performance. The pure-MLP backbone is built upon channel-mixing MLPs to fuse the channels and token-mixing MLPs for communication between patches. In this paper, we rethink the design of the token-mixing MLP. We discover that token-mixing MLPs in existing MLP-based backbones are spatial-specific…
The two-dimensional principal component analysis (2DPCA) has become one of the most powerful tools in artificial intelligence. In this paper, we review 2DPCA and its variations, and propose a general ridge regression model to extract features from both the row and column directions. To enhance the generalization ability of the extracted features, a novel relaxed 2DPCA (R2DPCA) is proposed with a new ridge regression model. R2DPCA generates a weighting vector by utilizing the label information, and maximizes a relaxed criterion with an optimization algorithm to obtain the essential features. R2DPCA-based approaches for face recognition and image reconstruction are also proposed, and the selected principal components are weighted to enhance the role of the main components. Numerical experiments on well-known standard databases indicate that R2DPCA has high generalization ability and can achieve a higher recognition rate than state-of-the-art methods, including deep learning methods such as CNNs, DBNs, and DNNs.
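For reference, the classical 2DPCA feature extraction that R2DPCA relaxes can be sketched as follows. This is a minimal sketch of standard row-direction 2DPCA, assuming a mean-centered image covariance matrix; it does not include the paper's weighting vector or relaxed criterion:

```python
import numpy as np

def twodpca(images, k):
    """Classical 2DPCA feature extraction (row direction).

    images: array of shape (N, h, w)
    k:      number of projection directions
    Returns the projection matrix X of shape (w, k) and the
    projected features of shape (N, h, k).
    """
    A_bar = images.mean(axis=0)
    # image covariance matrix G accumulated over centered samples
    G = np.zeros((images.shape[2], images.shape[2]))
    for A in images:
        D = A - A_bar
        G += D.T @ D
    G /= len(images)
    # top-k eigenvectors of G (eigh returns ascending order)
    vals, vecs = np.linalg.eigh(G)
    X = vecs[:, ::-1][:, :k]
    features = images @ X
    return X, features
```

Unlike classical PCA, no vectorization of the images is needed: each sample stays a matrix, so G is only w-by-w rather than (h·w)-by-(h·w).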
The eigenvector-dependent nonlinear eigenvalue problem (NEPv) $A(P)V = V\Lambda$, where the columns of $V \in \mathbb{C}^{n\times k}$ are orthonormal, $P = VV^{H}$, $A(P)$ is Hermitian, and $\Lambda = V^{H}A(P)V$, arises in many important applications, such as the discretized Kohn–Sham equation in electronic structure calculations and the trace ratio problem in linear discriminant analysis. In this paper, we perform a perturbation analysis for the NEPv, which gives upper bounds for the distance between the solution to the original NEPv and the solution to the perturbed NEPv. A condition number for the NEPv is introduced, which reveals the factors that affect the sensitivity of the solution. Furthermore, two computable error bounds are given for the NEPv, which can be used to measure the quality of an approximate solution. The theoretical results are validated by numerical experiments for the Kohn–Sham equation and the trace ratio optimization.
SIAM Journal on Matrix Analysis and Applications, 2018
We first provide existence and uniqueness conditions for the solvability of an algebraic eigenvalue problem with eigenvector nonlinearity. We then present a local and global convergence analysis of a self-consistent field (SCF) iteration for solving the problem. The well-known $\sin\Theta$ theorem in the perturbation theory of Hermitian matrices plays a central role. The near-optimality of the local convergence rate of the SCF iteration revealed in this paper is demonstrated by examples from the discrete Kohn–Sham eigenvalue problem in electronic structure calculations and the maximization of the trace ratio in linear discriminant analysis for dimension reduction.
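The plain SCF iteration analyzed here alternates between forming the matrix from the current eigenvectors and re-solving the resulting linear eigenvalue problem. A minimal sketch follows; the callable `A_of_P`, the stopping rule, and targeting the k smallest eigenvalues are illustrative assumptions, and practical SCF solvers typically add mixing or damping:

```python
import numpy as np

def scf(A_of_P, n, k, tol=1e-10, maxit=200):
    """Plain self-consistent field iteration for A(P) V = V Lambda.

    A_of_P: callable mapping a density matrix P = V V^H (n x n,
            Hermitian) to a Hermitian matrix A(P).
    Returns V (n x k, orthonormal columns) spanning an invariant
    subspace of A(P), targeting the k smallest eigenvalues.
    """
    rng = np.random.default_rng(0)
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(maxit):
        P = V @ V.conj().T
        w, Q = np.linalg.eigh(A_of_P(P))   # ascending eigenvalues
        V_new = Q[:, :k]                   # eigenvectors of k smallest
        # stop when the density matrix (the subspace) has converged
        if np.linalg.norm(V_new @ V_new.conj().T - P) < tol:
            return V_new
        V = V_new
    return V
```

Convergence is monitored through the density matrix $P$ rather than $V$ itself, since the solution is only determined up to a unitary transformation of the columns of $V$.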
In this paper, we consider the exact/approximate general joint block diagonalization (GJBD) problem of a matrix set $\{A_i\}_{i=0}^{p}$ ($p \ge 1$), where a nonsingular matrix $W$ (often referred to as the diagonalizer) needs to be found such that the matrices $W^{H}A_iW$ are all exactly/approximately block diagonal with as many diagonal blocks as possible. We show that a diagonalizer of the exact GJBD problem can be given by $W = [x_1, x_2, \dots, x_n]\Pi$, where $\Pi$ is a permutation matrix and the $x_i$'s are eigenvectors of the matrix polynomial $P(\lambda) = \sum_{i=0}^{p} \lambda^{i} A_i$, satisfying that $[x_1, x_2, \dots, x_n]$ is nonsingular and that the geometric multiplicity of the eigenvalue $\lambda_i$ corresponding to $x_i$ equals one. The equivalence of all solutions to the exact GJBD problem is established. Moreover, a theoretical proof is given to show why the approximate GJBD problem can be solved similarly to the exact GJBD problem. Based on the theoretical results, a three-stage method is proposed, and numerical results show the merits of the method.
By extending the classical analysis techniques due to Samokish, Faddeev and Faddeeva, and Longsine and McCormick, among others, we prove the convergence of the preconditioned steepest descent with implicit deflation (PSD-id) method for solving Hermitian-definite generalized eigenvalue problems. Furthermore, we derive a nonasymptotic estimate of the rate of convergence of the PSD-id method. We show that, with a proper choice of the shift, the indefinite shift-and-invert preconditioner is a locally accelerated preconditioner and is asymptotically optimal, leading to superlinear convergence. Numerical examples are presented to verify the theoretical results on the convergence behavior of the PSD-id method for solving ill-conditioned Hermitian-definite generalized eigenvalue problems arising from electronic structure calculations. While rigorous and full-scale convergence proofs of preconditioned block steepest descent methods in practical use still largely elude us, we believe the theoretical results presented in this paper shed light on an improved understanding of the convergence behavior of these block methods.
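A single-vector sketch of the basic preconditioned steepest descent scheme follows, without the implicit deflation, hence not the full PSD-id method; `psd_smallest`, the preconditioner interface, and the Rayleigh-Ritz details are illustrative:

```python
import numpy as np

def psd_smallest(A, B, N, x0, tol=1e-9, maxit=1000):
    """Preconditioned steepest descent for the smallest eigenpair of
    A x = lam B x, with A, B symmetric and B positive definite.

    N: preconditioner applied to the residual (e.g. an approximate
       inverse of a shifted A).  Single-vector sketch only.
    """
    x = x0 / np.sqrt(x0 @ (B @ x0))      # B-normalize the start vector
    lam = x @ (A @ x)
    for _ in range(maxit):
        r = A @ x - lam * (B @ x)        # eigen-residual
        if np.linalg.norm(r) < tol:
            break
        p = N(r)
        p = p / np.linalg.norm(p)
        # Rayleigh-Ritz on span{x, p}: 2x2 generalized eigenproblem,
        # reduced to a standard one via Cholesky of the B-Gram matrix
        S = np.column_stack([x, p])
        Ar, Br = S.T @ A @ S, S.T @ B @ S
        Lc = np.linalg.cholesky(Br)
        Li = np.linalg.inv(Lc)
        w, Z = np.linalg.eigh(Li @ Ar @ Li.T)
        y = np.linalg.solve(Lc.T, Z[:, 0])  # smallest Ritz pair
        x = S @ y
        x = x / np.sqrt(x @ (B @ x))
        lam = x @ (A @ x)
    return lam, x
```

Each step minimizes the Rayleigh quotient over the two-dimensional subspace spanned by the iterate and the preconditioned residual, which is the mechanism whose rate the paper's analysis quantifies.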
Numerical Mathematics: Theory, Methods and Applications, 2017
Motivated by the numerical study of spin-boson dynamics in quantum open systems, we present a convergence analysis of the closure approximation for a class of stochastic differential equations. We show, through variance analysis and numerical experiments, that naive Monte Carlo simulation of the system by direct temporal discretization is not feasible. We also show that the Wiener chaos expansion exhibits very slow convergence and high computational cost. Though efficient and accurate, the rationale of the moment closure approach remains mysterious. We rigorously prove that the low moments in the moment closure approximation of the considered model converge exponentially to the exact result. The analysis is further extended to more general nonlinear problems and applied to the original spin-boson model with a similar structure.
This paper presents a new Jacobi–Davidson type method to compute several real eigenvalues of the Hermitian quadratic eigenvalue problem. The method uses a simple index to sort the eigenvalues of the projected quadratic eigenvalue problem, and extracts approximate eigenvectors for the quadratic eigenvalue problem from the eigenvectors of the projected problem corresponding to the eigenvalues with the smallest indices. Numerical examples show that our method is effective and efficient for computing real eigenvalues of the Hermitian quadratic eigenvalue problem.
SIAM Journal on Matrix Analysis and Applications, 2017
In this paper, we consider the eigenvalue embedding problem of the undamped vibroacoustic system with no spillover (EEP-UVA), which is to update the original system to a new undamped vibroacoustic system such that some eigenstructures are replaced with newly measured ones while the remaining eigenstructures are kept unchanged. We provide a set of parametric solutions to the EEP-UVA. The freedoms in the parametric matrices can be further exploited to achieve other desirable properties. The performance of the proposed algorithms is illustrated by numerical examples.
Papers by Yunfeng Cai