Fei 2024
https://doi.org/10.1007/s10462-024-10741-2
Abstract
Deep neural networks (DNNs) in 3D scenes show a strong capability of extracting high-
level semantic features and significantly promote research in the 3D field. 3D shapes and
scenes often exhibit complicated transformation symmetries, where rotation is a challeng-
ing and necessary subject. To this end, many rotation invariant and equivariant methods
have been proposed. In this survey, we systematically organize and comprehensively overview these methods. First, we rewrite the previous definition of rotation invariance and equivariance by classifying each into weak and strong categories. Second, we provide a unified theoretical framework to analyze these methods, especially weak rotation invariant and
equivariant ones that are seldom analyzed theoretically. We then divide existing methods
into two main categories, i.e., rotation invariant ones and rotation equivariant ones, which
are further subclassified in terms of manipulating input ways and basic equivariant block
structures, respectively. In each subcategory, their common essence is highlighted, a cou-
ple of representative methods are analyzed, and insightful comments on their pros and cons
are given. Furthermore, we deliver a general overview of relevant applications and datasets for two popular areas: 3D semantic understanding and molecule-related tasks. Finally, we provide several open problems and future research directions based on challenges and difficulties in ongoing research.
* Zhidong Deng
michael@mail.tsinghua.edu.cn
Jiajun Fei
feijj20@mails.tsinghua.edu.cn
1
State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science
and Technology, Institute for Artificial Intelligence at Tsinghua University (THUAI), Beijing
National Research Center for Information Science and Technology (BNRist), Tsinghua University,
Beijing 100084, China
168 Page 2 of 52 J. Fei, Z. Deng
1 Introduction
In recent years, DNNs have played a more and more important role in 3D analysis. DNNs
are capable of processing many types of 3D data, including multi-view images (Su et al.
2015; Qi et al. 2016; Yu et al. 2018), voxels (Maturana and Scherer 2015; Zhou and Tuzel
2018), point clouds (Qi et al. 2017a; Wang et al. 2019b; Fei et al. 2022), and particles
(Schütt et al. 2017; Thomas et al. 2018; Satorras et al. 2021b). They have outperformed
traditional methods and shown great generalizability in a sequence of tasks, like classifi-
cation (Su et al. 2015; Qi et al. 2017a; Wang et al. 2019b), segmentation (Landrieu and
Simonovsky 2018; Meng et al. 2019; Furuya et al. 2020), detection (Zhou and Tuzel 2018;
Shi et al. 2019; Wang et al. 2023b), property prediction (Schütt et al. 2017; Satorras et al.
2021b), and generation (Hoogeboom et al. 2022; Guan et al. 2023).
Nonetheless, significant gaps exist between experiments and applications, restricting
the actual deployment of DNNs. For example, most experiments are conducted under
ideal settings with little noise, known data distribution, and canonical poses, which can-
not be completely met in practical applications. Among them, canonical poses are widely
adopted in 3D research, where 3D data is first aligned manually and then processed by
DNNs. However, such a setting leads to two main problems. First, these models may have
severe performance drops when evaluated with non-aligned 3D data, as shown in previ-
ous works (Esteves et al. 2018a; Sun et al. 2019b; Zhao et al. 2022a). Zhao et al. (2020b)
explore the fragility of 3D DNNs and achieve an over 95% success rate of black-box
adversarial attacks through slightly rotating the evaluation 3D data. Second, these DNNs
cannot be applied to solve tasks requiring output consistency. For example, the atom-
ization energies of molecules are irrelevant to their absolute positions and orientations
(Blum and Reymond 2009; Rupp et al. 2012). If DNNs are trained with aligned mole-
cules, they inevitably learn the nonexistent relationship between absolute coordinates and
molecular properties and may overfit training data. Such models are unreliable and useless, as they cannot give the same prediction for arbitrarily rotated inputs. There
have been many ways to address such problems. We summarize them as rotation invariant
and equivariant methods in this survey.
Rotation invariance has been investigated in traditional 3D descriptors. Before the
emergence of DNNs, most methods can only capture low-level geometric features based
on transformation invariance. FPFH (Rusu et al. 2009) combines coordinates and esti-
mated surface normals to define Darboux frames. Then it uses several angular variations
to represent the surface properties. SHOT (Tombari et al. 2010) designs unique and unam-
biguous local reference frames (LRFs) to construct robust and expressive 3D descriptors.
Drost et al. (2010) create a global description with point pair features (PPFs) composed of
relative distances and angles. They can effectively handle tasks like pose estimation and
registration. Recently, Horwitz and Hoshen (2023) revisit the importance of traditional
descriptors on 3D anomaly detection. DNNs can learn high-level semantic features and
accomplish complicated tasks, but they usually ignore the rotation invariance and equiv-
ariance, making them unreliable for real-world applications. Existing works deal with this
problem from different perspectives. T-Net (Qi et al. 2017a) directly regresses transfor-
mation matrices from raw point clouds to transform poses and features. ClusterNet (Chen
et al. 2019b) constructs k nearest neighbors (kNN) graphs and computes several invari-
ant distances and angles, which are fed into hierarchical networks for complicated down-
stream tasks. Tensor field networks (TFNs) (Thomas et al. 2018; Thomas 2019) are equiv-
ariant neural networks based on the irreducible representation of SO(3). They have a solid
Rotation invariance and equivariance in 3D deep learning: a… Page 3 of 52 168
mathematical foundation and perform well over various tasks, including shape classifica-
tion and RNA structure scoring.
Many distinctive approaches have been developed for rotation invariance and equiv-
ariance. However, a comprehensive review of these methods is absent, making it chal-
lenging to keep pace with the recent progress and select appropriate methods for specific
tasks. Therefore, we are motivated to write this survey and fill the gap. Our contribu-
tions can be summarized from three aspects. First, this survey systematically overviews
existing works related to rotation invariance and equivariance, which are further divided
into several subcategories based on their structures and mathematical foundations. Sec-
ond, we unify the notations of different methods, providing an intuitive perspective for
analysis and comparisons. Third, we point out some open problems and propose future
research directions based on them.
This paper is organized as shown in Fig. 1. In Sect. 2, we introduce the mathematical
background of rotation invariance and equivariance, including the definition, commonly-
used rotation groups, and evaluation metrics. Rotation invariant and equivariant methods
are comprehensively overviewed and discussed, respectively, in Sect. 3 and Sect. 4. The
applications and datasets are also inspected in Sect. 5. In Sect. 6, we point out several
future research directions based on unsolved problems. Notations are listed in Table 1 for
better readability.
2 Background
This section introduces the background knowledge required to understand rotation invari-
ance and equivariance. The basic concepts of group theory are beneficial for better com-
prehension. Readers may refer to other textbooks for more details, including Group Theory
in Physics: An Introduction (Cornwell 1997) and Algebra (Artin 2013).
Invariance and equivariance have been formulated in much related work (Cohen and
Welling 2016; Thomas et al. 2018; Cohen et al. 2018a, 2019a; Thomas 2019). However,
Fig. 1 Overview of our survey. After the mathematical background is stated, rotation invariant and equiv-
ariant methods are introduced, respectively. Then we give a comprehensive overview of applications and
datasets and point out future directions based on open problems. Best viewed in color
their definition cannot cover some methods in this survey. Thus, we deliberately formulate a broader definition that includes them. The definition of both strong/weak invariance and equivariance can be seen in Definition 1. Compared with the previous definition, we introduce
weak invariance and equivariance through the G-variant error so as to cover methods not
satisfying Eq. 1. It should be noted that determining an exact value of C is unnecessary, since any function is C-weakly equivariant when C is large enough (+∞). Hence C is generally omitted in this survey. If a method is weakly equivariant, its G-variant error is relatively small or is reduced after appropriate training.
∫_X ∫_G d(f(g ⋅ x), g ⋅ f(x)) d𝜇(g) dx < C.    (2)
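For intuition, the G-variant error can be estimated by Monte-Carlo sampling over rotations and inputs. The sketch below is our own illustration (the QR-based rotation sampler and statistic names are assumptions, not from any surveyed method); it contrasts a strongly invariant statistic with a non-invariant one:

```python
import numpy as np

def random_rotation(rng):
    # Draw a random rotation from SO(3) via QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))   # fix column signs
    if np.linalg.det(q) < 0:
        q[:, [0, 1]] = q[:, [1, 0]]        # enforce det = +1 (proper rotation)
    return q

def g_variant_error(f, clouds, n_rot=100, seed=0):
    # Monte-Carlo estimate of Eq. 2 for an invariant output space,
    # where d(f(g·x), g·f(x)) reduces to |f(g·x) − f(x)|.
    rng = np.random.default_rng(seed)
    return float(np.mean([abs(f(x @ random_rotation(rng).T) - f(x))
                          for x in clouds for _ in range(n_rot)]))

rng = np.random.default_rng(1)
clouds = [rng.normal(size=(32, 3)) for _ in range(4)]
mean_norm = lambda x: np.linalg.norm(x, axis=1).mean()  # strongly invariant
mean_z = lambda x: x[:, 2].mean()                       # not invariant
print(g_variant_error(mean_norm, clouds))  # ≈ 0 (floating-point error only)
print(g_variant_error(mean_z, clouds))     # clearly nonzero
```

A weakly invariant model would sit between these two extremes, with a small but nonzero estimate.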
For discrete domains, the left side of Eq. 2 is substituted with a summation. The integral is named the G-variant error, denoted by E(f).
SO(3), O(3), SE(3), E(3), and their proper subgroups are the commonly-used groups
that describe 3D rotation, reflection, and translation. Their differences are listed in Table 2.
Unless otherwise specified, we focus on rotation in the 3D Euclidean space, and G is a
subgroup of SO(3).
Rotation invariant and equivariant methods require specific evaluation metrics to
reflect the performances on certain tasks and the invariance/equivariance. Let us take a
supervised learning task with N training samples {(x_i, y_i)}_{i=1}^N as an example. f : X → Y is the deep model and L : Y × Y → ℝ is the evaluation function. If there is no requirement on equivariance, the metric is computed as L = Σ_i L(f(x_i), y_i). However, if equivariance is considered, the model f should consider L(f(g ⋅ x_i), g ⋅ y_i) for all g ∈ G instead of only L(f(x_i), y_i). Accordingly, the metric L_G is given as

L_G = Σ_i ∫_G L(f(g ⋅ x_i), g ⋅ y_i) d𝜇(g).    (3)
Data augmentation methods only make changes to the loss function instead of any model structure. They use sampling to estimate the integral in Eq. 3. Thus, the loss L_G is constructed as

L_G = Σ_{i,ĝ} L(f(ĝ ⋅ x_i), y_i),    (4)
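A minimal sketch of this sampled loss follows (our own illustration; the rotation sampler and the squared-error choice of L are assumptions, not a specific surveyed method). For a strongly invariant model the augmented loss collapses to the plain loss:

```python
import numpy as np

def random_rotation(rng):
    # Random rotation from SO(3) via QR decomposition.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, [0, 1]] = q[:, [1, 0]]        # enforce det = +1
    return q

def augmented_loss(f, xs, ys, n_rot=8, seed=0):
    # Sampled version of Eq. 4: L_G = Σ_{i,ĝ} L(f(ĝ·x_i), y_i);
    # targets y_i stay fixed because the task is rotation invariant.
    rng = np.random.default_rng(seed)
    total = 0.0
    for x, y in zip(xs, ys):
        for _ in range(n_rot):
            g = random_rotation(rng)
            total += (f(x @ g.T) - y) ** 2   # squared-error L
    return total / (len(xs) * n_rot)

rng = np.random.default_rng(1)
xs = [rng.normal(size=(32, 3)) for _ in range(4)]
f_inv = lambda x: np.linalg.norm(x, axis=1).mean()   # strongly invariant toy model
ys = [f_inv(x) for x in xs]
print(augmented_loss(f_inv, xs, ys))   # ≈ 0: augmentation adds nothing here
```

For a non-invariant model, each sampled rotation contributes a different residual, which is exactly the signal the augmentation trains against.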
3.2 Multi‑view methods
Unlike data augmentation methods, multi-view methods attain rotation invariance by modify-
ing the model instead of the loss function. In multi-view methods, the model f ∶ X → Y is
built as
f(x) = Σ_{ĝ_j ∈ Ĝ} w_j f_b(ĝ_j ⋅ x),    (5)
The G-variant error of f is then bounded by that of the base model f_b:

E(f) = ∫_X ∫_G d( Σ_j w_j f_b(ĝ_j g ⋅ x), Σ_j w_j f_b(ĝ_j ⋅ x) ) d𝜇(g) dx
     ≤ Σ_j w_j ∫_X ∫_G d( f_b(ĝ_j g ⋅ x), f_b(ĝ_j ⋅ x) ) d𝜇(g) dx
     = Σ_j w_j ∫_X ∫_G d( f_b(ĝ_j g ĝ_j^{-1} ⋅ x), f_b(x) ) d𝜇(g) dx
     = ∫_X ∫_G d( f_b(g ⋅ x), f_b(x) ) d𝜇(g) dx = E(f_b).    (6)
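The bound E(f) ≤ E(f_b) in Eq. 6 can be checked numerically. In this sketch (all helper names are ours), the base model is deliberately non-invariant and Ĝ is a small set of random views:

```python
import numpy as np

def random_rotation(rng):
    # Random rotation from SO(3) via QR decomposition.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))
    if np.linalg.det(q) < 0:
        q[:, [0, 1]] = q[:, [1, 0]]
    return q

def variant_error(f, clouds, rng, n_rot=200):
    # Monte-Carlo estimate of the G-variant error (invariant output space).
    return float(np.mean([abs(f(x @ random_rotation(rng).T) - f(x))
                          for x in clouds for _ in range(n_rot)]))

rng = np.random.default_rng(0)
f_b = lambda x: x[:, 2].mean()                        # non-invariant base model
views = [random_rotation(rng) for _ in range(8)]      # a small finite set Ĝ
f = lambda x: np.mean([f_b(x @ g.T) for g in views])  # Eq. 5 with w_j = 1/|Ĝ|

clouds = [rng.normal(size=(64, 3)) for _ in range(8)]
e_f = variant_error(f, clouds, rng)
e_fb = variant_error(f_b, clouds, rng)
print(e_f, e_fb)   # e_f should not exceed e_fb, per the Eq. 6 bound
```

Averaging over views smooths the base model's orientation dependence, which is why the composite model's variant error shrinks without ever reaching zero (weak invariance).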
As CNNs become a powerful tool for images (Krizhevsky et al. 2012; Simonyan and Zis-
serman 2014; Szegedy et al. 2015; He et al. 2016), researchers exploit multi-view images
to extract features from 3D shapes. Most multi-view methods take images as input, while
some later methods also process point clouds and voxels. Although originally designed as 3D feature extractors, these methods can improve rotation invariance and are
chosen as baselines by related work (Esteves et al. 2018a; Rao et al. 2019; Zhang et al.
2019a) (Fig. 3). MVCNN (Su et al. 2015) is a pioneering method showing that a fixed set
of rendered views is highly informative for 3D shape recognition. VoxNet (Maturana and
Scherer 2015) pools multi-view voxel features and achieves 2D rotation invariance around
the z-axis. Qi et al. (2016) introduce multi-resolution filtering for multi-view CNNs and
improve the classification accuracy. Cao et al. (2017) propose spherical projections to col-
lect depth variations and contour information of different views for better performances.
Zhang et al. (2018) apply a PointNet-like (Qi et al. 2017a) method on multi-view 2.5D
point clouds to fuse information from all views. View-GCN++ (Wei et al. 2022) exploits
rotation robust view-sampling to deal with rotation sensitivity. Besides, some methods
replace the weighted average in Eq. 5 with pooling/fusion modules to enhance effectiveness
and efficiency (Wang et al. 2017; Roveri et al. 2018; Yu et al. 2018; Wei et al. 2020; Li
et al. 2020; Chen and Chen 2022). These modifications do not necessarily improve the
invariance, so we omit them here.
Most multi-view methods take images as the input, so they can handle 3D rotation
invariance using powerful 2D models (Su et al. 2015; Qi et al. 2016; Cao et al. 2017).
Nonetheless, they lead to a heavy computational burden, making training and inference inefficient. As Eq. 5 shows, the computational burden of f is at least |Ĝ| times that of f_b. For instance, |Ĝ| is 12 or 80 in MVCNN (Su et al. 2015). In addition, most existing multi-view methods are weakly rotation invariant. Their base models f_b are not strongly rotation invariant, such as 2D CNNs (Su et al. 2015; Qi et al. 2016; Wei et al. 2022) and
Fig. 3 A pipeline of multi-view methods. The 3D input is first rendered/sampled into multi-view data, pro-
cessed by non-invariant DNNs, and finally pooled for downstream tasks
non-invariant 3D networks (Zhang et al. 2018). So the composite models f do not possess
strong invariance.
3.4 Transformation methods
Fig. 4 T-Net (Qi et al. 2017a) directly regresses rotation matrices from coordinates. B, N refer to the num-
ber of batches and points, respectively. The numbers behind MLP are internal layer sizes
put raw point clouds and multi-view features into T-Net to robustify the model. In addition,
many other methods also include T-Net in their models for the effectiveness and stability in
different downstream tasks (Joseph-Rivlin et al. 2019; Chen et al. 2019a; Liu et al. 2019c;
Zhang et al. 2020a; Yu et al. 2020b; Wang et al. 2021; Poiesi and Boscaini 2021; Hegde
and Gangisetty 2021; Liu et al. 2022c; Zhu et al. 2022a).
Besides rotation matrices, some methods utilize other rotation representations. IT-Net
(Yuan et al. 2018) simultaneously canonicalizes rotation and translation through the qua-
ternion representation. PCPNet (Guerrero et al. 2018) and SCT (Liu et al. 2022a) regress
unit quaternions for pose canonicalization and point cloud recognition, respectively. Poiesi
and Boscaini (2023) learn a quaternion transformation network to refine the estimated
LRF. RotPredictor (Fang et al. 2020) applies PointConv (Wu et al. 2019) to regress Euler
angles, and RTN (Deng et al. 2021b) predicts discrete Euler angles. C3DPO (Novotny et al.
2019) divides the shape into view-specific pose parameters and a view-invariant shape
basis. PaRot (Zhang et al. 2023a) also disentangles invariant features with equivariant
poses via the equivariance loss. Wang et al. (2022c) formulate the rotation invariant learn-
ing problem as the minimization of an energy function, solved with an iterative strategy.
Some methods are embedded in a self-supervised learning framework. Some works (Zhou
et al. 2022b; Mei et al. 2023) enforce the consistency of canonical poses with a rotation
equivariance loss. Sun et al. (2021) utilize Capsule Networks (Hinton et al. 2011) with the
canonicalization loss for object-centric reasoning. Kim et al. (2022) introduce a self-super-
vised learning framework to predict canonical axes of point clouds using the icosahedral
group. Currently, only a few methods are strongly rotation invariant. LGANet (Gu et al.
2021b) and ELGANet (Gu et al. 2022) exploit graph convolutional networks (GCNs) to
process rotation invariant distances and angles, where the outputs are orthogonalized into
rotation matrices. Katzir et al. (2022) employ equivariant networks to learn canonical poses
of point clouds. RIP-NeRF (Wang et al. 2023c) transforms raw coordinates into invariant ones for fine-grained editing. EIPs (Fei and Deng 2024) disentangle rotation invariance and
point cloud processing with efficient invariant poses.
Benefiting from their straightforward idea, transformation methods are extensively used in many applications (Liu et al. 2019c; Guerrero et al. 2018; Zhu et al. 2022a). Nevertheless, the invariance condition is often ignored, especially by works using T-Nets (Qi et al. 2017a; Joseph-Rivlin et al. 2019; Poiesi and Boscaini 2021), in which case the transformation functions contribute nothing to rotation invariance. Besides, some
methods cannot output proper rotation representations. For example, T-Net (Qi et al.
2017a) cannot guarantee proper output rotation matrices, even using the regularization
term. In this case, 3D shapes are inevitably distorted, and some structural information may
be lost. Moreover, heavy data augmentation is sometimes required for good performance.
Le (2021) shows that T-Net needs a large amount of data augmentation to learn a steady
transformation policy.
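One standard remedy for improper regressed matrices, not specific to any method above, is projecting the 3×3 output onto SO(3) via SVD (the orthogonal Procrustes solution); a minimal sketch:

```python
import numpy as np

def project_to_so3(m):
    # Nearest rotation in Frobenius norm: from the SVD m = U S Vᵀ,
    # return U diag(1, 1, det(UVᵀ)) Vᵀ so that the result has det = +1.
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

rng = np.random.default_rng(0)
m = rng.normal(size=(3, 3))              # e.g., an unconstrained regressed matrix
r = project_to_so3(m)
print(np.allclose(r @ r.T, np.eye(3)))   # True: orthogonal
print(np.isclose(np.linalg.det(r), 1.0)) # True: proper rotation, no reflection
```

Unlike a soft regularization term, this projection guarantees a valid rotation, so the transformed shape is never sheared or scaled.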
3.5 Invariant value methods

Invariant value methods achieve rotation invariance through constructing invariant values from coordinate inputs. Here, invariant values include distances, inner products, and angles:

‖u_i‖ (distance),  u_i ⋅ u_j (inner product),  ∠(u_i, u_j) (angle),    (9)

where {u_i} ⊂ ℝ^3 is a set of nonzero geometric vectors. Based on these invariant values, the model f : X → Y is generally set up as

f(x) = f_b(f_i(x)).    (10)
As fb is usually a deep point cloud model with slight modification, the handcrafted rules in
fi are the core of invariant value methods. We divide existing methods into several groups
according to the form of invariant values.
3.5.1 Local values
Many methods generate invariant values in the local neighborhoods. ClusterNet (Chen
et al. 2019b) introduces rigorously rotation invariant (RRI) mappings based on a kNN
graph as
RRI(x_i, {x_ij}_{j=1}^k) = [ ‖x_i‖, {( ‖x_ij‖, ∠(x_i, x_ij), 𝜙_ij )}_{j=1}^k ],
Fig. 5 Representative invariant values from a local values, b LRF-based values, c PPF-based values, and d
global values. The solid lines are invariant values or necessary components of invariant values
where d^{(0)}_{ij} = x_ij − x_i, d^{(1)}_{ij} = x_ij − m_i, and d^{(0)}_i = m_i − x_i. It applies a multi-layer perceptron (MLP) to generate final features. RIF has been widely adopted by many works (Chou et al. 2021; Zhang et al. 2022; Wang and Rosen 2023; Fan et al. 2023).
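A minimal sketch of local invariant values in the sense of Eq. 9, computed per point over a kNN neighborhood, can look as follows (the particular feature choice is our own toy example, not a surveyed method):

```python
import numpy as np

def local_invariants(x, k=4):
    # Per point: distances to its k nearest neighbors and the cosines of the
    # angles between the point vector x_i and each neighbor offset (cf. Eq. 9).
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]   # drop self (distance 0)
    feats = []
    for i, idx in enumerate(nn):
        off = x[idx] - x[i]
        dist = np.linalg.norm(off, axis=1)
        cosang = off @ x[i] / (dist * np.linalg.norm(x[i]))
        feats.append(np.concatenate([dist, cosang]))
    return np.stack(feats)

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 3))
g = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
print(np.allclose(local_invariants(x), local_invariants(x @ g.T)))  # True
```

Because kNN graphs depend only on pairwise distances, the neighborhoods themselves are also rotation invariant, which is what makes this composition strongly invariant.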
Later work mainly adds more reference points and invariant values to improve per-
formances. Some representative invariant values are collected in Table 3. Readers may
refer to the original papers for details.
3.5.2 LRF‑based values
LRF-based values are special cases of local values. Specifically, if three orthogonal axes (e_1, e_2, e_3) can be determined in N(x_i), then x_ij ⋅ e_1, x_ij ⋅ e_2, x_ij ⋅ e_3 are relative coordinates in
this LRF. LRFs are adopted in many handcrafted 3D descriptors, like FPFH (Rusu et al.
2009), SHOT (Tombari et al. 2010), and RoPS (Guo et al. 2013). It should be noted that
methods only using principal component analysis (PCA) to define LRFs are discussed sep-
arately in the next section instead of this one. We divide these methods according to the
number of LRFs in each neighborhood.
Some methods define a unique LRF in each neighborhood. Usually, the normal vector is
selected as one axis, a normalized weighted average vector is selected as another, and their
cross product is chosen as the final axis. We summarize these methods in Table 4.
Besides, there are also methods with multiple LRFs in each neighborhood. A common
choice of LRF is the Darboux frame defined as
e_x = n_i,  e_y(x_ij) = N(d^{(0)}_{ij} × e_x),  e_z(x_ij) = e_x × e_y(x_ij),    (14)
where ey and ez depend on not only xi but also xij . CRIN (Lou et al. 2023) proposes another
LRF by considering the original space basis. Some representative invariant values are
listed in Table 5.
Method | LRF axes (only two are listed; their cross product determines the third)
Pujol-Miró et al. (2019) | e_z ∼ PCA, e_x = NO(e_z, Σ_{j=1}^k I_ij x_ij)
GFrames (Melzi et al. 2019) | e_z = n_i, e_x = N(∇f(x_i))
AECNN (Fig. 5b) (Zhang et al. 2020b) | e_z = N(x_i), e_x = NO(x_i, m_i)
LFNet (Cao et al. 2021) | e_z = n_i, e_x = N(x_i × e_z)
PaRI-Conv (Chen and Cong 2022) | e_x = n_i, e_y = N(d^{(0)}_i)
Sahin et al. (2022) | e_x = N(x_i), e_y = N(x_i × d^{(0)}_i)
Li et al. (2023b) | e_x = N(x_i), e_y = NO(x_i, n_i)

r_i is the radius of N(x_i). I_ij is the intensity of x_ij. f is a scalar function defined on the manifolds of the shapes, and ∇ refers to the intrinsic gradient
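A generic LRF construction in the spirit of the rows above can be sketched as follows. Here we read NO(·, ·) as "project the second argument orthogonally to the first axis, then normalize", which is our interpretation; the axes rotate with the input, so relative coordinates are invariant:

```python
import numpy as np

def lrf(normal, ref):
    # e_z = N(normal); e_x = NO(e_z, ref): ref with its e_z component removed,
    # then normalized; e_y = e_z × e_x. Rows of the result are the axes.
    ez = normal / np.linalg.norm(normal)
    px = ref - (ref @ ez) * ez
    ex = px / np.linalg.norm(px)
    return np.stack([ex, np.cross(ez, ex), ez])

rng = np.random.default_rng(0)
n, ref = rng.normal(size=3), rng.normal(size=3)
offsets = rng.normal(size=(5, 3))          # neighbor offsets x_ij − x_i
rel = offsets @ lrf(n, ref).T              # coordinates in the LRF

g = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
rel_rot = (offsets @ g.T) @ lrf(g @ n, g @ ref).T
print(np.allclose(rel, rel_rot))  # True: relative coordinates are invariant
```

The construction fails exactly at the singularities discussed later: when `ref` is parallel to the normal, the projection is zero and no e_x can be normalized.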
3.5.3 PPF‑based values
PPFs (Drost et al. 2010) are initially proposed in the 3D object recognition algorithm,
which describe the relative information between two points x1 , x2 as
PPF(x_1, x_2) = [ ‖d_12‖, ∠(n_1, d_12), ∠(n_2, d_12), ∠(n_1, n_2) ],    (15)

where d_ij = x_i − x_j, as Fig. 5c shows. PPFs are strongly rotation invariant, making them
suitable for invariant feature extraction.
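Eq. 15 can be sketched directly in a few lines (helper names ours); a rigid check confirms the strong invariance:

```python
import numpy as np

def angle(u, v):
    # Angle between two nonzero vectors, clipped for numerical safety.
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def ppf(x1, n1, x2, n2):
    # PPF(x1, x2) = [‖d12‖, ∠(n1, d12), ∠(n2, d12), ∠(n1, n2)], d12 = x1 − x2 (Eq. 15).
    d = x1 - x2
    return np.array([np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2)])

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=3), rng.normal(size=3)
n1, n2 = rng.normal(size=3), rng.normal(size=3)   # stand-ins for surface normals
g = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
print(np.allclose(ppf(x1, n1, x2, n2),
                  ppf(g @ x1, g @ n1, g @ x2, g @ n2)))  # True
```

Note that the same values are produced under reflections as well, which is the reflection ambiguity discussed in Sect. 3.5.6.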
PPFNet (Deng et al. 2018b) concatenates PPFs with coordinates and normals to improve
the robustness of 3D point matching. PPF-FoldNet (Deng et al. 2018a) combines PPFNet
with FoldingNet (Yang et al. 2018) to learn invariant descriptors, using only PPFs as input
features. Bobkov et al. (2018) slightly modify and apply the PPFs to classification and
retrieval. GMCNet (Pan et al. 2021) combines RRI (Chen et al. 2019b) and PPFs for rig-
orous partial point cloud registration. Using hypergraphs, Triangle-Net (Xiao and Wachs 2021) extends PPFs to three points (triangles). PaRI-Conv (Chen and Cong 2022) augments
PPFs with two azimuth angles and uses them to synthesize pose-aware dynamic kernels.
PPFs have been widely employed in rotation invariant point cloud matching and registra-
tion (Zhao et al. 2021; Yu et al. 2023; Zhang et al. 2023c).
3.5.4 Global values
Some methods do not require local neighborhoods to evaluate invariant values. SRINet
(Sun et al. 2019b) defines point projection mapping (PPM, Fig. 5d) through projecting xi
on three axes a1 , a2 , a3 as
PPM(x_i) = [ cos ∠(a_1, x_i), cos ∠(a_2, x_i), cos ∠(a_3, x_i), ‖x_i‖ ],    (16)

where a_1 = arg max_{x∈{x_i}} ‖x‖, a_2 = arg min_{x∈{x_i}} ‖x‖, and a_3 = a_1 × a_2. Based on SRINet, Tao et al. (2021) add attention modules, and SCT (Liu et al. 2022a) adds a quaternion T-Net for better performances. Sun et al. (2023) apply SRINet on non-rigid point clouds. Some works (Xu et al. 2021b; Qin et al. 2023a) employ the sorted Gram matrix as invariant values. The Gram matrix for {x_i}_1^N is computed as (x_i ⋅ x_j)_{N×N}, each row of which is then sorted and fed into point-based networks for permutation and rotation invariance.
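The sorted-Gram construction is easy to verify: (XR^T)(XR^T)^T = XX^T for any rotation R, and sorting each row removes the dependence on point order within a row. A sketch (function name ours):

```python
import numpy as np

def sorted_gram(x):
    # Gram matrix (x_i · x_j), each row sorted: rotation invariant features.
    return np.sort(x @ x.T, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
g = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
print(np.allclose(sorted_gram(x), sorted_gram(x @ g.T)))   # True
```

The sorting step loses the correspondence between columns and specific points, an instance of the irreversibility concern raised in Sect. 3.5.6.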
3.5.5 Others
In addition to the above invariant values, the other values that are hard to classify are
listed here. SchNet (Schütt et al. 2017, 2018) gains rotation invariance through intera-
tomic distances. SkeletonNet (Ke et al. 2017) uses angles and ratios between distances
as invariant features for human skeletons. Liu et al. (2018) leverage relative distances
on global point cloud registration. 3DTI-Net (Pan et al. 2019) utilizes translation invari-
ant graph filter kernel and employs the norms as invariant features. 3DMol-Net (Li et al.
2021a) extends it to molecular applications. RISA-Net (Fu et al. 2020) employs edge
lengths and dihedral angles on 3D retrieval tasks. RMGNet (Furuya et al. 2020) feeds
several handcrafted descriptors into GCNs for point cloud segmentation. GS-Net (Xu
et al. 2020) uses eigenvalue decomposition on local distance graphs and exploits these
eigenvalues as invariant features. SN-Graph (Zhang et al. 2021b) leverages 15 cosine
values, 7 distances, and 7 radii as invariant values. TinvNN (Zhang et al. 2021c) exer-
cises eigenvalue decomposition on the zero-centered distance matrices to get invariant
features. ComENet (Wang et al. 2022b) exploits several rotation angles for global com-
pleteness. DuEqNet (Wang et al. 2023b) builds equivariant networks through relative
distances for object detection. SGPCR (Salihu and Steinbach 2023) explores the rota-
tion invariant convolution between two spherical Gaussians for object registration and
retrieval. RadarGNN (Fent et al. 2023) employs rotation invariant bounding boxes and
representation for radar-based perception. GeoTransformer (Qin et al. 2023b) further
applies sinusoidal embedding on distances and angles for robust registration.
3.5.6 Discussion
Unlike the methods above, invariant value methods are strongly rotation invariant, and
their superiority has been demonstrated with many experiments (Xu et al. 2021b; Chen
and Cong 2022; Sahin et al. 2022; Wang et al. 2023b). Nevertheless, there are still several
concerns.
Singularity Almost every method has singularities that make invariant values meaningless, including coincident points (e.g., x_i = m_i ⇒ d^{(0)}_i = 0 leads to undefined angles in RIConv (Zhang et al. 2019b)), collinear vectors (e.g., if the cross products in Cao et al. (2021); Chen and Cong (2022); Sahin et al. (2022) give zero output, then their LRFs are not properly defined), and nonunique candidate values (e.g., if two or more points satisfy arg max_{x∈{x_i}} ‖x‖, then a_1 in SRINet (Sun et al. 2019b) is not determined).
Irreversibility For f_i : X → Z, if there exists f_ri : Z → X satisfying

∀x ∈ X, ∃ g_x ∈ G, f_ri(f_i(x)) = g_x ⋅ x,    (17)

then f_i is reversible. Some irreversible invariant values may lose certain structural information, harming downstream task performances (Zhang et al. 2019b; Sun et al. 2019b).
Discontinuity The base model fb is generally a continuous deep model. So if fi is
discontinuous at x0 , then the model f may also be discontinuous at x0 , making it hard to
train with gradient-based optimization algorithms. For example, fi in SRINet (Sun et al.
2019b) is discontinuous on point clouds whose two longest vectors are close, since it
needs them to define axes.
Reflection Distances, inner products, and angles are invariant to rotations and reflec-
tions. Thus, almost all methods without cross products cannot distinguish rotations from
reflections (Drost et al. 2010; Zhang et al. 2019b; Xu et al. 2021b).
3.6 PCA‑based methods
PCA-based methods construct the model similarly to transformation methods, while
the transformation function is unlearnable PCA alignment, as Algorithm 1 shows. X is usually zero-centered to mitigate the influence of translations, and 𝚺 is called the covariance matrix. PCA alignment can guarantee rotation invariance. For a rotated input X_R = XR (RR^T = I), we have

𝚺_R = X_R^T X_R = R^T 𝚺 R = (R^T V) 𝚲 (R^T V)^T ⇒ V_R = R^T V,    (18)
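The alignment itself can be sketched as follows (a toy rendering of the Algorithm 1 idea, with names of our own choosing). The sign ambiguity that the disambiguation rules must resolve shows up as per-axis flips:

```python
import numpy as np

def pca_align(x):
    # Zero-center, eigendecompose the covariance Σ = XᵀX, and express the
    # points in the eigenbasis V (columns ordered by ascending eigenvalue).
    x = x - x.mean(axis=0)
    _, v = np.linalg.eigh(x.T @ x)
    return x @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
g = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])  # 90° about z
a, b = pca_align(x), pca_align(x @ g.T)
# Per Eq. 18, V_R = RᵀV up to column signs, so the aligned coordinates
# agree up to per-axis sign flips: exactly the ambiguity to disambiguate.
print(np.allclose(np.abs(a), np.abs(b)))  # True
```

When two eigenvalues of 𝚺 nearly coincide, the eigenvectors (and hence the aligned pose) become numerically unstable, which is the fragility noted at the end of this section.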
Table 6 Different disambiguation rules adopted by PCA-based methods. k = 1, 2, 3 unless otherwise specified

Method | Disambiguation rule
DLAN (Furuya and Ohbuchi 2016) | s_k = sgn(v_k ⋅ Σ_j (x_ij − x^{(c)}_i)), k = 1, 3; s_2 = s_1 s_3 det(V)
GCANet (Zhang et al. 2020c) | s_k = sgn(v_k ⋅ Σ_j w_j (x_ij − x_i))
Fan et al. (2020); GPA-Net (Shan et al. 2023) | s_k = sgn(Σ_j sgn(v_k ⋅ (x_ij − x_i)))
Gandikota et al. (2021) | s_k = sgn(U_{1k})
R-PointHop (Kadam et al. 2022); S3I-PointHop (Kadam et al. 2023) | s_k = sgn(v_k ⋅ Σ_j (x_ij − x^{(m)}_i))
LGR-Net (Zhao et al. 2022a) | s_k = sgn(v_k ⋅ x_max)

x^{(c)}_i is the center of the Spheres-Of-Interest. x^{(m)}_i is the median point. x_max is the farthest point from the centroid. m_i = max_j ‖x_ij − x_i‖, w_j = (m_i − ‖x_ij − x_i‖) / Σ_k (m_i − ‖x_ik − x_i‖)
Most methods disambiguate signs through handcrafted rules, which generally involve dot products between v_k and other vectors. Since v_k → −v_k implies s_k → −s_k, the product s_k v_k remains the same. Some representative rules are listed in Table 6.
Some methods consider combinations of signs instead of just choosing one. Xiao et al.
(2020) fuse all combinations through a self-attention module. OrthographicNet (Kasaei
2021) transforms raw points into canonical poses and generates several projection views
for 3D object recognition. MolNet-3D (Liu et al. 2022c) averages the results from 4 poses to
predict the molecular properties. Puny et al. (2022) convert the group averaging operation
to the subset averaging one with frames, where 4 and 8 frames are exploited for SO(3) and
O(3), respectively. Li et al. (2023a) apply this approach on 3D planar reflective symmetry
detection.
Fig. 6 A pipeline of PCA-based methods. Several pose candidates are first generated from the 3D input,
then they are either disambiguated using handcrafted rules/pose selectors or fused together
Table 7 Comparisons of different rotation invariant methods

Method | Data format | Invariance | Limitations
Transformation methods | Point clouds | Weak | Improper rotation representation; data augmentation requirement
Invariant value methods | Point clouds, meshes | Strong | Singularity; irreversibility; discontinuity; reflection
PCA-based methods | Point clouds, meshes | Strong | Singularity; discontinuity; heavy computational burden; numerical instability
Some works utilize pose selectors to choose one pose from multiple candidates. PR-invNet (Yu et al. 2020a) augments 8 poses with discrete rotation groups and utilizes the pose selector to choose the final pose. Li et al. (2021b) investigate the inherent ambiguity of PCA alignment. They argue that the order of e_x, e_y, e_z is also ambiguous, so the total number of ambiguities is 4 (sign) × 6 (order) = 24. All poses are fused through a pose selector to create an optimal one. Besides coordinates, some works apply PCA on network weights (Xie et al. 2023) and the convex hull (Pop et al. 2023).
PCA-based methods are effective with intrinsic strong rotation invariance. Furthermore, they are often combined with invariant value methods for better performances (Yu et al. 2020a; Zhao et al. 2022a; Chen and Cong 2022). However, sign disambiguation may introduce problems like the singularity and discontinuity of Sect. 3.5.6 (Zhang et al. 2020c; Fan et al. 2020; Gandikota et al. 2021), while considering all combinations increases the computational burden (Xiao et al. 2020; Kasaei 2021). Besides, PCA-based methods are fragile for inputs with close eigenvalues, since the corresponding eigenvectors are numerically unstable, an inherent problem of eigenvalue decomposition.
3.7 Summary
In summary, different methods use distinctive ways to obtain rotation invariance. Most rotation invariant methods are applied in general 3D understanding. We compare their differences in Table 7. Considering this, we summarize several characteristics of existing rotation invariant methods.
• Data augmentation is always integrated with other methods, especially weakly rotation
invariant ones (Fang et al. 2020; Deng et al. 2021b; Le 2021), to improve their invari-
ance.
• Multi-view methods only work with images and do not have advantages on coordinate
inputs, since they are weakly invariant and usually introduce heavy computational bur-
dens (Su et al. 2015; Qi et al. 2016; Zhang et al. 2018).
• Ringlike and cylindrical methods are the best choices in tasks like place recognition
(Sun et al. 2019a; Li et al. 2022b), as achieving 2D invariance is simpler than 3D.
• Weakly rotation invariant transformation methods are less recommended; they can be replaced by PCA-based methods, which have strong invariance and excellent performance.
• Until now, strong invariance is only available by applying invariant value methods and
PCA-based methods on coordinate inputs like point clouds and meshes.
4 Rotation equivariant methods

Most rotation equivariant methods are equivariant networks on rotation groups. There are already surveys on geometrically equivariant graph neural networks (Han et al. 2022; Zhang et al. 2023b) that categorize them according to the way of message passing and aggregation. We devise a slightly different taxonomy to cover more related methods. Some milestone methods are listed in Fig. 7.
4.1 G‑CNNs
Group equivariant convolutional neural networks (G-CNNs) were first proposed to address 2D rotations in images (Cohen and Welling 2016) and can be directly extended to 3D rotations. The group convolution for 𝜓, f ∶ X → ℝ is defined as

[𝜓 ⋆ f](g) = ∫_X [L_g 𝜓](x) f(x) dx,  (19)

where [L_g 𝜓](x) = 𝜓(g⁻¹ ⋅ x). The output signal is always defined on the rotation group, so X = G in all convolutional layers except the first one. Group convolutions are strongly rotation equivariant, i.e., 𝜓 ⋆ [L_g f] = L_g [𝜓 ⋆ f].
It is difficult to evaluate the integral directly, so many methods investigate group convolutions with finite groups. CubeNet (Worrall and Brostow 2018) focuses on convolutions on finite groups and reduces rotation equivariance to permutation equivariance. The group convolution for 𝜓, f ∶ Ĝ → ℝ satisfies

[𝜓 ⋆ f](ĝ_j) = [L_{ĝ_i} 𝜓 ⋆ f](ĝ_{k(i,j)}) = [𝜓 ⋆ L_{ĝ_i} f](ĝ_{k(i,j)}),  (20)
where Ĝ is a finite rotation group and ĝ_{k(i,j)} = ĝ_i ĝ_j. Therefore, the rotation f → L_{ĝ_i} f is equivalent to the permutation j → k(i, j) in the group convolution, as Fig. 8 shows. Esteves et al.
(2019b) put multi-view features on vertices of the icosahedron and introduce localized
filters in discrete G-CNNs for efficiency. EPN (Chen et al. 2021) combines point convo-
lutions with group convolutions for SE(3) equivariance, and has been applied on object
detection (Yu et al. 2022) and place recognition (Lin et al. 2022a, 2023a). G-CNNs are
employed in many tasks, like medical image analysis (Winkels and Cohen 2018, 2019;
Andrearczyk and Depeursinge 2018), point cloud segmentation (Meng et al. 2019; Zhu
et al. 2023), pose estimation (Li et al. 2021d), and registration (Wang et al. 2022a, 2023a;
Xu et al. 2023a).
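The reduction of rotation to permutation in Eq. 20 can be checked on a toy finite group. The cyclic group Z₄ below stands in for a finite rotation group (an assumption for brevity, not CubeNet's actual 3D group); the convention follows Eq. 19:

```python
import numpy as np

# Toy finite rotation group: Z4 (2D rotations by multiples of 90 degrees),
# indexed by i with composition (i + j) mod N.
N = 4

def left_translate(i, f):
    """[L_{g_i} f](g_j) = f(g_i^{-1} g_j) = f[(j - i) mod N]."""
    return np.array([f[(j - i) % N] for j in range(N)])

def group_conv(psi, f):
    """[psi * f](g_j) = sum_k [L_{g_j} psi](g_k) f(g_k), as in Eq. 19."""
    return np.array([sum(left_translate(j, psi)[k] * f[k] for k in range(N))
                     for j in range(N)])

rng = np.random.default_rng(1)
psi, f = rng.normal(size=N), rng.normal(size=N)

# Rotating the input equals permuting (translating) the output.
for i in range(N):
    assert np.allclose(group_conv(psi, left_translate(i, f)),
                       left_translate(i, group_conv(psi, f)))
```

Substituting y = g_i⁻¹x in the sum shows why the loop succeeds: the rotation moves through the convolution and reappears as a relabeling of the output indices.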
Some methods utilize Lie groups to construct equivariant models. LieConv (Finzi et al. 2020) lifts raw inputs x ∈ X to group elements g ∈ G and orbits q ∈ X∕G such that g ⋅ o_q = x, where o_q is the origin of orbit q. Thus, the convolution is defined as

[𝜓 ⋆ f](g, q) = ∫_G ∫_{X∕G} 𝜓(g⁻¹g′, q, q′) f(g′, q′) dq′ d𝜇(g′),  (21)
Fig. 9 A pipeline of Spherical CNNs. Most works (Cohen et al. 2018a; Esteves et al. 2018a) employ ten-
sor products to compute spherical/SO(3) convolutions in the spectral domain, while others directly apply
spherical convolutions in the spatial domain
4.2 Spherical CNNs
Spherical CNNs are special cases of G-CNNs, where the inputs are spherical and SO(3) signals. In this survey, existing spherical CNNs are divided into three categories: those following Cohen et al. (2018a), those following Esteves et al. (2018a), and the others (Fig. 9).
Cohen et al. (2018a) directly employ group convolutions in Eq. 19, where X is either S² or SO(3). They use the generalized Fourier transform (GFT) to convert convolutions into matrix multiplications. GFT and its inverse are computed as

f̃^l_m = ∫_{S²} f(x) Y^l_m(x) dx,  f(x) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} f̃^l_m Y^l_m(x),  (22)

f̃^l_{mn} = ∫_{SO(3)} f(g) D^l_{mn}(g) d𝜇(g),  f(g) = Σ_{l=0}^{∞} Σ_{m,n=−l}^{l} f̃^l_{mn} D^l_{mn}(g),  (23)
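The S¹ analogue of Eqs. 22-23 makes the spectral trick concrete: on the circle the GFT is the ordinary DFT, and group convolution becomes pointwise multiplication of Fourier coefficients, the property exploited on S² and SO(3). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 32
psi, f = rng.normal(size=n), rng.normal(size=n)

# Circular group convolution in the convention of Eq. 19:
# [psi * f](j) = sum_k psi[(k - j) mod n] f[k].
conv = np.array([sum(psi[(k - j) % n] * f[k] for k in range(n))
                 for j in range(n)])

# Spectral route: conjugate(DFT(psi)) * DFT(f), inverted, gives the same signal.
spectral = np.fft.ifft(np.conj(np.fft.fft(psi)) * np.fft.fft(f)).real
assert np.allclose(conv, spectral)
```

The O(n²) spatial sum becomes an O(n log n) FFT product; the same economy motivates computing spherical/SO(3) convolutions in the spectral domain.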
The a3SConv (⋆₁) is defined as [𝜓 ⋆₁ f](x) = [𝜓 ⋆ f](𝜁(x, 0)), where 𝜁 ∶ S² × [0, 2𝜋) → SO(3). As 𝜁(x, 0) cannot represent all SO(3) elements, a3SConv is only equivariant to specific rotations. Esteves et al. (2020b) introduce spin weights and propose the spin-weighted spherical CNN. PRIN (You et al. 2020, 2021) proposes spherical voxel convolution (SVC) for signals on the unit ball B³. SVC (⋆₂) is defined as [𝜓 ⋆₂ f](x) = [𝜓 ⋆ f](𝜄(x)), where 𝜄 ∶ B³ → SO(3). SPRIN (You et al. 2021) abandons the dense grids in PRIN by directly converting point clouds {x_i} into a distribution function f(x) = (1∕N) Σ_i 𝛿(x − x_i), where 𝛿 is the delta function. Then SVC can be efficiently approximated as an unbiased estimation. Chen et al. (2023) combine spherical CNNs with Capsule Networks (Hinton et al. 2011) for unknown pose recognition.
Most methods use ray casting to generate spherical signals from 3D shapes. However, other methods are also applicable. Yang et al. (2019) and Yang and Chakraborty (2020) generate spherical signals by collecting responses from point clouds. Spherical-GMM
(Zhang 2021) represents point clouds with Gaussian mixture models. Besides classi-
fication and segmentation, spherical CNNs are widely used in many tasks, including
omnidirectional localization (Zhang et al. 2021a), place recognition (Yin et al. 2020, 2021,
2022), and self-supervised representation learning (Spezialetti et al. 2019; Marcon et al.
2021; Lohit and Trivedi 2020; Spezialetti et al. 2020).
Esteves et al. (2018a, 2020a) propose another spherical convolution that only processes spherical signals. The spherical convolution for 𝜓, f ∶ S² → ℝ is defined as

[𝜓 ∗ f](x) = ∫_G [L_g 𝜓](x) [L_{g⁻¹} f](𝜂) d𝜇(g),  (25)

where 𝜂 is the north pole. Such spherical convolutions are strongly rotation equivariant, i.e., 𝜓 ∗ [L_g f] = L_g [𝜓 ∗ f], and can be converted to multiplications with GFT as (𝜓 ∗ f)̃^l_m = 2𝜋 √(4𝜋∕(2l + 1)) 𝜓̃^l_0 f̃^l_m. As only 𝜓̃^l_0 is involved, the only useful part of the filter 𝜓 is its zonal component.
Esteves et al. (2019a) utilize pre-trained spherical CNNs as supervision and learn equiv-
ariant representations for 2D images. Mukhaimar et al. (2022) apply them on concentric
spherical voxels for robust point cloud classification. Esteves et al. (2023) scale up spheri-
cal CNNs and achieve outstanding performances on molecular benchmarks and weather
forecasting tasks.
4.2.3 Others
Some spherical CNNs keep GFT and part of spherical convolutions. Zhang et al. (2019a)
replace the SO(3) convolutional layers with PointNet-like (Qi et al. 2017a) networks.
Almakady et al. (2020) use GFT to decompose the spherical signals, then exploit the
norms of individual components as invariant features for volumetric texture classification.
Lin et al. (2021b) combine these norms with other invariant features to boost the classifica-
tion performance.
Some spherical CNNs handle convolutions in the spatial domain. SFCNN (Rao et al. 2019) applies symmetric convolutions to each point and its neighbors on spherical lattices.
Yang et al. (2020) propose the geodesic icosahedral pixelization to address the irregular-
ity problem. Fox et al. (2022) transform point clouds into concentric spherical signals and
append convolutions along the radial dimension. Shakerinava and Ravanbakhsh (2021)
investigate the pixelizations of platonic solids for spheres and introduce equivariant maps
on them. Xu et al. (2022) exploit global–local attention-based convolutions for spherical
data.
4.2.4 Discussion
Spherical CNNs are effective for spherical signals. They have a solid mathematical foun-
dation and nice properties on equivariance. Notwithstanding, preprocessing is sometimes
problematic. The ray casting technique is commonly adopted to convert 3D shapes into
spherical signals (Cohen et al. 2018a; Esteves et al. 2018a). However, Esteves et al. (2018a)
argue that it is only suitable for star-shaped objects, from whose interior point the whole
boundary is visible. Besides, projection on spheres would unavoidably distort shapes, and finer grids lead to less error but a heavier computational burden (Cohen et al. 2018a; Esteves et al. 2018a).
Fig. 10 TFN (Thomas et al. 2018; Thomas 2019) layer. Each point x_i is associated with a tensor field V_i. The output tensor field V′_i is aggregated from the tensor product between the filter features F(x_i − x_j) and the input tensor field V_j. Some superscripts and subscripts are omitted for simplicity
4.3 Irreducible representation methods

Filters in this family take the form

F^l(x) = 𝜑^l(‖x‖) Y^l(x∕‖x‖),  (26)

where 𝜑^l ∶ ℝ_{≥0} → ℝ and Y^l is the spherical harmonic. To guarantee continuity, F^l(0) is determined by lim_{x→0} F^l(x), which is nonzero only when l = 0. F^l is strongly rotation equivariant, i.e., F^l(R(g)x) = D^l(g) F^l(x).
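For l = 1 the equivariance F^l(R(g)x) = D^l(g) F^l(x) can be verified numerically: degree-1 real spherical harmonics are (up to normalization) just the coordinates, and D¹ coincides with the rotation matrix R in the real basis. The Gaussian radial profile below is an arbitrary illustrative choice:

```python
import numpy as np

def tfn_filter_deg1(x, phi=lambda r: np.exp(-r**2)):
    """Degree-1 TFN-style filter: F1(x) = phi(||x||) * x/||x||.

    phi is a stand-in radial profile; any function of the norm works,
    since the norm is rotation invariant.
    """
    r = np.linalg.norm(x)
    return phi(r) * x / r

rng = np.random.default_rng(2)
x = rng.normal(size=3)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))  # a random rotation, det = +1

# Equivariance for l = 1: F(Rx) = R F(x), since D^1 = R in the real basis.
assert np.allclose(tfn_filter_deg1(R @ x), R @ tfn_filter_deg1(x))
```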
Irreducible representation methods are mostly applied to coordinate inputs like point
clouds. Tensor field networks (TFNs) (Thomas et al. 2018; Thomas 2019) are the pioneering methods using irreducible representations. All inputs and outputs of the TFN layer are tensor fields Ṽ^l ∈ ℝ^{N×C_l×(2l+1)}, where N is the number of points, C_l is the feature dimension, and l = 0, …, L is the rotation degree. They exploit TFN filters to generate steerable
features from coordinates. Then the tensor product between these features and input tensor
fields is computed as the output tensor fields, as shown in Fig. 10. TFNs and Clebsch-
Gordan Nets (Kondor et al. 2018) have many similarities, including steerable features and
tensor products. However, TFNs bind steerable features with points, while Clebsch-Gordan
Nets exploit steerable features to describe spherical signals. N-body Networks (Kondor
2018), designed for many body physical systems, are also based on the irreducible repre-
sentation of SO(3). Cormorant (Anderson et al. 2019) modifies the nonlinearity in Cleb-
sch-Gordan Nets (Kondor et al. 2018) to avoid the blow-up of channels. SE(3)-Transformer
(Fuchs et al. 2020) decomposes the TFN layer into self-interaction and message-passing,
where attention is added to the second part. TF-ONet (Chatzipantazis et al. 2023) also uses equivariant attention modules for shape reconstruction. Poulenard and Guibas (2021)
propose a new nonlinearity for steerable features to improve the expressivity and reduce
the computational burden. TFNs are leveraged in many applications, including 3D shape
analysis (Poulenard et al. 2019), protein structure prediction (Fuchs et al. 2021), molecular
3D Steerable CNNs (Weiler et al. 2018) constrain their matrix-valued kernels W^{ll′} to satisfy

W^{ll′}(R(g)x) = D^l(g) W^{ll′}(x) D^{l′}(g)⁻¹.  (27)
Eq. 27 can be solved analytically, with the solution being a TFN-type matrix function.
3D Steerable CNNs are employed in some applications, including 3D texture analysis
(Andrearczyk et al. 2019), partial point cloud classification (Xu et al. 2023b), and mul-
tiphase flow demonstration (Siddani et al. 2021; Lin et al. 2021a). PDO-s3DCNNs (Shen
et al. 2022) derive the general steerable 3D CNNs with partial differential operators.
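A concrete kernel satisfying a constraint of the form in Eq. 27 (with l = l′ = 1, where D¹ is R itself in the real basis) is the cross-product matrix, which can be checked numerically:

```python
import numpy as np

def cross_matrix(x):
    """W(x) = [x]_x, the skew-symmetric matrix with W(x) v = x cross v."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

rng = np.random.default_rng(5)
x = rng.normal(size=3)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))  # a random rotation, det = +1

# [Rx]_x = R [x]_x R^{-1}: the kernel satisfies Eq. 27 with l = l' = 1.
assert np.allclose(cross_matrix(R @ x), R @ cross_matrix(x) @ R.T)
```

This is the standard identity [Rx]ₓ = R[x]ₓRᵀ for R ∈ SO(3); the general solution of Eq. 27 stacks such equivariant matrix functions weighted by learnable radial profiles.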
Irreducible representation methods have intrinsic strong rotation equivariance. Nonetheless, the theory is complex enough to limit their audience (Weiler et al. 2018; Thomas et al. 2018; Thomas 2019). Besides, tensor products may increase the number of rotation degrees and harm efficiency (Thomas et al. 2018; Thomas 2019; Kondor et al. 2018).
4.4 Equivariant value methods

Equivariant value methods are networks constructed from equivariant values, i.e., scalars and vectors. They are similar to invariant value methods in Sect. 3.5. However, invariant values are only primitive features, while equivariant values form the basic blocks of equivariant networks.
EGNNs (Satorras et al. 2021b) add relative distances to graph convolutional layers. Then the coordinate x_i and feature f_i are updated as

m_ij = 𝜙_e(f_i, f_j, ‖x_i − x_j‖², a_ij),  (28)

x_i + (1∕(N − 1)) Σ_{j≠i} (x_i − x_j) 𝜙_x(m_ij) → x_i,  𝜙_f(f_i, Σ_{j∈N(x_i)} m_ij) → f_i,  (29)
where aij is the edge information, 𝜙e , 𝜙x , 𝜙f are update functions for edges, coordinates,
and node features, respectively. Clearly, the coordinates are strongly rotation equivariant,
while the features are strongly rotation invariant. E-NFs (Satorras et al. 2021a) combine
EGNNs with continuous-time normalizing flows (Chen et al. 2018a) to construct equiv-
ariant generative models. EquiDock (Ganea et al. 2022) and EquiBind (Stärk et al. 2022)
apply graph matching networks (Li et al. 2019b) and EGNNs on rigid body protein-protein
docking and drug binding structure prediction, respectively. Some methods (Hoogeboom
et al. 2022; Schneuing et al. 2022; Igashov et al. 2022; Lin et al. 2022b; Guan et al. 2023)
incorporate diffusion models with EGNNs for molecule generation. SEGNNs (Brandstetter
et al. 2022) extend EGNNs with steerable features.
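The update in Eqs. 28-29 can be sketched directly. The tanh stand-ins for 𝜙_e, 𝜙_x, 𝜙_f are assumptions chosen only to keep the example self-contained; any functions of these invariant arguments preserve the equivariance:

```python
import numpy as np

def egnn_layer(x, f, w=0.1):
    """One EGNN-style message passing step (Eqs. 28-29), minimal sketch.

    x: (n, 3) coordinates, f: (n,) scalar node features. In practice
    phi_e, phi_x, phi_f are small MLPs; here simple tanh stand-ins.
    """
    n = len(x)
    new_x = x.copy()
    new_f = f.copy()
    for i in range(n):
        agg = 0.0
        for j in range(n):
            if i == j:
                continue
            d2 = np.sum((x[i] - x[j]) ** 2)           # invariant input
            m_ij = np.tanh(f[i] + f[j] + d2)          # phi_e: invariant message
            new_x[i] += (x[i] - x[j]) * w * m_ij / (n - 1)  # phi_x step
            agg += m_ij
        new_f[i] = np.tanh(f[i] + agg)                # phi_f step
    return new_x, new_f

rng = np.random.default_rng(3)
x, f = rng.normal(size=(5, 3)), rng.normal(size=5)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))

x1, f1 = egnn_layer(x @ R.T, f)   # rotate first, then update
x2, f2 = egnn_layer(x, f)         # update first, then rotate
assert np.allclose(x1, x2 @ R.T) and np.allclose(f1, f2)
```

Because messages depend only on squared distances, the coordinate update is built from rotated difference vectors and comes out equivariant, while the features come out invariant, exactly the split stated in the text.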
Vector Neurons (VNs) (Deng et al. 2021a) endow networks with equivariance by replacing scalars with vectors. Take the linear layer as an example: a scalar feature v ∈ ℝ^C becomes a list of 3D vectors V ∈ ℝ^{C×3}, and the layer output WV with W ∈ ℝ^{C′×C} mixes channels only, so rotations applied to the vector dimension commute with it.
Fig. 11 The comparison between linear layers in typical networks (left) and VNs (right) (Deng et al.
2021a). Each solid line represents a weight value. As the vectors are transformed consistently, VNs can
achieve strong rotation equivariance
VN-Transformer (Assaad et al. 2022) derives equivariant attention mechanisms to enhance effectiveness and efficiency based on VNs. VNs are strongly rotation equivariant and have been
applied in object manipulation (Simeonov et al. 2022), molecule generation (Huang et al.
2022b), point cloud registration (Zhu et al. 2022b; Lin et al. 2023b; Ao et al. 2023b), point
cloud completion (Wu and Miao 2022), unsupervised point cloud segmentation (Lei et al.
2023), and point cloud canonicalization (Katzir et al. 2022; Kaba et al. 2023). Geomet-
ric vector perceptrons (GVPs) (Jing et al. 2021b) similarly operate on geometric vectors.
Jing et al. (2021a) apply GVPs on structural biology tasks and reach several state-of-the-art
results. PaiNN (Schütt et al. 2021) builds efficient equivariant layers to predict molecular
properties. SE(3)-DDM (Liu et al. 2022b) applies PaiNN on the coordinate denoising task.
TorchMD-NET (Thölke and Fabritiis 2022) designs attention-based update rules for fea-
tures of different types. Directed weight neural networks (Li et al. 2022a) generalize VNs
and GVPs with more operators, which can be integrated with existing GNN frameworks.
Chen et al. (2022) build graph implicit functions with equivariant layers to capture geomet-
ric details. Le et al. (2022b) exploit cross products to generate new vectors in the message
function.
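The core of VNs' equivariance is already visible in the linear layer of Fig. 11: weights mix channels but never touch the three spatial coordinates. A minimal sketch (the channel sizes are illustrative):

```python
import numpy as np

def vn_linear(V, W):
    """Vector Neuron linear layer (sketch): V in R^{C x 3} -> W V in R^{C' x 3}.

    W acts on the channel dimension only, so for any rotation R:
    (W V) R^T = W (V R^T), i.e. the layer is rotation equivariant.
    """
    return W @ V

rng = np.random.default_rng(4)
V = rng.normal(size=(8, 3))     # 8 vector channels
W = rng.normal(size=(16, 8))    # maps 8 channels to 16
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.linalg.det(q))

# Rotate-then-transform equals transform-then-rotate.
assert np.allclose(vn_linear(V @ R.T, W), vn_linear(V, W) @ R.T)
```

Equivariance here is pure matrix associativity, W(VRᵀ) = (WV)Rᵀ, which is why no approximation error is introduced; nonlinearities in VNs are designed to preserve the same property.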
Villar et al. (2021) utilize several theorems to construct equivariant functions on groups
including O(n) and SO(n). GMN (Huang et al. 2022a) constructs equivariant networks
similarly and proves their universal approximation. IsoGCNs (Horie et al. 2021) achieve equivariance through operating on rank-p tensors H^p ∈ ℝ^{|V|×C×d^p}. Using a similar approach, Finkelshtein et al. (2022) define ascending and descending layers for geometric dimension
expansion and contraction, respectively. Suk et al. (2021, 2022) leverage equivariant neural
networks in computational fluid dynamics. EQGAT (Le et al. 2022a) processes coordinates
with attention mechanisms for better performances. Luo et al. (2022) extend message pass-
ing networks with learned orientations. DeepDFT (Jørgensen and Bhowmik 2022) employs
message passing networks on fast electron density estimation.
Compared to previous methods, equivariant value methods do not introduce approximation error, and their theories are relatively simple (Satorras et al. 2021b; Deng et al. 2021a). Although they emerged only recently, they have shown great potential in many applications (Deng et al. 2021a; Ganea et al. 2022; Stärk et al. 2022; Schütt et al. 2021).
Table 8 Comparison of rotation equivariant methods

Method | Input | Equivariance | Weaknesses
G-CNNs | Voxels, point clouds, graphs | Weak | Approximation error of integration; problems of finite subgroups
Spherical CNNs | Spherical signals | Weak | Approximation error of GFT; problems of preprocessing
Irreducible representation methods | Voxels, point clouds, graphs | Strong | Complex theory; inefficient tensor products
Equivariant value methods | Point clouds, graphs | Strong | No common weakness
4.5 Others
4.6 Summary
Rotation equivariant methods have a broader application range than rotation invariant ones. The differences among various rotation equivariant methods are listed in Table 8. We summarize several characteristics of existing rotation equivariant methods.
• The approximation errors of G-CNNs (Finzi et al. 2020; Chen et al. 2021) and spherical CNNs (Cohen et al. 2018a; Esteves et al. 2018a) are inevitable and can only be reduced through fine discretization and cumbersome computation. Therefore, they are less reliable than strongly rotation equivariant methods.
• Albeit strongly rotation equivariant, irreducible representation methods (Thomas et al. 2018; Weiler et al. 2018; Thomas 2019) have a complex theory, which poses great challenges for new users.
• Equivariant value methods (Satorras et al. 2021b; Deng et al. 2021a) achieve a good balance between theoretical properties and experimental performance.
Table 10 ModelNet40 (Wu et al. 2015) classification results of representative rotation invariant/equivariant
methods
Method Input Overall accuracy (%)
Type Size z/z SO(3)/SO(3) z/SO(3)
pc = point cloud, n = normal. ∗: Some works replace azimuthal rotation augmentation (z) with no augmentation (I). a: Results are from Esteves et al. (2018a). b: Results are from Deng et al. (2021a)
Rotation invariance and equivariance are seldom separate problems and always depend
on task requirements in specific settings. We give a general overview of applications and
datasets involved in related works and divide them into 3D semantic understanding and
Table 11 ScanObjectNN (Uy et al. 2019) classification results of representative rotation invariant/equivari-
ant methods
Method Overall Accuracy (%)
All methods take 2048 points as input. ∗: Some works replace azimuthal rotation augmentation (z) with no augmentation (I). a: Results are from Zhang et al. (2022)
molecule-related applications.
Table 12 ShapeNetPart (Yi et al. 2016) segmentation results of representative rotation invariant/equivariant
methods
Method | Normal | z/z (ins, cls) | SO(3)/SO(3) (ins, cls) | z/SO(3) (ins, cls)
All methods take 2048 points as input, while some employ normals as additional inputs. ∗: Some works replace azimuthal rotation augmentation (z) with no augmentation (I)
in Table 11. Fewer works explore ScanObjectNN compared to ModelNet40 (Wu et al. 2015). Besides, there is no consensus on which variant to evaluate, and there is still much room for improvement. Other datasets are used less frequently, like RGB-D Object (Lai
et al. 2011), S3DIS (Armeni et al. 2016), and ScanNet (Dai et al. 2017). Some meth-
ods, especially those processing spherical signals, use Spherical MNIST (Cohen et al.
2018a) to evaluate their performances. Yang et al. (2020) create Spherical CIFAR-10 to
experiment on photorealistic images. Andrearczyk and Depeursinge (2018); Almakady
et al. (2020) exploit RFAI (Paulhac et al. 2009) on 3D texture classification. Yang
and Chakraborty (2020) employ the OASIS (Fotenos et al. 2005) for medical image
classification.
Segmentation. Segmentation is another popular task, aiming to make fine-grained
prediction. In part segmentation for small-scale objects, ShapeNetPart (Yi et al. 2016) is
widely applied as the evaluation dataset, where two common metrics, i.e., instance mean
IoU (ins.) and class mean IoU (cls.) are generally used. As shown in Table 12, RIConv++
(Zhang et al. 2022) and PaRI-Conv (Chen and Cong 2022) set the state-of-the-art results
in class mean IoU and instance mean IoU, respectively. However, we also notice that the
Table 14 QM9 (Ramakrishnan et al. 2014) prediction mean absolute error of representative rotation invariant/equivariant methods
Property 𝛼 Δ𝜖 𝜖HOMO 𝜖LUMO 𝜇 Cv G H R² U U0 ZPVE
Unit bohr³ meV meV meV D cal/mol K meV meV bohr² meV meV meV
SE(3)-Transformer (Fuchs et al. 2020) .142 53.0 33.0 35.0 .051 .054 – – – – – –
PaiNN (Schütt et al. 2021) .045 45.7 27.6 20.4 .012 .024 7.35 5.98 .066 5.83 5.85 1.28
LieTransformer (Hutchinson et al. 2021) .082 51 33 27 .041 .035 19 17 .448 16 17 2.10
EGNN (Satorras et al. 2021b) .071 48 29 25 .029 .031 12 12 .106 12 11 1.55
SEGNN (Brandstetter et al. 2022) .060 42 24 21 .023 .031 15 16 .660 13 15 1.62
TorchMD-NET (Thölke and Fabritiis 2022) .059 36.1 20.3 17.5 .011 .026 7.62 6.16 .033 6.38 6.15 1.84
Esteves et al. (2023) .049 28.8 21.6 18.0 .016 .022 6.54 5.69 .027 5.72 5.65 1.15
driving, where the odometry benchmark is generally adopted to evaluate the place rec-
ognition performance. Many datasets are also leveraged for a comprehensive evaluation,
including ETH (Pomerleau et al. 2012), NCLT (Carlevaris-Bianco et al. 2016), SceneCity
(Zhang et al. 2016), Oxford RobotCar (Maddern et al. 2017), MulRan (Kim et al. 2020a), and KITTI-360 (Liao et al. 2022).
Reconstruction. Reconstruction is a pre-training task adopted by many self-supervised
methods. Much work (Shen et al. 2020; Deng et al. 2021a; Sun et al. 2021; Zhou et al.
2022b) carries out the reconstruction experiment on ShapeNetCore (Chang et al. 2015). In
addition, Yu et al. (2020b) utilize ModelNet40 (Wu et al. 2015) for point cloud inpainting
and completion.
Retrieval. Retrieval is the task of finding objects similar to the query object. SHREC'17 (Savva et al. 2017) is a famous retrieval challenge based on ShapeNetCore (Chang et al. 2015). Some methods (Su et al. 2015; Esteves et al. 2019b; Wei et al. 2020) also experiment on ModelNet (Wu et al. 2015).
Others. Ke et al. (2017) use the NTU RGB+D (Shahroudy et al. 2016), the SBU kinect
interaction (Yun et al. 2012), and the CMU dataset (CMU 2002) for skeleton action rec-
ognition. Qin et al. (2022) apply FPHA (Garcia-Hernando et al. 2018) on hand action rec-
ognition. Besides, some methods (Liu et al. 2019b; Zhang et al. 2020c; Yang et al. 2021)
exploit ModelNet40 (Wu et al. 2015) on normal estimation. Esteves et al. (2023) employ
the WeatherBench (Rasp et al. 2020) to evaluate large spherical CNNs (Esteves et al.
2018a) on weather forecasting.
5.2 Molecule‑related applications
Recently, the number of papers that employ rotation equivariant networks on molecular data has grown explosively. Physical and chemical laws determine the relative, but not absolute, positions of atoms. Therefore, rotation invariance and equivariance are inherently needed in molecule-related applications. As related work goes further, many new tasks are investigated, and we only summarize some representative ones. Tasks and datasets are listed in Table 13.
Prediction. Prediction aims to predict molecular properties given molecular structures. QM7 (Blum and Reymond 2009; Rupp et al. 2012) is a small and pioneering dataset used by some works (Liu et al. 2022c; Kondor et al. 2018). QM9 (Ramakrishnan et al. 2014) is a commonly-used dataset, including 134k molecules with geometric, energetic, electronic, and thermodynamic properties. As shown in Table 14, there are more rotation equivariant methods than rotation invariant ones in this prediction task. As related research dives
further, novel methods with powerful and sophisticated structures show great potential in
decreasing the mean absolute error of molecular property prediction. ATOM3D (Town-
shend et al. 2021) is a set of benchmarks including various tasks. Other datasets, including
MD17 (Chmiela et al. 2017), ISO17 (Schütt et al. 2017), ESOL (Delaney 2004), BACE
(Subramanian et al. 2016), PDB (Berman et al. 2003), and OC20 (Zitnick et al. 2020), are
also applied in different prediction tasks.
Generation. In generation, the model is required to generate molecules according to certain requirements. Thomas et al. (2018) employ random deletion on QM9 (Ramakrishnan et al. 2014) and validate the model with an inpainting task. Jing et al. (2021b);
Li et al. (2022a) exploit CATH 4.2 (Ingraham et al. 2019) and TS50 (Li et al. 2014) on
computational protein design. Du et al. (2021) employ subsets of GEOM (Axelrod and
6 Future direction
Here we point out several future research directions inspired by unsolved problems in existing methods and tasks.
6.1 Method
The pros and cons of existing methods have been summarized in Sects. 3 and 4. Future methods should therefore perform better and avoid previous drawbacks by possessing the following properties.
• Strong rotation invariance and equivariance. This survey includes weakly invariant and equivariant methods in discussing rotation invariance and equivariance for the first time. Nonetheless, we argue that these methods should be used only when necessary: they involve extra uncertainty and cannot deliver consistent results for the same input under different poses.
• Concise mathematical background. The theory of many existing methods is too verbose and complicated. It should be simplified, especially when it has little connection with the implementation. Novel methods should avoid exploring general but unrelated theories.
• High computational efficiency. Due to their high latency, many well-performing methods cannot be employed in practical applications. As research progresses to large-scale and complex data, new work should consider such application scenarios and be as efficient as possible.
• Reliable integrability. Many successful DNNs have been developed for numerous applications where rotation invariance and equivariance are not considered, so they are only suitable for aligned data. If newly developed methods can be integrated with these models straightforwardly, the composite models would benefit from both.
6.2 Theoretical analysis
Most of the existing theoretical analysis addresses strong invariance and equivariance.
Some methods propose mathematical frameworks to construct equivariant networks (Kon-
dor and Trivedi 2018; Cohen et al. 2018b, 2019b; Esteves 2020; Aronsson 2021; Gerken
et al. 2021; Winter et al. 2022). However, the discussion on universal approximation is
quite limited (Dym and Maron 2021), and most equivariant networks do not have solid
mathematical foundations.
6.3 Benchmark
Research on rotation invariance and equivariance is still immature and lacks reliable and comprehensive benchmarks. Except for some well-studied tasks, most applications have yet to be intensively investigated. The evaluation metric (Eq. 3) has yet to be commonly adopted, especially for weakly rotation invariant and equivariant methods, and existing metrics cannot reflect the strength of invariance and equivariance.
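One simple empirical surrogate for such a metric is the worst-case output deviation under sampled rotations; the sketch below is an illustrative stand-in, not the exact form of Eq. 3:

```python
import numpy as np

def rotation_invariance_error(model, x, trials=16, seed=0):
    """Empirical invariance error: max output deviation under random rotations.

    A strongly invariant model scores (numerically) zero; a weakly invariant
    one scores a nonzero value that this kind of metric can quantify.
    """
    rng = np.random.default_rng(seed)
    base = model(x)
    err = 0.0
    for _ in range(trials):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        R = q * np.sign(np.linalg.det(q))
        err = max(err, float(np.max(np.abs(model(x @ R.T) - base))))
    return err

# A strongly invariant toy model: the sorted list of pairwise distances.
def invariant_model(points):
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    return np.sort(d[np.triu_indices(len(points), k=1)])

pts = np.random.default_rng(7).normal(size=(10, 3))
assert rotation_invariance_error(invariant_model, pts) < 1e-9
```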
7 Conclusion
Author contributions Jiajun Fei wrote the main manuscript text. Zhidong Deng served as the scientific advi-
sor and led this research project. Deng chose this topic and provided valuable suggestions throughout the
whole research process. After the initial draft was finished, Deng made many thorough and comprehensive
revisions to correct the mistakes and improve the readability. Both authors reviewed the manuscript.
Funding This work was supported in part by the National Science Foundation of China (NSFC) under
Grant No. 62176134. The authors have no relevant financial or non-financial interests to disclose.
Declarations
Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com-
mons licence, and indicate if changes were made. The images or other third party material in this article
are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Almakady Y, Mahmoodi S, Conway J et al (2020) Rotation invariant features based on three dimensional Gaussian Markov random fields for volumetric texture classification. Comput Vis Image Underst 194:102931. https://doi.org/10.1016/j.cviu.2020.102931
Anderson B, Hy TS, Kondor R (2019) Cormorant: covariant molecular neural networks. In: Advances in
neural information processing systems (NeurIPS), vol 32. Curran Associates, Inc
Andrearczyk V, Depeursinge A (2018) Rotational 3d texture classification using group equivariant cnns.
arXiv preprint arXiv:1810.06889
Andrearczyk V, Fageot J, Oreiller V, et al (2019) Exploring local rotation invariance in 3d cnns with steer-
able filters. In: Proceedings of The 2nd international conference on medical imaging with deep learn-
ing, proceedings of machine learning research, vol 102. PMLR, pp 15–26
Andrearczyk V, Fageot J, Oreiller V et al (2020) Local rotation invariance in 3d cnns. Med Image Anal 65:101756. https://doi.org/10.1016/j.media.2020.101756
Ao S, Hu Q, Yang B, et al (2021) Spinnet: learning a general surface descriptor for 3d point cloud registra-
tion. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11,748–
11,757, https://doi.org/10.1109/CVPR46437.2021.01158
Ao S, Guo Y, Hu Q et al (2023) You only train once: learning general and distinctive 3d local descriptors. IEEE Trans Pattern Anal Mach Intell 45(3):3949–3967. https://doi.org/10.1109/TPAMI.2022.3180341
Ao S, Hu Q, Wang H, et al (2023b) Buffer: balancing accuracy, efficiency, and generalizability in point
cloud registration. In: 2023 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 1255–1264, https://doi.org/10.1109/CVPR52729.2023.00127
Armeni I, Sener O, Zamir AR, et al (2016) 3d semantic parsing of large-scale indoor spaces. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1534–1543, https://doi.org/10.1109/CVPR.2016.170
Armeni I, Sax S, Zamir AR, et al (2017) Joint 2d-3d-semantic data for indoor scene understanding. https://
doi.org/10.48550/ARXIV.1702.01105
Aronsson J (2021) Homogeneous vector bundles and g-equivariant convolutional neural networks. PhD the-
sis, Chalmers Tekniska Hogskola
Artin M (2013) Algebra. Pearson Education, London
Assaad S, Downey C, Al-Rfou’ R, et al (2022) VN-transformer: rotation-equivariant attention for vector
neurons. arXiv:2206.04176
Axelrod S, Gómez-Bombarelli R (2022) Geom, energy-annotated molecular conformations for property
prediction and molecular generation. Sci Data 9(1):185. https://doi.org/10.1038/s41597-022-01288-4
Azari B, Erdogmus D (2022) Equivariant deep dynamical model for motion prediction. In: Proceedings
of The 25th international conference on artificial intelligence and statistics, proceedings of machine
learning research, vol 151. PMLR, pp 11,655–11,668
Bai X, Luo Z, Zhou L, et al (2020) D3feat: joint learning of dense detection and description of 3d local
features. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp
6358–6366, https://doi.org/10.1109/CVPR42600.2020.00639
Batzner S, Musaelian A, Sun L et al (2022) E(3)-equivariant graph neural networks for data-effi-
cient and accurate interatomic potentials. Nat Commun 13(1):2453. https://doi.org/10.1038/
s41467-022-29939-5
Bergmann P, Sattlegger D (2023) Anomaly detection in 3d point clouds using deep geometric descriptors.
In: 2023 IEEE/CVF winter conference on applications of computer vision (WACV), pp 2612–2622,
https://doi.org/10.1109/WACV56688.2023.00264
Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide protein data bank. Nat Struct Mol
Biol 10(12):980. https://doi.org/10.1038/nsb1203-980
Berman HM, Westbrook J, Feng Z et al (2000) The protein data bank. Nucleic Acids Res 28(1):235–242.
https://doi.org/10.1093/nar/28.1.235
Blum LC, Reymond JL (2009) 970 million druglike small molecules for virtual screening in the chemical
universe database GDB-13. J Am Chem Soc 131(25):8732–8733. https://doi.org/10.1021/ja902302h
Bobkov D, Chen S, Jian R et al (2018) Noise-resistant deep learning for object classification in three-dimensional point clouds using a point pair descriptor. IEEE Robot Autom Lett 3(2):865–872. https://doi.org/10.1109/LRA.2018.2792681
Bogo F, Romero J, Loper M, et al (2014) Faust: dataset and evaluation for 3d mesh registration. In: 2014 IEEE conference on computer vision and pattern recognition, pp 3794–3801, https://doi.org/10.1109/CVPR.2014.491
Brandstetter J, Hesselink R, van der Pol E, et al (2022) Geometric and physical quantities improve e(3)
equivariant message passing. In: International conference on learning representations (ICLR)
Bronstein AM, Bronstein MM, Kimmel R (2008) Numerical geometry of non-rigid shapes. Springer, Berlin
Caesar H, Bankiti V, Lang AH, et al (2020) nuscenes: a multimodal dataset for autonomous driving. In:
2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11,618–11,628,
https://doi.org/10.1109/CVPR42600.2020.01164
Cao H, Zhan R, Ma Y et al (2021) Lfnet: local rotation invariant coordinate frame for robust point cloud
analysis. IEEE Signal Process Lett 28:209–213. https://doi.org/10.1109/LSP.2020.3048605
Cao Z, Huang Q, Karthik R (2017) 3d object classification via spherical projections. In: 2017 international
conference on 3D vision (3DV), pp 566–574, https://doi.org/10.1109/3DV.2017.00070
Carlevaris-Bianco N, Ushani AK, Eustice RM (2016) University of Michigan North campus long-term vision and lidar dataset. Int J Robot Res 35(9):1023–1035. https://doi.org/10.1177/0278364915614638
Chang AX, Funkhouser T, Guibas L, et al (2015) Shapenet: an information-rich 3d model repository. arXiv
preprint arXiv:1512.03012
Chatzipantazis E, Pertigkiozoglou S, Dobriban E, et al (2023) SE(3)-equivariant attention networks for
shape reconstruction in function space. In: International conference on learning representations
(ICLR)
Chen C, Li C, Chen L, et al (2018a) Continuous-time flows for efficient inference and density estimation.
In: Proceedings of the 35th International conference on machine learning (ICML), proceedings of
machine learning research, vol 80. PMLR, pp 824–833
Chen C, Fragonara LZ, Tsourdos A (2019a) Gapnet: graph attention based point neural network for exploit-
ing local feature of point cloud. https://doi.org/10.48550/ARXIV.1905.08705
Chen C, Li G, Xu R, et al (2019b) Clusternet: deep hierarchical cluster network with rigorously rotation-
invariant representation for point cloud analysis. In: 2019 IEEE/CVF conference on computer vision
and pattern recognition (CVPR), pp 4989–4997, https://doi.org/10.1109/CVPR.2019.00513
Chen H, Liu S, Chen W, et al (2021) Equivariant point network for 3d point cloud analysis. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14,509–14,518, https://doi.org/10.1109/CVPR46437.2021.01428
Chen H, Zhao J, Zhang Q (2023) Rotation-equivariant spherical vector networks for objects recognition
with unknown poses. Vis Comput. https://doi.org/10.1007/s00371-023-02904-z
Chen Q, Chen Y (2022) Multi-view 3d model retrieval based on enhanced detail features with contrastive
center loss. Multimed Tools Appl. https://doi.org/10.1007/s11042-022-12281-9
Chen R, Cong Y (2022) The devil is in the pose: ambiguity-free 3d rotation-invariant learning via pose-
aware convolution. In: 2022 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 7462–7471, https://doi.org/10.1109/CVPR52688.2022.00732
Chen X, Wang G, Zhang C et al (2018) Shpr-net: deep semantic hand pose regression from point clouds.
IEEE Access 6:43425–43439. https://doi.org/10.1109/ACCESS.2018.2863540
Chen Y, Fernando B, Bilen H et al (2022) 3d equivariant graph implicit functions. Computer Vision–ECCV
2022. Springer, Cham, pp 485–502. https://doi.org/10.1007/978-3-031-20062-5_28
Cheng J, Choe MH, Elofsson A et al (2019) Estimation of model accuracy in casp13. Proteins Struct Funct
Bioinf 87(12):1361–1377. https://doi.org/10.1002/prot.25767
Chmiela S, Tkatchenko A, Sauceda HE et al (2017) Machine learning of accurate energy-conserving molecular force fields. Sci Adv 3(5):e1603015. https://doi.org/10.1126/sciadv.1603015
Chou YC, Lin YP, Yeh YM, et al (2021) 3d-gfe: a three-dimensional geometric-feature extractor for point
cloud data. In: 2021 Asia-Pacific Signal and information processing association annual summit and
conference (APSIPA ASC), pp 2013–2017
Choy C, Park J, Koltun V (2019) Fully convolutional geometric features. In: 2019 IEEE/CVF international
conference on computer vision (ICCV), pp 8957–8965, https://doi.org/10.1109/ICCV.2019.00905
CMU (2002) Cmu graphics lab motion capture database. http://mocap.cs.cmu.edu/
Cohen T, Welling M (2016) Group equivariant convolutional networks. In: Balcan MF, Weinberger KQ
(eds) Proceedings of The 33rd international conference on machine learning (ICML), proceedings
of machine learning research, vol 48. PMLR, New York, New York, USA, pp 2990–2999
Cohen T, Weiler M, Kicanaoglu B, et al (2019a) Gauge equivariant convolutional networks and the ico-
sahedral CNN. In: Proceedings of the 36th international conference on machine learning (ICML),
proceedings of machine learning research, vol 97. PMLR, pp 1321–1330
Cohen TS, Geiger M, Köhler J, et al (2018a) Spherical CNNs. In: International conference on learning representations (ICLR)
Cohen TS, Geiger M, Weiler M (2018b) Intertwiners between induced representations (with applications
to the theory of equivariant neural networks). https://doi.org/10.48550/ARXIV.1803.10743
Cohen TS, Geiger M, Weiler M (2019b) A general theory of equivariant cnns on homogeneous spaces.
In: Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates, Inc
Cornwell JF (1997) Group theory in physics: an introduction. Academic press, San Diego
Curless B, Levoy M (1996) A volumetric method for building complex models from range images. In: Proceedings of the 23rd annual conference on computer graphics and interactive techniques. Association for computing machinery, New York, NY, USA, SIGGRAPH ’96, pp 303–312, https://doi.org/10.1145/237170.237269
Dai A, Chang AX, Savva M, et al (2017) Scannet: Richly-annotated 3d reconstructions of indoor scenes.
In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2432–2443,
https://doi.org/10.1109/CVPR.2017.261
Delaney JS (2004) Esol: estimating aqueous solubility directly from molecular structure. J Chem Inf
Comput Sci 44(3):1000–1005. https://doi.org/10.1021/ci034243x
Deng C, Litany O, Duan Y, et al (2021a) Vector neurons: a general framework for so(3)-equivariant
networks. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 12,180–
12,189, https://doi.org/10.1109/ICCV48922.2021.01198
Deng H, Birdal T, Ilic S (2018a) Ppf-foldnet: unsupervised learning of rotation invariant 3d local descriptors. Computer Vision–ECCV 2018. Springer, Cham, pp 620–638. https://doi.org/10.1007/978-3-030-01228-1_37
Deng H, Birdal T, Ilic S (2018b) Ppfnet: global context aware local features for robust 3d point match-
ing. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 195–205,
https://doi.org/10.1109/CVPR.2018.00028
Deng S, Liu B, Dong Q, et al (2021b) Rotation transformation network: learning view-invariant point
cloud for classification and segmentation. In: 2021 IEEE international conference on multimedia
and expo (ICME), pp 1–6, https://doi.org/10.1109/ICME51207.2021.9428265
Drost B, Ulrich M, Navab N, et al (2010) Model globally, match locally: efficient and robust 3d object
recognition. In: 2010 IEEE computer society conference on computer vision and pattern recogni-
tion (CVPR), pp 998–1005, https://doi.org/10.1109/CVPR.2010.5540108
Du W, Zhang H, Du Y, et al (2021) Equivariant vector field network for many-body system modeling.
https://doi.org/10.48550/ARXIV.2110.14811
Dym N, Maron H (2021) On the universality of rotation equivariant point cloud networks. In: Interna-
tional conference on learning representations (ICLR)
Esteves C (2020) Theoretical aspects of group equivariant neural networks. arXiv preprint arXiv:2004.05154
Esteves C, Allen-Blanchette C, Makadia A et al (2018a) Learning so(3) equivariant representations with spherical CNNs. Computer Vision–ECCV 2018. Springer International Publishing, Cham, pp 54–70. https://doi.org/10.1007/978-3-030-01261-8_4
Esteves C, Allen-Blanchette C, Zhou X, et al (2018b) Polar transformer networks. In: International con-
ference on learning representations (ICLR)
Esteves C, Sud A, Luo Z, et al (2019a) Cross-domain 3d equivariant image embeddings. In: Proceedings
of the 36th international conference on machine learning (ICML), Proceedings of machine learn-
ing research, vol 97. PMLR, pp 1812–1822
Esteves C, Xu Y, Allen-Blanchette C, et al (2019b) Equivariant multi-view networks. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 1568–1577, https://doi.org/10.1109/ICCV.2019.00165
Esteves C, Allen-Blanchette C, Makadia A et al (2020a) Learning so(3) equivariant representations with spherical CNNs. Int J Comput Vis 128:588–600. https://doi.org/10.1007/s11263-019-01220-1
Esteves C, Makadia A, Daniilidis K (2020b) Spin-weighted spherical CNNs. In: Advances in neural information processing systems (NeurIPS), vol 33. Curran Associates, Inc., pp 8614–8625
Esteves C, Slotine JJ, Makadia A (2023) Scaling spherical CNNs. In: Krause A, Brunskill E, Cho K,
et al (eds) Proceedings of the 40th international conference on machine learning (ICML), Proceed-
ings of machine learning research, vol 202. PMLR, pp 9396–9411
Fan S, Dong Q, Zhu F, et al (2021) Scf-net: learning spatial contextual features for large-scale point
cloud segmentation. In: 2021 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 14,499–14,508, https://doi.org/10.1109/CVPR46437.2021.01427
Fan Y, He Y, Tan UX (2020) Seed: a segmentation-based egocentric 3d point cloud descriptor for loop
closure detection. In: 2020 IEEE/RSJ international conference on intelligent robots and systems
(IROS), pp 5158–5163, https://doi.org/10.1109/IROS45743.2020.9341517
Fan Z, Song Z, Zhang W et al (2023) Rpr-net: a point cloud-based rotation-aware large scale place rec-
ognition network. Computer Vision–ECCV 2022 Workshops. Springer Nature Switzerland, Cham,
pp 709–725. https://doi.org/10.1007/978-3-031-25056-9_45
Fang J, Zhou D, Song X, et al (2020) Rotpredictor: unsupervised canonical viewpoint learning for point cloud classification. In: 2020 international conference on 3D vision (3DV), pp 987–996, https://doi.org/10.1109/3DV50981.2020.00109
Fei J, Deng Z (2024) Incorporating rotation invariance with non-invariant networks for point clouds. In:
2024 international conference on 3D vision (3DV)
Gu R, Wu Q, Li Y et al (2022) Enhanced local and global learning for rotation-invariant point cloud repre-
sentation. IEEE Multimed. https://doi.org/10.1109/MMUL.2022.3151906
Guan J, Qian WW, Peng X, et al (2023) 3d equivariant diffusion for target-aware molecule generation and
affinity prediction. In: International conference on learning representations (ICLR)
Guerrero P, Kleiman Y, Ovsjanikov M et al (2018) Pcpnet: learning local shape properties from raw point clouds. Comput Graph Forum 37(2):75–85. https://doi.org/10.1111/cgf.13343
Guo Y, Sohel F, Bennamoun M et al (2013) Rotational projection statistics for 3d local surface description
and object recognition. Int J Comput Vis 105(1):63–86. https://doi.org/10.1007/s11263-013-0627-y
Haan PD, Weiler M, Cohen T, et al (2021) Gauge equivariant mesh cnns: anisotropic convolutions on geo-
metric graphs. In: International conference on learning representations (ICLR)
Hackel T, Savinov N, Ladicky L, et al (2017) Semantic3d.net: a new large-scale point cloud classification
benchmark. In: ISPRS annals of the photogrammetry, remote sensing and spatial information sci-
ences, pp 91–98
Han J, Rong Y, Xu T, et al (2022) Geometrically equivariant graph neural networks: a survey. arXiv preprint
arXiv:2202.07230
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778, https://doi.org/10.1109/CVPR.2016.90
He L, Dong Y, Wang Y, et al (2021) Gauge equivariant transformer. In: Ranzato M, Beygelzimer A, Dau-
phin Y, et al (eds) Advances in neural information processing systems (NeurIPS), vol 34. Curran
Associates, Inc., pp 27,331–27,343
Hegde S, Gangisetty S (2021) Pig-net: inception based deep learning architecture for 3d point cloud seg-
mentation. Comput Graph 95:13–22. https://doi.org/10.1016/j.cag.2021.01.004
Hinton GE, Krizhevsky A, Wang SD (2011) Transforming auto-encoders. Artificial neural networks and machine learning (ICANN). Springer, Heidelberg, pp 44–51. https://doi.org/10.1007/978-3-642-21735-7_6
Hoogeboom E, Satorras VG, Vignac C, et al (2022) Equivariant diffusion for molecule generation in 3D.
In: Proceedings of the 39th international conference on machine learning (ICML), proceedings of
machine learning research, vol 162. PMLR, pp 8867–8887
Horie M, Morita N, Hishinuma T, et al (2021) Isometric transformation invariant and equivariant graph con-
volutional networks. In: International conference on learning representations (ICLR)
Horwitz E, Hoshen Y (2023) Back to the feature: classical 3d features are (almost) all you need for 3d
anomaly detection. In: 2023 IEEE/CVF conference on computer vision and pattern recognition work-
shops (CVPRW), pp 2968–2977, https://doi.org/10.1109/CVPRW59228.2023.00298
Huang W, Han J, Rong Y, et al (2022a) Equivariant graph mechanics networks with constraints. In: Interna-
tional conference on learning representations (ICLR)
Huang Y, Peng X, Ma J, et al (2022b) 3DLinker: an e(3) equivariant variational autoencoder for molecular
linker design. In: Proceedings of the 39th international conference on machine learning (ICML), pro-
ceedings of machine learning research, vol 162. PMLR, pp 9280–9294
Hutchinson MJ, Lan CL, Zaidi S, et al (2021) Lietransformer: equivariant self-attention for lie groups.
In: Proceedings of the 38th international conference on machine learning (ICML), proceedings of
machine learning research, vol 139. PMLR, pp 4533–4543
Igashov I, Stärk H, Vignac C, et al (2022) Equivariant 3d-conditional diffusion models for molecular linker
design. arXiv preprint arXiv:2210.05274
Ingraham J, Garg V, Barzilay R, et al (2019) Generative models for graph-based protein design. In:
Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates, Inc
Ionescu C, Papava D, Olaru V et al (2014) Human3.6m: large scale datasets and predictive methods for
3d human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339.
https://doi.org/10.1109/TPAMI.2013.248
Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. In: Cortes C, Lawrence
N, Lee D et al (eds) Advances in neural information processing systems (NIPS), vol 28. Curran Asso-
ciates Inc
Jing B, Eismann S, Soni PN, et al (2021a) Equivariant graph neural networks for 3d macromolecular struc-
ture. arXiv preprint arXiv:2106.03843
Jing B, Eismann S, Suriana P, et al (2021b) Learning from protein structure with geometric vector percep-
trons. In: International conference on learning representations (ICLR)
Jing B, Prabhu V, Gu A et al (2021) Rotation-invariant gait identification with quaternion convolutional neural networks (student abstract). Proc AAAI Conf Artif Intell (AAAI) 35(18):15805–15806. https://doi.org/10.1609/aaai.v35i18.17899
Joseph-Rivlin M, Zvirin A, Kimmel R (2019) Momen(e)t: flavor the moments in learning to classify shapes. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW), pp 4085–4094, https://doi.org/10.1109/ICCVW.2019.00503
Jørgensen PB, Bhowmik A (2022) Equivariant graph neural networks for fast electron density estimation of
molecules, liquids, and solids. NPJ Comput Mater 8:183. https://doi.org/10.1038/s41524-022-00863-y
Kaba SO, Mondal AK, Zhang Y, et al (2023) Equivariance with learned canonicalization functions. In:
Krause A, Brunskill E, Cho K, et al (eds) Proceedings of the 40th international conference on
machine learning, proceedings of machine learning research, vol 202. PMLR, pp 15,546–15,566
Kadam P, Zhang M, Liu S et al (2022) R-pointhop: a green, accurate, and unsupervised point cloud regis-
tration method. IEEE Trans Image Process 31:2710–2725. https://doi.org/10.1109/TIP.2022.3160609
Kadam P, Prajapati H, Zhang M, et al (2023) S3i-pointhop: So(3)-invariant pointhop for 3d point cloud
classification. In: ICASSP 2023 - 2023 IEEE international conference on acoustics, speech and signal
processing (ICASSP), pp 1–5, https://doi.org/10.1109/ICASSP49357.2023.10095473
Kajita S, Ohba N, Jinnouchi R et al (2017) A universal 3d voxel descriptor for solid-state material informatics with deep convolutional neural networks. Sci Rep 7:16991. https://doi.org/10.1038/s41598-017-17299-w
Kasaei SH (2021) Orthographicnet: a deep transfer learning approach for 3-d object recognition in open-ended domains. IEEE/ASME Trans Mechatron 26(6):2910–2921. https://doi.org/10.1109/TMECH.2020.3048433
Katzir O, Lischinski D, Cohen-Or D (2022) Shape-pose disentanglement using se(3)-equivariant vector
neurons. Computer Vision–ECCV 2022. Springer Nature Switzerland, Cham, pp 468–484
Ke Q, An S, Bennamoun M et al (2017) Skeletonnet: mining deep part features for 3-d action recognition.
IEEE Signal Process Lett 24(6):731–735. https://doi.org/10.1109/LSP.2017.2690339
Kim G, Park YS, Cho Y, et al (2020a) Mulran: multimodal range dataset for urban place recognition. In: 2020 IEEE international conference on robotics and automation (ICRA), pp 6246–6253, https://doi.org/10.1109/ICRA40945.2020.9197298
Kim S, Park J, Han B (2020b) Rotation-invariant local-to-global representation learning for 3d point cloud.
In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in neural information processing sys-
tems (NeurIPS), vol 33. Curran Associates, Inc., pp 8174–8185
Kim S, Jeong Y, Park C, et al (2022) SeLCA: Self-supervised learning of canonical axis. In: NeurIPS 2022
workshop on symmetry and geometry in neural representations
Köhler J, Klein L, Noe F (2020) Equivariant flows: exact likelihood generative learning for symmetric den-
sities. In: Proceedings of the 37th international conference on machine learning (ICML), proceedings
of machine learning research, vol 119. PMLR, pp 5361–5370
Kondor R (2018) N-body networks: a covariant hierarchical neural network architecture for learning atomic
potentials. arXiv preprint arXiv:1803.01588
Kondor R, Trivedi S (2018) On the generalization of equivariance and convolution in neural networks to the
action of compact groups. In: Proceedings of the 35th international conference on machine learning
(ICML), proceedings of machine learning research, vol 80. PMLR, pp 2747–2755
Kondor R, Lin Z, Trivedi S (2018) Clebsch-gordan nets: a fully fourier space spherical convolutional neural network. In: Bengio S, Wallach H, Larochelle H et al (eds) Advances in neural information processing systems (NeurIPS), vol 31. Curran Associates Inc, New York
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges C, Bottou L et al (eds) Advances in neural information processing systems (NIPS), vol 25. Curran Associates Inc, New York
Lähner Z, Rodola E, Bronstein MM, et al (2016) Shrec’16: matching of deformable shapes with topological
noise. Proc 3DOR 2(10.2312)
Lai K, Bo L, Ren X, et al (2011) A large-scale hierarchical multi-view rgb-d object dataset. In: 2011 IEEE international conference on robotics and automation (ICRA), pp 1817–1824, https://doi.org/10.1109/ICRA.2011.5980382
Landrieu L, Simonovsky M (2018) Large-scale point cloud semantic segmentation with superpoint graphs. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4558–4567, https://doi.org/10.1109/CVPR.2018.00479
Le H (2021) Geometric invariance of pointnet. Bachelor’s thesis, Tampere University, Tampere, Finland
Le T, Noé F, Clevert DA (2022a) Equivariant graph attention networks for molecular property prediction.
arXiv preprint arXiv:2202.09891
Le T, Noe F, Clevert DA (2022b) Representation learning on biomolecular structures using equivariant
graph attention. In: Rieck B, Pascanu R (eds) Proceedings of the first learning on graphs conference,
proceedings of machine learning research, vol 198. PMLR, pp 30:1–30:17
Lei J, Deng C, Schmeckpeper K, et al (2023) Efem: equivariant neural field expectation maximization for 3d
object segmentation without scene supervision. In: 2023 IEEE/CVF conference on computer vision
and pattern recognition (CVPR), pp 4902–4912, https://doi.org/10.1109/CVPR52729.2023.00475
Li C, Wei W, Li J et al (2021) 3dmol-net: learn 3d molecular representation using adaptive graph convolutional network based on rotation invariance. IEEE J Biomed Health Inform. https://doi.org/10.1109/JBHI.2021.3089162
Li F, Fujiwara K, Okura F, et al (2021b) A closer look at rotation-invariant deep point cloud analysis. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 16,198–16,207, https://doi.org/10.1109/ICCV48922.2021.01591
Li J, Bi Y, Lee GH (2019a) Discrete rotation equivariance for point cloud recognition. In: 2019 International conference on robotics and automation (ICRA), pp 7269–7275, https://doi.org/10.1109/ICRA.2019.8793983
Li J, Luo S, Deng C, et al (2022a) Directed weight neural networks for protein structure representation
learning. https://doi.org/10.48550/ARXIV.2201.13299
Li L, Zhu S, Fu H, et al (2020) End-to-end learning local multi-view descriptors for 3d point clouds. In:
2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1916–1925,
https://doi.org/10.1109/CVPR42600.2020.00199
Li L, Kong X, Zhao X et al (2022) Rinet: efficient 3d lidar-based place recognition using rotation invariant neural network. IEEE Robot Autom Lett 7(2):4321–4328. https://doi.org/10.1109/LRA.2022.3150499
Li RW, Zhang LX, Li C, et al (2023a) E3sym: leveraging e(3) invariance for unsupervised 3d planar reflec-
tive symmetry detection. In: Proceedings of the IEEE/CVF international conference on computer
vision (ICCV), pp 14,543–14,553
Li X, Li R, Chen G et al (2021) A rotation-invariant framework for deep point cloud analysis. IEEE Trans
Visual Comput Graph. https://doi.org/10.1109/TVCG.2021.3092570
Li X, Weng Y, Yi L et al (2021) Leveraging se(3) equivariance for self-supervised category-level object pose estimation from point clouds. In: Ranzato M, Beygelzimer A, Dauphin Y et al (eds) Advances in neural information processing systems (NeurIPS), vol 34. Curran Associates Inc., New York, pp 15370–15381
Li X, Wu W, Fern XZ, et al (2023b) Improving the robustness of point convolution on k-nearest neighbor neighborhoods with a viewpoint-invariant coordinate transform. In: 2023 IEEE/CVF winter conference on applications of computer vision (WACV), pp 1287–1297, https://doi.org/10.1109/WACV56688.2023.00134
Li Y, Gu C, Dullien T, et al (2019b) Graph matching networks for learning the similarity of graph structured
objects. In: Proceedings of the 36th international conference on machine learning (ICML), proceed-
ings of machine learning research, vol 97. PMLR, pp 3835–3845
Li Z, Yang Y, Faraggi E et al (2014) Direct prediction of profiles of sequences compatible with a protein
structure by neural networks with fragment-based local and energy-based nonlocal profiles. Proteins
Struct Funct Bioinf 82(10):2565–2573. https://doi.org/10.1002/prot.24620
Liao Y, Xie J, Geiger A (2022) Kitti-360: a novel dataset and benchmarks for urban scene understanding in
2d and 3d. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2022.3179507
Lin CE, Song J, Zhang R, et al (2022a) SE(3)-equivariant point cloud-based place recognition. In: 6th
Annual conference on robot learning
Lin CE, Song J, Zhang R, et al (2023a) Se(3)-equivariant point cloud-based place recognition. In: Liu K,
Kulic D, Ichnowski J (eds) Proceedings of The 6th conference on robot learning, proceedings of
machine learning research, vol 205. PMLR, pp 1520–1530
Lin CW, Chen TI, Lee HY, et al (2023b) Coarse-to-fine point cloud registration with se(3)-equivariant
representations. In: 2023 IEEE international conference on robotics and automation (ICRA), pp
2833–2840, https://doi.org/10.1109/ICRA48891.2023.10161141
Lin H, Huang Y, Liu M, et al (2022b) Diffbp: generative diffusion of 3d molecules for target protein
binding. arXiv preprint arXiv:2211.11214
Lin J, Li H, Chen K et al (2021a) Sparse steerable convolutions: an efficient learning of se(3)-equivariant features for estimation and tracking of object poses in 3d space. Advances in neural information processing systems (NeurIPS), vol 34. Curran Associates Inc, New York, pp 16779–16790
Lin J, Rickert M, Knoll A (2021b) Deep hierarchical rotation invariance learning with exact geometry feature representation for point cloud classification. In: 2021 IEEE international conference on robotics and automation (ICRA), pp 9529–9535, https://doi.org/10.1109/ICRA48506.2021.9561307
Liu D, Chen C, Xu C et al (2022) A robust and reliable point cloud recognition network under rigid
transformation. IEEE Trans Instrum Meas 71:1–13. https://doi.org/10.1109/TIM.2022.3142077
Liu M, Yao F, Choi C, et al (2019a) Deep learning 3d shapes using alt-az anisotropic 2-sphere convolu-
tion. In: International conference on learning representations (ICLR)
Liu S, Guo H, Tang J (2022b) Molecular geometry pretraining with se(3)-invariant denoising distance
matching. https://doi.org/10.48550/ARXIV.2206.13602
Liu Y, Wang C, Song Z et al (2018) Efficient global point cloud registration by matching rotation invari-
ant features through translation search. Computer Vision–ECCV 2018. Springer International
Publishing, Cham, pp 460–474. https://doi.org/10.1007/978-3-030-01258-8_28
Liu Y, Fan B, Xiang S, et al (2019b) Relation-shape convolutional neural network for point cloud analy-
sis. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8887–
8896, https://doi.org/10.1109/CVPR.2019.00910
Liu Y, Hong W, Cao B (2022) Molnet-3d: deep learning of molecular representations and properties
from 3d topography. Adv Theory Simul 5(6):2200037. https://doi.org/10.1002/adts.202200037
Liu Z, Zhou S, Suo C, et al (2019c) Lpd-net: 3d point cloud learning for large-scale place recogni-
tion and environment analysis. In: 2019 IEEE/CVF international conference on computer vision
(ICCV), pp 2831–2840, https://doi.org/10.1109/ICCV.2019.00292
Lohit S, Trivedi S (2020) Rotation-invariant autoencoders for signals on spheres. https://doi.org/10.48550/ARXIV.2012.04474
Lou Y, Ye Z, You Y et al (2023) Crin: rotation-invariant point cloud analysis and rotation estimation via centrifugal reference frame. Proc AAAI Conf Artif Intell (AAAI) 37(2):1817–1825. https://doi.org/10.1609/aaai.v37i2.25271
Luo S, Li J, Guan J, et al (2022) Equivariant point cloud analysis via learning orientations for message
passing. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp
18,910–18,919, https://doi.org/10.1109/CVPR52688.2022.01836
Maddern W, Pascoe G, Linegar C et al (2017) 1 year, 1000 km: the oxford robotcar dataset. Int J Robot
Res 36(1):3–15. https://doi.org/10.1177/0278364916679498
Marcon M, Spezialetti R, Salti S et al (2021) Unsupervised learning of local equivariant descriptors for
point clouds. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3126713
Maturana D, Scherer S (2015) Voxnet: a 3d convolutional neural network for real-time object recogni-
tion. In: 2015 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp
922–928, https://doi.org/10.1109/IROS.2015.7353481
McNitt-Gray MF, Armato SG, Meyer CR et al (2007) The lung image database consortium (lidc) data collection process for nodule detection and annotation. Acad Radiol 14(12):1464–1474. https://doi.org/10.1016/j.acra.2007.07.021
Mehta D, Rhodin H, Casas D, et al (2017) Monocular 3d human pose estimation in the wild using
improved cnn supervision. In: 2017 International conference on 3D Vision (3DV), pp 506–516,
https://doi.org/10.1109/3DV.2017.00064
Mei G, Tang H, Huang X, et al (2023) Unsupervised deep probabilistic approach for partial point cloud
registration. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR),
pp 13,611–13,620, https://doi.org/10.1109/CVPR52729.2023.01308
Melnyk P, Felsberg M, Wadenbäck M (2022) Steerable 3D spherical neurons. In: Proceedings of the
39th international conference on machine learning (ICML), proceedings of machine learning
research, vol 162. PMLR, pp 15,330–15,339
Melzi S, Spezialetti R, Tombari F, et al (2019) Gframes: gradient-based local reference frame for 3d
shape matching. In: 2019 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 4624–4633, https://doi.org/10.1109/CVPR.2019.00476
Meng HY, Gao L, Lai YK, et al (2019) Vv-net: Voxel vae net with group convolutions for point cloud seg-
mentation. In: 2019 IEEE/CVF International conference on computer vision (ICCV), pp 8499–8507,
https://doi.org/10.1109/ICCV.2019.00859
Menze BH, Jakab A, Bauer S et al (2015) The multimodal brain tumor image segmentation benchmark
(brats). IEEE Trans Med Imaging 34(10):1993–2024. https://doi.org/10.1109/TMI.2014.2377694
Mo K, Zhu S, Chang AX, et al (2019) Partnet: a large-scale benchmark for fine-grained and hierarchical
part-level 3d object understanding. In: 2019 IEEE/CVF conference on computer vision and pattern
recognition (CVPR), pp 909–918, https://doi.org/10.1109/CVPR.2019.00100
Moon J, Kim H, Lee B (2018) View-point invariant 3d classification for mobile robots using a convolutional neural network. Int J Control Autom Syst 16(6):2888–2895. https://doi.org/10.1007/s12555-018-0182-y
Mukhaimar A, Tennakoon R, Lai CY et al (2022) Robust object classification approach using spherical har-
monics. IEEE Access 10:21541–21553. https://doi.org/10.1109/ACCESS.2022.3151350
Novotny D, Ravi N, Graham B, et al (2019) C3dpo: canonical 3d pose networks for non-rigid structure from
motion. In: 2019 IEEE/CVF International conference on computer vision (ICCV), pp 7687–7696,
https://doi.org/10.1109/ICCV.2019.00778
Pan G, Liu P, Wang J et al (2019) 3dti-net: learn 3d transform-invariant feature using hierarchical graph cnn.
PRICAI 2019: trends in artificial intelligence. Springer International Publishing, Cham, pp 37–51.
https://doi.org/10.1007/978-3-030-29911-8_4
Pan L, Cai Z, Liu Z (2021) Robust partial-to-partial point cloud registration in a full range. https://doi.org/10.48550/ARXIV.2111.15606
Park JY, Biza O, Zhao L, et al (2022) Learning symmetric embeddings for equivariant world models.
In: Proceedings of the 39th international conference on machine learning (ICML), proceedings of
machine learning research, vol 162. PMLR, pp 17372–17389
Paulhac L, Makris P, Ramel JY, et al (2009) A solid texture database for segmentation and classification
experiments. In: VISAPP (2), pp 135–141
Poiesi F, Boscaini D (2021) Distinctive 3d local deep descriptors. In: 2020 25th international conference on
pattern recognition (ICPR), pp 5720–5727, https://doi.org/10.1109/ICPR48806.2021.9411978
Poiesi F, Boscaini D (2023) Learning general and distinctive 3d local deep descriptors for point cloud regis-
tration. IEEE Trans Pattern Anal Mach Intell 45(3):3979–3985. https://doi.org/10.1109/TPAMI.2022.
3175371
Pomerleau F, Liu M, Colas F et al (2012) Challenging data sets for point cloud registration algorithms. Int J
Robot Res 31(14):1705–1711. https://doi.org/10.1177/0278364912458814
Pop A, Domşa V, Tamas L (2023) Rotation invariant graph neural network for 3d point clouds. Remote
Sens. https://doi.org/10.3390/rs15051437
Poulenard A, Guibas LJ (2021) A functional approach to rotation equivariant non-linearities for tensor field
networks. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp
13169–13178, https://doi.org/10.1109/CVPR46437.2021.01297
Poulenard A, Rakotosaona MJ, Ponty Y, et al (2019) Effective rotation-invariant point cnn with spherical
harmonics kernels. In: 2019 International conference on 3D vision (3DV), pp 47–56, https://doi.org/
10.1109/3DV.2019.00015
Pujol-Miró A, Casas JR, Ruiz-Hidalgo J (2019) Correspondence matching in unorganized 3d point clouds
using convolutional neural networks. Image Vis Comput 83–84:51–60. https://doi.org/10.1016/j.imavis.2019.02.013
Puny O, Atzmon M, Smith EJ, et al (2022) Frame averaging for invariant and equivariant network design.
In: International conference on learning representations (ICLR)
Qi CR, Su H, Nießner M, et al (2016) Volumetric and multi-view cnns for object classification on 3d data.
In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 5648–5656,
https://doi.org/10.1109/CVPR.2016.609
Qi CR, Su H, Kaichun M, et al (2017a) Pointnet: deep learning on point sets for 3d classification and seg-
mentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 77–85,
https://doi.org/10.1109/CVPR.2017.16
Qi CR, Yi L, Su H et al (2017b) Pointnet++: deep hierarchical feature learning on point sets in a metric
space. Advances in neural information processing systems (NIPS), vol 30. Curran Associates Inc,
New York
Qin S, Zhang X, Xu H et al (2022) Fast quaternion product units for learning disentangled representations in
SO(3). IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2022.3202217
Qin S, Li Z, Liu L (2023a) Robust 3d shape classification via non-local graph attention network. In: 2023
IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5374–5383, https://
doi.org/10.1109/CVPR52729.2023.00520
Qin Z, Yu H, Wang C et al (2023) Geotransformer: fast and robust point cloud registration with geomet-
ric transformer. IEEE Trans Pattern Anal Mach Intell 45(8):9806–9821. https://doi.org/10.1109/
TPAMI.2023.3259038
Qiu Z, Li Y, Wang Y et al (2022) Spe-net: boosting point cloud analysis via rotation robustness enhance-
ment. Computer Vision–ECCV 2022. Springer Nature Switzerland, Cham, pp 593–609
Ramakrishnan R, Dral PO, Rupp M et al (2014) Quantum chemistry structures and properties of 134
kilo molecules. Sci Data 1(1):140022. https://doi.org/10.1038/sdata.2014.22
Rao Y, Lu J, Zhou J (2019) Spherical fractal convolutional neural networks for point cloud recognition.
In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 452–460,
https://doi.org/10.1109/CVPR.2019.00054
Rasp S, Dueben PD, Scher S et al (2020) Weatherbench: a benchmark data set for data-driven weather
forecasting. J Adv Model Earth Syst 12(11):e2020MS002203. https://doi.org/10.1029/2020MS002203
Roveri R, Rahmann L, Öztireli AC, et al (2018) A network architecture for point cloud classification via
automatic depth images generation. In: 2018 IEEE/CVF conference on computer vision and pat-
tern recognition, pp 4176–4184, https://doi.org/10.1109/CVPR.2018.00439
Rupp M, Tkatchenko A, Müller KR et al (2012) Fast and accurate modeling of molecular atomization
energies with machine learning. Phys Rev Lett 108:058301. https://doi.org/10.1103/PhysRevLett.108.058301
Rusu RB, Blodow N, Beetz M (2009) Fast point feature histograms (fpfh) for 3d registration. In: 2009
IEEE international conference on robotics and automation (ICRA), pp 3212–3217, https://doi.org/
10.1109/ROBOT.2009.5152473
Sahin YH, Mertan A, Unal G (2022) Odfnet: using orientation distribution functions to characterize 3d
point clouds. Comput Graph 102:610–618. https://doi.org/10.1016/j.cag.2021.08.016
Sajnani R, Poulenard A, Jain J, et al (2022) Condor: self-supervised canonicalization of 3d pose for par-
tial shapes. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR),
pp 16948–16958, https://doi.org/10.1109/CVPR52688.2022.01646
Salihu D, Steinbach E (2023) Sgpcr: spherical gaussian point cloud representation and its application to
object registration and retrieval. In: 2023 IEEE/CVF winter conference on applications of com-
puter vision (WACV), pp 572–581, https://doi.org/10.1109/WACV56688.2023.00064
Satorras VG, Hoogeboom E, Fuchs F et al (2021) E(n) equivariant normalizing flows. Advances in
neural information processing systems (NeurIPS), vol 34. Curran Associates Inc, New York, pp
4181–4192
Satorras VG, Hoogeboom E, Welling M (2021b) E(n) equivariant graph neural networks. In: Proceed-
ings of the 38th international conference on machine learning (ICML), proceedings of machine
learning research, vol 139. PMLR, pp 9323–9332
Savva M, Yu F, Su H, et al (2017) Large-scale 3d shape retrieval from shapenet core55: Shrec’17 track.
In: Proceedings of the workshop on 3D object retrieval. Eurographics Association, Goslar, DEU,
3Dor ’17, pp 39–50, https://doi.org/10.2312/3dor.20171050
Schneuing A, Du Y, Harris C, et al (2022) Structure-based drug design with equivariant diffusion mod-
els. https://doi.org/10.48550/ARXIV.2210.13695
Schütt K, Kindermans PJ, Sauceda Felix HE et al (2017) Schnet: a continuous-filter convolutional neu-
ral network for modeling quantum interactions. In: Guyon I, Luxburg UV, Bengio S et al (eds)
Advances in neural information processing systems (NIPS), vol 30. Curran Associates Inc
Schütt K, Unke O, Gastegger M (2021) Equivariant message passing for the prediction of tensorial prop-
erties and molecular spectra. In: Proceedings of the 38th international conference on machine
learning (ICML), proceedings of machine learning research, vol 139. PMLR, pp 9377–9388
Schütt KT, Sauceda HE, Kindermans PJ et al (2018) Schnet–A deep learning architecture for molecules
and materials. J Chem Phys 148(24):241722. https://doi.org/10.1063/1.5019779
Shahroudy A, Liu J, Ng TT, et al (2016) Ntu rgb+d: a large scale dataset for 3d human activity analysis.
In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1010–1019,
https://doi.org/10.1109/CVPR.2016.115
Shakerinava M, Ravanbakhsh S (2021) Equivariant networks for pixelized spheres. In: Proceedings of
the 38th international conference on machine learning (ICML), proceedings of machine learning
research, vol 139. PMLR, pp 9477–9488
Shan Z, Yang Q, Ye R, et al (2023) Gpa-net: no-reference point cloud quality assessment with multi-
task graph convolutional network. IEEE Trans Vis Comput Graph. https://doi.org/10.1109/TVCG.
2023.3282802
Shen W, Zhang B, Huang S et al (2020) 3d-rotation-equivariant quaternion neural networks. Computer
Vision–ECCV 2020. Springer International Publishing, Cham, pp 531–547. https://doi.org/10.
1007/978-3-030-58565-5_32
Shen Z, Hong T, She Q, et al (2022) PDO-s3DCNNs: partial differential operator based steerable 3D
CNNs. In: Proceedings of the 39th international conference on machine learning (ICML), pro-
ceedings of machine learning research, Vol 162. PMLR, pp 19827–19846
Shi B, Bai S, Zhou Z et al (2015) Deeppano: deep panoramic representation for 3-d shape recognition.
IEEE Signal Process Lett 22(12):85. https://doi.org/10.1109/LSP.2015.2480802
Shi S, Wang X, Li H (2019) Pointrcnn: 3d object proposal generation and detection from point cloud.
In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 770–779,
https://doi.org/10.1109/CVPR.2019.00086
Shotton J, Glocker B, Zach C, et al (2013) Scene coordinate regression forests for camera relocaliza-
tion in rgb-d images. In: 2013 IEEE conference on computer vision and pattern recognition, pp
2930–2937, https://doi.org/10.1109/CVPR.2013.377
Siddani B, Balachandar S, Fang R (2021) Rotational and reflectional equivariant convolutional neural
network for data-limited applications: multiphase flow demonstration. Phys Fluids 33(10):103323.
https://doi.org/10.1063/5.0066049
Simeonov A, Du Y, Tagliasacchi A, et al (2022) Neural descriptor fields: Se(3)-equivariant object repre-
sentations for manipulation. In: 2022 international conference on robotics and automation (ICRA),
pp 6394–6400, https://doi.org/10.1109/ICRA46639.2022.9812146
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition.
https://doi.org/10.48550/ARXIV.1409.1556
Song S, Lichtenberg SP, Xiao J (2015) Sun rgb-d: a rgb-d scene understanding benchmark suite. In:
2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 567–576, https://
doi.org/10.1109/CVPR.2015.7298655
Spezialetti R, Salti S, Stefano LD (2019) Learning an effective equivariant 3d descriptor without super-
vision. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 6400–6409,
https://doi.org/10.1109/ICCV.2019.00650
Spezialetti R, Stella F, Marcon M et al (2020) Learning to orient surfaces by self-supervised spherical
cnns. In: Larochelle H, Ranzato M, Hadsell R et al (eds) Advances in neural information process-
ing systems (NeurIPS), vol 33. Curran Associates Inc, New York, pp 5381–5392
Stärk H, Ganea OE, Pattanaik L, et al (2022) Equibind: geometric deep learning for drug binding struc-
ture prediction. In: ICLR 2022 workshop on geometrical and topological representation learning
Su H, Maji S, Kalogerakis E, et al (2015) Multi-view convolutional neural networks for 3d shape recog-
nition. In: 2015 IEEE international conference on computer vision (ICCV), pp 945–953, https://
doi.org/10.1109/ICCV.2015.114
Subramanian G, Ramsundar B, Pande V et al (2016) Computational modeling of β-secretase 1 (bace-1)
inhibitors using ligand based approaches. J Chem Inf Model 56(10):1936–1949. https://doi.org/10.
1021/acs.jcim.6b00290
Suk J, de Haan P, Lippe P, et al (2021) Equivariant graph neural networks as surrogate for computational
fluid dynamics in 3d artery models. In: Fourth workshop on machine learning and the physical sci-
ences (NeurIPS 2021)
Suk J, de Haan P, Lippe P et al (2022) Mesh convolutional neural networks for wall shear stress estima-
tion in 3d artery models. Statistical atlases and computational models of the heart. Multi-disease,
multi-view, and multi-center right ventricular segmentation in cardiac MRI challenge. Springer,
Cham, pp 93–102. https://doi.org/10.1007/978-3-030-93722-5_11
Sun T, Liu M, Ye H et al (2019) Point-cloud-based place recognition using CNN feature extraction.
IEEE Sens J 19(24):12175–12186. https://doi.org/10.1109/JSEN.2019.2937740
Sun W, Tagliasacchi A, Deng B et al (2021) Canonical capsules: self-supervised capsules in canonical
pose. In: Ranzato M, Beygelzimer A, Dauphin Y et al (eds) Advances in neural information pro-
cessing systems (NeurIPS), vol 34. Curran Associates Inc, New York, pp 24993–25005
Sun X, Wei Y, Liang S, et al (2015) Cascaded hand pose regression. In: 2015 IEEE conference on com-
puter vision and pattern recognition (CVPR), pp 824–832, https://doi.org/10.1109/CVPR.2015.
7298683
Sun X, Lian Z, Xiao J (2019b) Srinet: Learning strictly rotation-invariant representations for point cloud
classification and segmentation. In: Proceedings of the 27th ACM international conference on
multimedia (ACM MM). Association for computing machinery, New York, MM ’19, pp 980–988,
https://doi.org/10.1145/3343031.3351042
Sun X, Huang Y, Lian Z (2023) Learning isometry-invariant representations for point cloud analysis.
Pattern Recogn 134:109087. https://doi.org/10.1016/j.patcog.2022.109087
Szegedy C, Liu W, Jia Y, et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on
computer vision and pattern recognition (CVPR), pp 1–9, https://doi.org/10.1109/CVPR.2015.
7298594
Tabib RA, Upasi N, Anvekar T, et al (2023) Ipd-net: so(3) invariant primitive decompositional network for
3d point clouds. In: 2023 IEEE/CVF conference on computer vision and pattern recognition work-
shops (CVPRW), pp 2736–2744, https://doi.org/10.1109/CVPRW59228.2023.00274
Tang D, Chang HJ, Tejani A, et al (2014) Latent regression forest: structured estimation of 3d articulated
hand posture. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR), pp
3786–3793, https://doi.org/10.1109/CVPR.2014.490
Tao Z, Zhu Y, Wei T et al (2021) Multi-head attentional point cloud classification and segmentation using
strictly rotation-invariant representations. IEEE Access 9:71133–71144. https://doi.org/10.1109/
ACCESS.2021.3079295
National Lung Screening Trial Research Team (2011) Reduced lung-cancer mortality with low-dose
computed tomographic screening. N Engl J Med 365(5):395–409
Thölke P, Fabritiis GD (2022) Equivariant transformers for neural network based molecular potentials. In:
International conference on learning representations (ICLR)
Thomas NC (2019) Euclidean-equivariant functions on three-dimensional point clouds. PhD thesis, Stan-
ford University
Thomas NC, Smidt T, Kearnes S, et al (2018) Tensor field networks: rotation- and translation-equivariant
neural networks for 3d point clouds. https://doi.org/10.48550/ARXIV.1802.08219
Tombari F, Salti S, Di Stefano L (2010) Unique signatures of histograms for local surface description. Com-
puter Vision–ECCV 2010. Springer, Berlin, pp 356–369. https://doi.org/10.1007/978-3-642-15558-
1_26
Tompson J, Stein M, Lecun Y, et al (2014) Real-time continuous pose recovery of human hands using con-
volutional networks. ACM Trans Graph 33(5). https://doi.org/10.1145/2629500
Townshend R, Bedi R, Suriana P et al (2019) End-to-end learning on 3d protein structure for interface pre-
diction. Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates Inc,
New York
Townshend RJL, Vögele M, Suriana PA, et al (2021) Atom3d: tasks on molecules in three dimensions. In:
Thirty-fifth conference on neural information processing systems datasets and benchmarks track
Uy MA, Pham QH, Hua BS, et al (2019) Revisiting point cloud classification: a new benchmark dataset and
classification model on real-world data. In: 2019 IEEE/CVF international conference on computer
vision (ICCV), pp 1588–1597, https://doi.org/10.1109/ICCV.2019.00167
Villar S, Hogg DW, Storey-Fisher K et al (2021) Scalars are universal: equivariant machine learning, struc-
tured like classical physics. Advances in neural information processing systems (NeurIPS), vol 34.
Curran Associates Inc, New York, pp 28848–28863
Vreven T, Moal IH, Vangone A et al (2015) Updates to the integrated protein-protein interaction bench-
marks: docking benchmark version 5 and affinity benchmark version 2. J Mol Biol 427(19):3031–
3041. https://doi.org/10.1016/j.jmb.2015.07.016
Wang C, Pelillo M, Siddiqi K (2017) Dominant set clustering and pooling for multi-view 3d object rec-
ognition. In: Kim TK, Zafeiriou S, Brostow G, Mikolajczyk K (eds) Proceedings of the British
Machine Vision Conference (BMVC). BMVA Press, pp 64.1–64.12, https://doi.org/10.5244/C.31.64
Wang H, Sridhar S, Huang J, et al (2019a) Normalized object coordinate space for category-level 6d object
pose and size estimation. In: 2019 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 2637–2646, https://doi.org/10.1109/CVPR.2019.00275
Wang H, Liu Y, Dong Z, et al (2022a) You only hypothesize once: point cloud registration with rotation-
equivariant descriptors. In: Proceedings of the 30th ACM international conference on multimedia
(ACM MM). Association for Computing Machinery, New York, NY, USA, MM ’22, pp 1630–1641,
https://doi.org/10.1145/3503161.3548023
Wang H, Liu Y, Hu Q et al (2023) Roreg: pairwise point cloud registration with oriented descriptors and
local rotations. IEEE Trans Pattern Anal Mach Intell 45(8):10376–10393. https://doi.org/10.1109/
TPAMI.2023.3244951
Wang J, Chakraborty R, Yu SX (2021) Spatial transformer for 3d point clouds. IEEE Trans Pattern Anal
Mach Intell. https://doi.org/10.1109/TPAMI.2021.3070341
Wang L, Liu Y, Lin Y, et al (2022b) ComENet: towards complete and efficient message passing for 3d
molecular graphs. In: Advances in neural information processing systems (NeurIPS)
Wang X, Lei J, Lan H, et al (2023b) Dueqnet: dual-equivariance network in outdoor 3d object detection for
autonomous driving. In: 2023 IEEE International conference on robotics and automation (ICRA), pp
6951–6957, https://doi.org/10.1109/ICRA48891.2023.10161353
Wang Y, Sun Y, Liu Z et al (2019) Dynamic graph CNN for learning on point clouds. ACM Trans Graph.
https://doi.org/10.1145/3326362
Wang Y, Zhao Y, Ying S et al (2022) Rotation-invariant point cloud representation for 3-d model recogni-
tion. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2022.3157593
Wang Y, Wang J, Qu Y, et al (2023c) Rip-nerf: learning rotation-invariant point-based neural radiance field
for fine-grained editing and compositing. In: Proceedings of the 2023 ACM international conference
on multimedia retrieval. Association for computing machinery, New York, NY, USA, ICMR ’23, p
125-134, https://doi.org/10.1145/3591106.3592276
Wang Z, Rosen D (2023) Manufacturing process classification based on distance rotationally invariant con-
volutions. J Comput Inf Sci Eng 23(5):051004. https://doi.org/10.1115/1.4056806
Wei X, Yu R, Sun J (2020) View-gcn: view-based graph convolutional network for 3d shape analysis. In:
2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1847–1856,
https://doi.org/10.1109/CVPR42600.2020.00192
Wei X, Yu R, Sun J (2022) Learning view-based graph convolutional network for multi-view 3d shape anal-
ysis. IEEE Trans Pattern Anal Mach Intell 25:1–17. https://doi.org/10.1109/TPAMI.2022.3221785
Weihsbach C, Hansen L, Heinrich M (2022) Xedgeconv: leveraging graph convolutions for efficient, permu-
tation- and rotation-invariant dense 3d medical image segmentation. In: Proceedings of the first inter-
national workshop on geometric deep learning in medical image analysis, Proceedings of machine
learning research, vol 194. PMLR, pp 61–71
Weiler M, Geiger M, Welling M et al (2018) 3d steerable CNNS: learning rotationally equivariant features
in volumetric data. Advances in neural information processing systems (NeurIPS), vol 31. Curran
Associates Inc, New York
Winkels M, Cohen TS (2018) 3d g-cnns for pulmonary nodule detection. In: Medical imaging with deep
learning (MIDL)
Winkels M, Cohen TS (2019) Pulmonary nodule detection in CT scans with equivariant CNNS. Med Image
Anal 55:15–26. https://doi.org/10.1016/j.media.2019.03.010
Winter R, Bertolini M, Le T, et al (2022) Unsupervised learning of group invariant and equivariant repre-
sentations. In: Advances in neural information processing systems (NeurIPS)
Worrall D, Brostow G (2018) Cubenet: equivariance to 3d rotation and translation. Computer Vision–ECCV
2018. Springer International Publishing, Cham, pp 585–602. https://doi.org/10.1007/978-3-030-
01228-1_35
Wu H, Miao Y (2022) So(3) rotation equivariant point cloud completion using attention-based vector neu-
rons. In: 2022 International Conference on 3D Vision (3DV), pp 280–290, https://doi.org/10.1109/
3DV57658.2022.00040
Wu W, Qi Z, Fuxin L (2019) Pointconv: deep convolutional networks on 3d point clouds. In: 2019 IEEE/
CVF conference on computer vision and pattern recognition (CVPR), pp 9613–9622, https://doi.org/
10.1109/CVPR.2019.00985
Wu Z, Song S, Khosla A, et al (2015) 3d shapenets: a deep representation for volumetric shapes. In: 2015
IEEE conference on computer vision and pattern recognition (CVPR), pp 1912–1920, https://doi.org/
10.1109/CVPR.2015.7298801
Xiang Y, Kim W, Chen W et al (2016) Objectnet3d: a large scale database for 3d object recognition. Com-
puter Vision–ECCV 2016. Springer International Publishing, Cham, pp 160–176. https://doi.org/10.
1007/978-3-319-46484-8_10
Xiao C, Wachs J (2021) Triangle-net: towards robustness in point cloud learning. In: 2021 IEEE winter con-
ference on applications of computer vision (WACV), pp 826–835, https://doi.org/10.1109/WACV48630.2021.00087
Xiao J, Owens A, Torralba A (2013) Sun3d: a database of big spaces reconstructed using sfm and object
labels. In: 2013 IEEE international conference on computer vision (ICCV), pp 1625–1632, https://
doi.org/10.1109/ICCV.2013.458
Xiao Z, Lin H, Li R, et al (2020) Endowing deep 3d models with rotation invariance based on principal
component analysis. In: 2020 IEEE international conference on multimedia and expo (ICME), pp
1–6, https://doi.org/10.1109/ICME46284.2020.9102947
Xie L, Yang Y, Wang W, et al (2023) General rotation invariance learning for point clouds via weight-
feature alignment. https://doi.org/10.48550/arXiv.2302.09907
Xu C, Chen S, Li M et al (2021) Invariant teacher and equivariant student for unsupervised 3d human pose
estimation. Proc AAAI Conf Artif Intell (AAAI) 35(4):3013–3021. https://doi.org/10.1609/aaai.
v35i4.16409
Xu J, Tang X, Zhu Y, et al (2021b) Sgmnet: Learning rotation-invariant point cloud representations via
sorted gram matrix. In: 2021 IEEE/CVF International conference on computer vision (ICCV), pp
10448–10457, https://doi.org/10.1109/ICCV48922.2021.01030
Xu J, Yang Q, Li C, et al (2022) Rotation-equivariant graph convolutional networks for spherical data via
global-local attention. In: 2022 IEEE International conference on image processing (ICIP), pp 2501–
2505, https://doi.org/10.1109/ICIP46576.2022.9897510
Xu M, Zhou Z, Qiao Y (2020) Geometry sharing network for 3d point cloud classification and segmen-
tation. Proc AAAI Conf Artif Intell (AAAI) 34(07):12500–12507. https://doi.org/10.1609/aaai.
v34i07.6938
Xu X, Yin H, Chen Z et al (2021) Disco: differentiable scan context with orientation. IEEE Robot
Autom Lett 6(2):2791–2798. https://doi.org/10.1109/LRA.2021.3060741
Xu X, Lu S, Wu J et al (2023) Ring++: Roto-translation invariant gram for global localization on a
sparse scan map. IEEE Trans Rob 39(6):4616–4635. https://doi.org/10.1109/TRO.2023.3303035
Xu Z, Liu K, Chen K et al (2023) Classification of single-view object point clouds. Pattern Recogn
135:109137. https://doi.org/10.1016/j.patcog.2022.109137
Yang F, Wang H, Jin Z (2021) Adaptive gmm convolution for point cloud learning. In: Proceedings of
the British machine vision conference (BMVC), BMVA Press
Zhu M, Han S, Cai H, et al (2023) 4d panoptic segmentation as invariant and equivariant field predic-
tion. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp
22488–22498
Zhuang X, Li Y, Hu Y et al (2019) Self-supervised feature learning for 3d medical images by playing
Rubik’s cube. In: Shen D, Liu T, Peters TM et al (eds) Medical image computing and computer
assisted intervention (MICCAI). Springer International Publishing, Cham, pp 420–428. https://doi.
org/10.1007/978-3-030-32251-9_46
Zitnick CL, Chanussot L, Das A, et al (2020) An introduction to electrocatalyst design using machine learn-
ing for renewable energy storage. https://doi.org/10.48550/ARXIV.2010.09435
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.