
Artificial Intelligence Review (2024) 57:168

https://doi.org/10.1007/s10462-024-10741-2

Rotation invariance and equivariance in 3D deep learning:


a survey

Jiajun Fei1 · Zhidong Deng1

Accepted: 24 February 2024 / Published online: 7 June 2024


© The Author(s) 2024

Abstract
Deep neural networks (DNNs) in 3D scenes show a strong capability of extracting high-
level semantic features and significantly promote research in the 3D field. 3D shapes and
scenes often exhibit complicated transformation symmetries, where rotation is a challeng-
ing and necessary subject. To this end, many rotation invariant and equivariant methods
have been proposed. In this survey, we systematically organize and comprehensively over-
view all methods. First, we rewrite the previous definition of rotation invariance and equiv-
ariance by classifying them into weak and strong categories. Second, we provide a uni-
fied theoretical framework to analyze these methods, especially weak rotation invariant and
equivariant ones that are seldom analyzed theoretically. We then divide existing methods
into two main categories, i.e., rotation invariant ones and rotation equivariant ones, which
are further subclassified in terms of manipulating input ways and basic equivariant block
structures, respectively. In each subcategory, their common essence is highlighted, a cou-
ple of representative methods are analyzed, and insightful comments on their pros and cons
are given. Furthermore, we deliver a general overview of relevant applications and datasets
for two popular areas, 3D semantic understanding and molecule-related tasks. Finally, we
provide several open problems and future research directions based on challenges and
difficulties in ongoing research.

Keywords 3D deep learning · Rotation invariance · Rotation equivariance · Group convolution · Irreducible representation

* Zhidong Deng
michael@mail.tsinghua.edu.cn
Jiajun Fei
feijj20@mails.tsinghua.edu.cn
1 State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science
and Technology, Institute for Artificial Intelligence at Tsinghua University (THUAI), Beijing
National Research Center for Information Science and Technology (BNRist), Tsinghua University,
Beijing 100084, China


1 Introduction

In recent years, DNNs have played a more and more important role in 3D analysis. DNNs
are capable of processing many types of 3D data, including multi-view images (Su et al.
2015; Qi et al. 2016; Yu et al. 2018), voxels (Maturana and Scherer 2015; Zhou and Tuzel
2018), point clouds (Qi et al. 2017a; Wang et al. 2019b; Fei et al. 2022), and particles
(Schütt et al. 2017; Thomas et al. 2018; Satorras et al. 2021b). They have outperformed
traditional methods and shown great generalizability in a sequence of tasks, like classifi-
cation (Su et al. 2015; Qi et al. 2017a; Wang et al. 2019b), segmentation (Landrieu and
Simonovsky 2018; Meng et al. 2019; Furuya et al. 2020), detection (Zhou and Tuzel 2018;
Shi et al. 2019; Wang et al. 2023b), property prediction (Schütt et al. 2017; Satorras et al.
2021b), and generation (Hoogeboom et al. 2022; Guan et al. 2023).
Nonetheless, significant gaps exist between experiments and applications, restricting
the actual deployment of DNNs. For example, most experiments are conducted under
ideal settings with little noise, known data distribution, and canonical poses, which can-
not be completely met in practical applications. Among them, canonical poses are widely
adopted in 3D research, where 3D data is first aligned manually and then processed by
DNNs. However, such a setting leads to two main problems. First, these models may have
severe performance drops when evaluated with non-aligned 3D data, as shown in previ-
ous works (Esteves et al. 2018a; Sun et al. 2019b; Zhao et al. 2022a). Zhao et al. (2020b)
explore the fragility of 3D DNNs and achieve an over 95% success rate with black-box
adversarial attacks by slightly rotating the evaluation 3D data. Second, these DNNs
cannot be applied to solve tasks requiring the output consistency. For example, the atom-
ization energies of molecules are irrelevant to their absolute positions and orientations
(Blum and Reymond 2009; Rupp et al. 2012). If DNNs are trained with aligned mole-
cules, they inevitably learn the nonexistent relationship between absolute coordinates and
molecular properties and may overfit training data. These models are unreliable and
useless, as they cannot give the same prediction for arbitrarily rotated inputs. There
have been many ways to address such problems. We summarize them as rotation invariant
and equivariant methods in this survey.
Rotation invariance has been investigated in traditional 3D descriptors. Before the
emergence of DNNs, most methods can only capture low-level geometric features based
on transformation invariance. FPFH (Rusu et al. 2009) combines coordinates and esti-
mated surface normals to define Darboux frames. Then it uses several angular variations
to represent the surface properties. SHOT (Tombari et al. 2010) designs unique and unam-
biguous local reference frames (LRFs) to construct robust and expressive 3D descriptors.
Drost et al. (2010) create a global description with point pair features (PPFs) composed of
relative distances and angles. They can effectively handle tasks like pose estimation and
registration. Recently, Horwitz and Hoshen (2023) revisit the importance of traditional
descriptors on 3D anomaly detection. DNNs can learn high-level semantic features and
accomplish complicated tasks, but they usually ignore the rotation invariance and equiv-
ariance, making them unreliable for real-world applications. Existing works deal with this
problem from different perspectives. T-Net (Qi et al. 2017a) directly regresses transfor-
mation matrices from raw point clouds to transform poses and features. ClusterNet (Chen
et al. 2019b) constructs k nearest neighbors (kNN) graphs and computes several invari-
ant distances and angles, which are fed into hierarchical networks for complicated down-
stream tasks. Tensor field networks (TFNs) (Thomas et al. 2018; Thomas 2019) are equiv-
ariant neural networks based on the irreducible representation of SO(3). They have a solid


mathematical foundation and perform well over various tasks, including shape classifica-
tion and RNA structure scoring.
Many distinctive approaches have been developed for rotation invariance and equiv-
ariance. However, a comprehensive review of these methods is absent, making it chal-
lenging to keep pace with the recent progress and select appropriate methods for specific
tasks. Therefore, we are motivated to write this survey and fill the gap. Our contribu-
tions can be summarized from three aspects. First, this survey systematically overviews
existing works related to rotation invariance and equivariance, which are further divided
into several subcategories based on their structures and mathematical foundations. Sec-
ond, we unify the notations of different methods, providing an intuitive perspective for
analysis and comparisons. Third, we point out some open problems and propose future
research directions based on them.
This paper is organized as shown in Fig. 1. In Sect. 2, we introduce the mathematical
background of rotation invariance and equivariance, including the definition, commonly-
used rotation groups, and evaluation metrics. Rotation invariant and equivariant methods
are comprehensively overviewed and discussed, respectively, in Sect. 3 and Sect. 4. The
applications and datasets are also inspected in Sect. 5. In Sect. 6, we point out several
future research directions based on unsolved problems. Notations are listed in Table 1 for
better readability.

2 Background

This section introduces the background knowledge required to understand rotation invari-
ance and equivariance. The basic concepts of group theory are beneficial for better com-
prehension. Readers may refer to other textbooks for more details, including Group Theory
in Physics: An Introduction (Cornwell 1997) and Algebra (Artin 2013).
Invariance and equivariance have been formulated in much related work (Cohen and
Welling 2016; Thomas et al. 2018; Cohen et al. 2018a, 2019a; Thomas 2019). However,

Fig. 1  Overview of our survey. After the mathematical background is stated, rotation invariant and equiv-
ariant methods are introduced, respectively. Then we give a comprehensive overview of applications and
datasets and point out future directions based on open problems. Best viewed in color


Table 1  Notations adopted in this survey

Notation : Description
ℝ≥0 : The set of nonnegative real numbers
ℝ^n, ℂ^n : The set of n-dimensional (nD) real/complex vectors
O(n), SO(n) : nD orthogonal group, nD rotation group
E(n), SE(n) : nD Euclidean group, nD special Euclidean group
S^n, B^n : nD sphere, nD ball
X, Y : Input space, output space
w, w, W : Scalar, vector, matrix/tensor
G, g, μ : Group, group element, Haar measure
g ⋅ x : Group action of g on x
R(g), N(x_i) : Rotation matrix of g, neighborhood of x_i
W^T, W^H : Transpose, conjugate transpose
sgn, ⊗ : Signum function, tensor product
‖w‖, ‖W‖, |G| : Euclidean norm, Frobenius norm, cardinality
ũ^l, ũ^l_m : Steerable vectors of degree l (l ≥ 0, −l ≤ m ≤ l)
Y^l(g), Y^l_m(g) : Spherical harmonic of degree l (l ≥ 0, −l ≤ m ≤ l)
D^l(g), D^l_{mn}(g) : Wigner D-matrix of degree l (l ≥ 0, −l ≤ m, n ≤ l)

their definition cannot cover some methods in this survey. Thus, we deliberately formulate a
broader definition to include them. The definitions of both strong and weak invariance and
equivariance are given in Definition 1. Compared with the previous definition, we introduce
weak invariance and equivariance through the G-variant error so as to cover methods not
satisfying Eq. 1. It should be noted that the determination of C as an exact value is unnec-
essary since any function is C-weakly equivariant if C is large enough (+∞). So C is gen-
erally omitted in this survey. If a method is weakly equivariant, it means that its G-variant
error is relatively small or reduced after appropriate training.

Definition 1 Suppose that G acts on X, Y , and f ∶ X → Y, d ∶ Y × Y → ℝ≥0 is a metric


on Y .
f is strongly equivariant with respect to G, if
f (g ⋅ x) = g ⋅ f (x), ∀x ∈ X, g ∈ G. (1)
Meanwhile, f is C-weakly equivariant with respect to G, if

∫_X ∫_G d(f(g ⋅ x), g ⋅ f(x)) dμ(g) dx < C. (2)

Specifically, if the group action of G on Y is trivial, i.e., ∀g ∈ G, ∀y ∈ Y, g ⋅ y = y, then f is
C-weakly/strongly invariant with respect to G. For discrete X or G, the integration on the
left side of Eq. 2 is substituted with summation. The integral is named the G-variant error,
denoted by E(f).

Table 2  The differences among SO(3), O(3), SE(3), and E(3)

Group    Rotation    Reflection    Translation
SO(3)    ✓           ×             ×
O(3)     ✓           ✓             ×
SE(3)    ✓           ×             ✓
E(3)     ✓           ✓             ✓

Fig. 2  Milestones of rotation invariant methods. Best viewed in color

SO(3), O(3), SE(3), E(3), and their proper subgroups are the commonly-used groups
that describe 3D rotation, reflection, and translation. Their differences are listed in Table 2.
Unless otherwise specified, we focus on rotation in the 3D Euclidean space, and G is a
subgroup of SO(3).
Rotation invariant and equivariant methods require specific evaluation metrics to
reflect the performances on certain tasks and the invariance/equivariance. Let us take a
Let us take a supervised learning task with N training samples {(x_i, y_i)}_{i=1}^N as an example.
f : X → Y is the deep model and L : Y × Y → ℝ is the evaluation function. If there is no
requirement on equivariance, the metric is computed as L = ∑_i L(f(x_i), y_i). However, if
equivariance is considered, the model f should consider L(f(g ⋅ x_i), g ⋅ y_i) for all g ∈ G
instead of only L(f(x_i), y_i). Accordingly, the metric L_G is given as

L_G = ∑_i ∫_G L(f(g ⋅ x_i), g ⋅ y_i) dμ(g). (3)

If f is strongly equivariant and L is strongly invariant, then LG degenerates into L. As the


integration is computationally inefficient, most previous works approximate the metric
with randomly-rotated samples.
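This sampling-based approximation is easy to make concrete. The sketch below is our own toy illustration (NumPy only, not code from the survey): it estimates the G-variant error of Eq. 2 for the invariance case (trivial action on Y) with Monte Carlo rotations, comparing a strongly invariant descriptor (sorted pairwise distances) against raw flattened coordinates.

```python
import numpy as np

def random_rotation(rng):
    """Draw a roughly uniform (Haar) rotation from SO(3) via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix QR's sign convention
    if np.linalg.det(q) < 0:      # force det = +1 (proper rotation)
        q[:, 0] = -q[:, 0]
    return q

def f_invariant(points):
    """Sorted pairwise distances: strongly rotation invariant by construction."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.sort(d[np.triu_indices(len(points), k=1)])

def f_raw(points):
    """Flattened raw coordinates: not invariant at all."""
    return points.ravel()

def variant_error(f, clouds, rng, n_rot=64):
    """Monte Carlo estimate of E(f) = ∫∫ d(f(g·x), f(x)) dμ(g) dx (invariance case)."""
    err = 0.0
    for x in clouds:
        for _ in range(n_rot):
            g = random_rotation(rng)
            err += np.linalg.norm(f(x @ g.T) - f(x))
    return err / (len(clouds) * n_rot)

rng = np.random.default_rng(0)
clouds = [rng.normal(size=(32, 3)) for _ in range(4)]
print(variant_error(f_invariant, clouds, rng))  # ≈ 0 up to floating point error
print(variant_error(f_raw, clouds, rng))        # clearly positive
```

The same sampling loop, applied to L instead of d, yields the approximation of L_G used by most experimental evaluations.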

3 Rotation invariant methods

Invariance is a particular and straightforward case of equivariance. Rotation invariant


methods aim to produce the same or close results for inputs with different poses. We
will show the basic essence of these methods and discuss their advantages and draw-
backs. Several milestone methods are shown in Fig. 2.


3.1 Data augmentation methods

Data augmentation methods only make changes to the loss function instead of any model
structure. They use samplings to estimate the integration in Eq. 3. Thus, the loss L_G is
constructed as

L_G = ∑_{i, ĝ} L(f(ĝ ⋅ x_i), y_i), (4)

where ĝ is sampled from G.
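A minimal sketch of Eq. 4 (ours, NumPy only; the two toy predictors are hypothetical stand-ins for trained DNNs): a rotation invariant model incurs essentially no penalty under the augmented loss, while a pose-dependent one does.

```python
import numpy as np

def random_rotation(rng):
    # roughly uniform rotation from SO(3) via QR of a Gaussian matrix
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def invariant_model(points):
    # hypothetical rotation invariant regressor: mean distance to the centroid
    c = points.mean(axis=0)
    return np.mean(np.linalg.norm(points - c, axis=1))

def pose_dependent_model(points):
    # depends on absolute coordinates, hence not invariant
    return points[0, 0]

def augmented_loss(f, xs, ys, rng, n_aug=8):
    # L_G ≈ Σ_{i, ĝ} L(f(ĝ·x_i), y_i), with ĝ sampled at random and L the squared error
    total = 0.0
    for x, y in zip(xs, ys):
        for _ in range(n_aug):
            g = random_rotation(rng)
            total += (f(x @ g.T) - y) ** 2
    return total

rng = np.random.default_rng(1)
xs = [rng.normal(size=(16, 3)) for _ in range(4)]
print(augmented_loss(invariant_model, xs, [invariant_model(x) for x in xs], rng))  # ≈ 0
print(augmented_loss(pose_dependent_model, xs,
                     [pose_dependent_model(x) for x in xs], rng))                  # clearly positive
```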


Many methods use data augmentation, and only some representative ones are listed here.
Kajita et al. (2017) exploit rotated replicas to increase the classification accuracy. Zhuang et al.
(2019); Zhu et al. (2020) leverage a Rubik’s cube recovery task with permutation and random
rotation to learn invariant features from medical images. Choy et al. (2019); Bai et al. (2020)
observe that fully convolutional neural networks (CNNs) can gain rotation invariance through
data augmentation. Zhou et al. (2022a) utilize random rotations to learn invariant representa-
tions for point cloud generation. Bergmann and Sattlegger (2023) apply rotation augmentation
on anomaly-free training samples for 3D anomaly detection.
Although data augmentation methods can enhance rotation robustness (Kajita et al. 2017;
Choy et al. 2019; Bai et al. 2020), they have severe limitations. Data augmentation generally
introduces a heavy training burden. For example, Kajita et al. (2017) use 30 times as much
rotated data to achieve significant gains on rotation invariant descriptors. Besides, data
augmentation methods cannot guarantee invariance under arbitrary rotations, because Eq. 4 cannot
minimize the loss on unseen rotations. Practically, data augmentation is often integrated into
other rotation invariant methods as an auxiliary component.

3.2 Multi‑view methods

Unlike data augmentation methods, multi-view methods attain rotation invariance by modifying
the model instead of the loss function. In multi-view methods, the model f : X → Y is
built as

f(x) = ∑_{ĝ_j ∈ Ĝ} w_j f_b(ĝ_j ⋅ x), (5)

where Ĝ is a finite subset of G, f_b : X → Y is the base model, and w_j > 0, ∑_j w_j = 1. The
metric d is generally convex, so f has a lower G-variant error than f_b, as Eq. 6 shows, meaning
f is more invariant. A simple yet effective approach is choosing Ĝ as a finite subgroup
of G and w_j = 1/|Ĝ|; then f is strongly invariant with respect to Ĝ.


E(f) = ∫_X ∫_G d( ∑_j w_j f_b(ĝ_j g ⋅ x), ∑_j w_j f_b(ĝ_j ⋅ x) ) dμ(g) dx
     ≤ ∑_j w_j ∫_X ∫_G d( f_b(ĝ_j g ⋅ x), f_b(ĝ_j ⋅ x) ) dμ(g) dx
     = ∑_j w_j ∫_X ∫_G d( f_b(ĝ_j g ĝ_j^{-1} ⋅ x), f_b(x) ) dμ(g) dx          (6)
     = ∫_X ∫_G d( f_b(g ⋅ x), f_b(x) ) dμ(g) dx = E(f_b).
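The subgroup-averaging special case of Eq. 5 is easy to verify numerically. In this sketch (our toy example, NumPy only), the base model f_b is a deliberately non-invariant function, and averaging it over the cyclic group of 90° z-rotations with w_j = 1/|Ĝ| makes the result exactly invariant to that subgroup.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def f_b(points):
    # toy non-invariant base model: sum of x-coordinates (a stand-in for a DNN)
    return points[:, 0].sum()

def f_avg(points, group):
    # Eq. 5 with uniform weights w_j = 1/|Ĝ|: strongly invariant w.r.t. the finite subgroup Ĝ
    return np.mean([f_b(points @ R.T) for R in group])

C4 = [rot_z(k * np.pi / 2) for k in range(4)]   # Ĝ: rotations by 0°, 90°, 180°, 270° about z
x = np.arange(30, dtype=float).reshape(10, 3)   # a deterministic toy point cloud
g = rot_z(np.pi / 2)                            # a group element applied to the input

print(abs(f_b(x @ g.T) - f_b(x)))               # large: f_b is not invariant
print(abs(f_avg(x @ g.T, C4) - f_avg(x, C4)))   # ≈ 0: f_avg is invariant under Ĝ
```

Averaging over Ĝ permutes the summands of Eq. 5 when the input is rotated by a member of Ĝ, which is exactly why the mean is unchanged; for rotations outside Ĝ only the weaker bound of Eq. 6 holds.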

As CNNs become a powerful tool for images (Krizhevsky et al. 2012; Simonyan and Zis-
serman 2014; Szegedy et al. 2015; He et al. 2016), researchers exploit multi-view images
to extract features from 3D shapes. Most multi-view methods take images as input, while
some later methods also process point clouds and voxels. Although these methods are
primarily designed as 3D feature extractors, they can improve rotation invariance and are
chosen as baselines by related work (Esteves et al. 2018a; Rao et al. 2019; Zhang et al.
2019a) (Fig. 3). MVCNN (Su et al. 2015) is a pioneering method showing that a fixed set
of rendered views is highly informative for 3D shape recognition. VoxNet (Maturana and
Scherer 2015) pools multi-view voxel features and achieves 2D rotation invariance around
the z-axis. Qi et al. (2016) introduce multi-resolution filtering for multi-view CNNs and
improve the classification accuracy. Cao et al. (2017) propose spherical projections to col-
lect depth variations and contour information of different views for better performances.
Zhang et al. (2018) apply a PointNet-like (Qi et al. 2017a) method on multi-view 2.5D
point clouds to fuse information from all views. View-GCN++ (Wei et al. 2022) exploits
rotation robust view-sampling to deal with rotation sensitivity. Besides, some methods
replace the weighted average in Eq. 5 with pooling/fusion modules to enhance effectiveness
and efficiency (Wang et al. 2017; Roveri et al. 2018; Yu et al. 2018; Wei et al. 2020; Li
et al. 2020; Chen and Chen 2022). These modifications do not necessarily improve the
invariance, so we omit them here.
Most multi-view methods take images as the input, so they can handle 3D rotation
invariance using powerful 2D models (Su et al. 2015; Qi et al. 2016; Cao et al. 2017).
Nonetheless, they lead to a heavy computational burden, making training and inference
inefficient. As Eq. 5 shows, the computational burden of f is at least |Ĝ| times that of
f_b. For instance, |Ĝ| is 12 or 80 in MVCNN (Su et al. 2015). In addition, most existing
multi-view methods are weakly rotation invariant. Their base models f_b are not strongly
rotation invariant, such as 2D CNNs (Su et al. 2015; Qi et al. 2016; Wei et al. 2022) and

Fig. 3  A pipeline of multi-view methods. The 3D input is first rendered/sampled into multi-view data, pro-
cessed by non-invariant DNNs, and finally pooled for downstream tasks


non-invariant 3D networks (Zhang et al. 2018). So the composite models f do not possess
strong invariance.

3.3 Ringlike and cylindrical methods

Under some circumstances, it is straightforward to identify a principal axis. Thus, the 3D


rotation invariance degenerates into the 2D one. These methods organize data in rings or
cylinders for further processing.
The principal axis is either selected from x, y, z axes or computed using specific algo-
rithms. DeepPano (Shi et al. 2015) takes z-axis as the principal axis and creates a pano-
ramic view through a cylinder projection. A max-pooling layer is appended for invariance.
Moon et al. (2018) extend 2D CNNs working on panoramic views to 3D CNNs work-
ing on cylindrical occupancy grids and get better performances. Cylindrical Transformer
Networks (Esteves et al. 2018b) transform raw coordinates to cylindrical ones using the
predicted z-axis. As the 3D convolutions acting on cylindrical coordinates are translation
invariant, the final representations are rotation invariant. Many methods take this pipeline
with slight modifications (Sun et al. 2019a; Ao et al. 2021; Fan et al. 2021; Xu et al. 2021c;
Li et al. 2022b; Zhao et al. 2022b; Ao et al. 2023a).
Ringlike and cylindrical methods are compelling in applications like place recognition
(Sun et al. 2019a; Li et al. 2022b) and registration (Ao et al. 2021; Zhao et al. 2022b).
Nevertheless, their application scope is limited. They can only handle problems where the
principal axes can be identified, or the inputs can fit into rings and cylinders.
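The core trick of these methods can be sketched in a few lines (our toy example, NumPy only): map points to cylindrical coordinates, where a rotation about the z-axis becomes a pure shift of the angular coordinate, then apply an operation that ignores circular shifts. Here a sorted angular histogram of radii stands in for the translation invariant convolution plus pooling used by the actual networks.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def to_cylindrical(points):
    # (x, y, z) -> (r, theta, z); a rotation about z only shifts theta
    x, y, z = points.T
    return np.hypot(x, y), np.arctan2(y, x), z

def ring_feature(points, n_bins=12):
    # accumulate radii into angular bins, then discard the circular shift by sorting;
    # real methods use translation invariant convolutions over theta instead
    r, theta, _ = to_cylindrical(points)
    bins = np.floor((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins, r)    # unbuffered accumulation, safe with repeated bins
    return np.sort(hist)

# points placed at the centers of the 12 angular bins, with varying radii and heights
k = np.arange(12)
t = (k + 0.5) * 2 * np.pi / 12
radii = 1.0 + 0.1 * k
pts = np.stack([radii * np.cos(t), radii * np.sin(t), 0.1 * k], axis=1)

g = rot_z(5 * 2 * np.pi / 12)   # rotation by a whole number of bins
print(np.abs(ring_feature(pts @ g.T) - ring_feature(pts)).max())  # ≈ 0
```

Note that exact invariance holds only for rotations by multiples of the bin width; finer rotations are handled by the networks only approximately, which is one source of their weak (rather than strong) invariance.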

3.4 Transformation methods

Transformation methods address rotation invariance through a transformation function


t ∶ X → Aut(X), where Aut(X) is the automorphism group of X . In transformation meth-
ods, the model f ∶ X → Y is given as
f (x) = fb (t(x) ⋅ x), (7)
where fb ∶ X → Y is the base model. If t satisfies the invariance condition, i.e.,
∀x ∈ X, ∀g ∈ G, t(x) = t(g ⋅ x)g, (8)
then f is strongly rotation invariant. However, t does not satisfy this condition in most
methods, so f is only weakly rotation invariant. These methods are usually designed for
coordinate inputs like point clouds.
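A transformation function that does satisfy Eq. 8, up to numerical precision and degenerate cases, can be built from PCA. The sketch below is our own illustration, not a method from the survey: it rotates a cloud into its principal axes and resolves the sign ambiguity of each axis with the skew of the projected coordinates, so that f(x) = f_b(t(x) ⋅ x) sees the same canonical cloud whatever the input pose.

```python
import numpy as np

def canonicalize(points):
    """t(x)·x: rotate a centered cloud into its PCA frame, signs fixed by third moments.

    Fails when eigenvalues coincide or a per-axis skew vanishes, which are the usual
    degeneracies of PCA-based canonicalization.
    """
    centered = points - points.mean(axis=0)
    _, vecs = np.linalg.eigh(centered.T @ centered)   # eigenvalues in ascending order
    aligned = centered @ vecs[:, ::-1]                # principal axis first
    signs = np.sign(np.sum(aligned ** 3, axis=0))     # resolve per-axis sign ambiguity
    signs[signs == 0] = 1.0
    return aligned * signs

rng = np.random.default_rng(3)
x = rng.gamma(2.0, 1.0, size=(64, 3)) * np.array([3.0, 2.0, 1.0])  # anisotropic, skewed cloud

# an arbitrary proper rotation built from two elementary ones
a, b = 0.7, 1.1
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1.0]])
Rx = np.array([[1.0, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
g = Rz @ Rx

# canonicalization removes the pose: any f_b applied afterwards is rotation invariant
print(np.abs(canonicalize(x @ g.T) - canonicalize(x)).max())  # ≈ 0
```

Learned transformations such as T-Net replace this handcrafted t with a regressed matrix, which is precisely why they only reach weak invariance unless the condition of Eq. 8 is enforced.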
Spatial Transformer Networks (STNs) (Jaderberg et al. 2015) are widely used for spatial
invariance in image processing. In the 3D domain, PointNet (Qi et al. 2017a) proposes
joint alignment networks, i.e., T-Net, for rotation robustness, as shown in Fig. 4. T-Net is a
mini-PointNet regressing the transformation matrix directly. To make the matrix R ∈ ℝ^{3×3}
orthogonal, a regularization term L_reg = ‖I − RR^T‖ is appended. There is no clear distinction
between STNs and T-Nets in the 3D domain, so they are not distinguished in this survey.
T-Net is widely adopted with the spread of PointNet-like methods (Qi et al. 2017a, b; Wang
et al. 2019b). SHPR-Net (Chen et al. 2018b) employs two T-Nets to connect poses in the
original and canonical spaces. PVNet (You et al. 2018) applies the EdgeConv (Wang et al.
2019b) as the basic blocks of T-Net to better capture local information. Zhang et al. (2018)


Fig. 4  T-Net (Qi et al. 2017a) directly regresses rotation matrices from coordinates. B, N refer to the num-
ber of batches and points, respectively. The numbers behind MLP are internal layer sizes

put raw point clouds and multi-view features into T-Net to robustify the model. In addition,
many other methods also include T-Net in their models for the effectiveness and stability in
different downstream tasks (Joseph-Rivlin et al. 2019; Chen et al. 2019a; Liu et al. 2019c;
Zhang et al. 2020a; Yu et al. 2020b; Wang et al. 2021; Poiesi and Boscaini 2021; Hegde
and Gangisetty 2021; Liu et al. 2022c; Zhu et al. 2022a).
Besides rotation matrices, some methods utilize other rotation representations. IT-Net
(Yuan et al. 2018) simultaneously canonicalizes rotation and translation through the qua-
ternion representation. PCPNet (Guerrero et al. 2018) and SCT (Liu et al. 2022a) regress
unit quaternions for pose canonicalization and point cloud recognition, respectively. Poiesi
and Boscaini (2023) learn a quaternion transformation network to refine the estimated
LRF. RotPredictor (Fang et al. 2020) applies PointConv (Wu et al. 2019) to regress Euler
angles, and RTN (Deng et al. 2021b) predicts discrete Euler angles. C3DPO (Novotny et al.
2019) divides the shape into view-specific pose parameters and a view-invariant shape
basis. PaRot (Zhang et al. 2023a) also disentangles invariant features with equivariant
poses via the equivariance loss. Wang et al. (2022c) formulate the rotation invariant learn-
ing problem as the minimization of an energy function, solved with an iterative strategy.
Some methods are embedded in a self-supervised learning framework. Some works (Zhou
et al. 2022b; Mei et al. 2023) enforce the consistency of canonical poses with a rotation
equivariance loss. Sun et al. (2021) utilize Capsule Networks (Hinton et al. 2011) with the
canonicalization loss for object-centric reasoning. Kim et al. (2022) introduce a self-super-
vised learning framework to predict canonical axes of point clouds using the icosahedral
group. Currently, only a few methods are strongly rotation invariant. LGANet (Gu et al.
2021b) and ELGANet (Gu et al. 2022) exploit graph convolutional networks (GCNs) to
process rotation invariant distances and angles, where the outputs are orthogonalized into
rotation matrices. Katzir et al. (2022) employ equivariant networks to learn canonical poses
of point clouds. RIP-NeRF (Wang et al. 2023c) transforms raw coordinates into invariant
ones for fine-grained editing. EIPs (Fei and Deng 2024) disentangle rotation invariance and
point cloud processing with efficient invariant poses.
Benefiting from their straightforward idea, transformation methods are extensively used
in many applications (Liu et al. 2019c; Guerrero et al. 2018; Zhu et al. 2022a). Notwithstanding,
the invariance condition is often ignored, especially by works using
T-Nets (Qi et al. 2017a; Joseph-Rivlin et al. 2019; Poiesi and Boscaini 2021). Thus, their
transformation functions contribute nothing to rotation invariance. Besides, some
methods cannot output proper rotation representations. For example, T-Net (Qi et al.
2017a) cannot guarantee proper output rotation matrices, even using the regularization
term. In this case, 3D shapes are inevitably distorted, and some structural information may
be lost. Moreover, heavy data augmentation is sometimes required for good performance.


Le (2021) shows that T-Net needs a large amount of data augmentation to learn a steady
transformation policy.

3.5 Invariant value methods

Invariant value methods achieve rotation invariance through constructing invariant val-
ues from coordinate inputs. Here, invariant values include distances, inner products, and
angles:
‖u_i‖ (distance), u_i ⋅ u_j (inner product), ∠(u_i, u_j) (angle), (9)

where {u_i} ⊂ ℝ³ is a set of nonzero geometric vectors. Based on these invariant values, the
model f : X → Y is generally set up as

f(x) = f_b(f_i(x)), (10)

where f_i : X → Z uses handcrafted rules to compute invariant values, and f_b : Z → Y is
the base model. Clearly, f is strongly rotation invariant. In the following discussions, {x_i}
represents a point cloud. x_ij, n_ij (j = 1, ⋯, k) denote the positional and normal vectors of
x_i's kNN, respectively. m_i is the barycenter of N(x_i). We use several operators to simplify
the notation: normalize (N), orthogonalize (O), and orthonormalize (NO).

N(x) = x / ‖x‖,  O(x, y) = y − (y ⋅ N(x)) N(x),  NO(x, y) = N(O(x, y)). (11)

As fb is usually a deep point cloud model with slight modification, the handcrafted rules in
fi are the core of invariant value methods. We divide existing methods into several groups
according to the form of invariant values.
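The three operators of Eq. 11 translate directly into code. Below is a small sketch of ours (NumPy only), which also checks the property that makes them useful: they commute with rotations, so inner products and angles built from them are rotation invariant.

```python
import numpy as np

def N(x):
    # normalize to unit length
    return x / np.linalg.norm(x)

def O(x, y):
    # component of y orthogonal to x
    nx = N(x)
    return y - (y @ nx) * nx

def NO(x, y):
    # orthonormalize: unit vector orthogonal to x inside span{x, y}
    return N(O(x, y))

x = np.array([1.0, 2.0, 2.0])
y = np.array([0.0, 1.0, 0.0])
u = NO(x, y)
print(u @ N(x))             # ≈ 0: orthogonal to x
print(np.linalg.norm(u))    # = 1: unit length

# rotation equivariance: NO(Rx, Ry) = R · NO(x, y) for any rotation R
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(np.abs(NO(R @ x, R @ y) - R @ u).max())  # ≈ 0
```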

3.5.1 Local values

Many methods generate invariant values in the local neighborhoods. ClusterNet (Chen
et al. 2019b) introduces rigorously rotation invariant (RRI) mappings based on a kNN
graph as
RRI(x_i, {x_ij}_{j=1}^k) = [ ‖x_i‖, {( ‖x_ij‖, ∠(x_i, x_ij), φ_ij )}_{j=1}^k ], (12)

where φ_ij = min{ atan2(a_ijt, b_ijt) | 1 ≤ t ≤ k, t ≠ j, a_ijt ≥ 0 },
a_ijt = ( NO(x_i, x_ij) × NO(x_i, x_it) ) ⋅ N(x_i),
b_ijt = NO(x_i, x_ij) ⋅ NO(x_i, x_it).

ClusterNet applies a hierarchical structure to aggregate features. Although all geometric


information is retained, it mainly considers global information, weakening its capability to
describe local structures. RIConv (Zhang et al. 2019b) addresses this issue by extracting
local rotation invariant features (RIFs, Fig. 5a) via relative distances and angles as

RIF(x_ij) = [ ‖d_ij^(0)‖, ‖d_ij^(1)‖, ∠(d_ij^(0), d_i^(0)), ∠(d_ij^(1), −d_i^(0)) ], (13)

where d_ij^(0) = x_ij − x_i, d_ij^(1) = x_ij − m_i, and d_i^(0) = m_i − x_i. It applies a multi-layer perceptron
(MLP) to generate final features. RIF has been widely adopted by many works (Chou et al.
2021; Zhang et al. 2022; Wang and Rosen 2023; Fan et al. 2023).

Fig. 5  Representative invariant values from a local values, b LRF-based values, c PPF-based values, and d
global values. The solid lines are invariant values or necessary components of invariant values
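Eq. 13 is simple enough to verify end to end. The sketch below (ours; NumPy only) computes RIF for every neighbor of a center point and checks that the features are unchanged when the whole neighborhood is rigidly rotated.

```python
import numpy as np

def angle(u, v, eps=1e-12):
    # numerically safe angle between two vectors
    c = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return np.arccos(np.clip(c, -1.0, 1.0))

def rif(x_i, neighbors):
    """RIF of Eq. 13 for each neighbor x_ij of the center x_i."""
    m_i = neighbors.mean(axis=0)          # barycenter of the neighborhood
    d0_i = m_i - x_i
    feats = []
    for x_ij in neighbors:
        d0_ij = x_ij - x_i
        d1_ij = x_ij - m_i
        feats.append([np.linalg.norm(d0_ij), np.linalg.norm(d1_ij),
                      angle(d0_ij, d0_i), angle(d1_ij, -d0_i)])
    return np.array(feats)

rng = np.random.default_rng(4)
x_i = rng.normal(size=3)
nbrs = x_i + 0.1 * rng.normal(size=(8, 3))   # a small kNN neighborhood around x_i

# an arbitrary rotation applied to the whole scene
q, r = np.linalg.qr(rng.normal(size=(3, 3)))
g = q * np.sign(np.diag(r))
if np.linalg.det(g) < 0:
    g[:, 0] = -g[:, 0]

print(np.abs(rif(g @ x_i, nbrs @ g.T) - rif(x_i, nbrs)).max())  # ≈ 0
```

Because every quantity is built from relative vectors, the features are invariant to translation as well as rotation, which is the common pattern behind the local values in Table 3.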

Table 3  Some representative local values

PR-invNet (Yu et al. 2020a):
  [ ‖x_i‖, {( ‖x_ij‖, ‖d_ij^(0)‖, cos ∠(x_i, x_ij), cos ∠(x_i, d_ij^(0)) )}_{j=1}^k ]
3D-GFE (Chou et al. 2021):
  [ RIF(x_ij), |d_i^(1) ⋅ x_ij|, ∠(d_i^(1), x_ij), ∠(x_i, x_ij), ∠(d_ij^(0), −x_i) ]
RI-Framework (Li et al. 2021c):
  [ ‖d_ij^(1)‖, ‖d_ij^(0)‖, ‖d_ij^(2)‖, cos ∠(d_ij^(0), d_ij^(1)), cos ∠(d_ij^(1), d_ij^(2)),
    cos ∠(d_ij^(0), d_ij^(2)), sin( (1/2) sgn(d_i^(1) ⋅ x_ij) ∠(d_i^(1), x_i × x_ij) ) ],
  [ ‖x_i‖, ‖d_i^(0)‖, ‖d_i^(4)‖, cos ∠(d_i^(0), d_i^(4)), cos ∠(d_i^(4), −x_i) ]
RIConv++ (Zhang et al. 2022):
  [ ‖d_ij^(0)‖, ∠(d_ij^(0), d_{i,j+1}^(0)), α_ij^(0), α_ij^(1), sgn(α_ij^(1) − α_ij^(0)) α_ij^(2),
    β_ij^(0), β_ij^(1), sgn(β_ij^(1) − β_ij^(0)) β_ij^(2) ]
A-RI encoding (Qiu et al. 2022):
  [ ‖x_i‖, ‖d_i^(4)‖, ‖d_i^(4) − d_i^(0)‖, ‖d_i^(0)‖, ‖d_ij^(2)‖, ‖d_ij^(1)‖, ‖d_ij^(0)‖,
    sgn(d_i^(1) ⋅ x_ij) ∠(d_i^(1), x_i × x_ij) ]
RPR-Net (Fan et al. 2023):
  [ RIF(x_ij), ∠(d_ij^(0), d_ij^(1)) ], [ ‖d_i^(0)‖, ‖d_i^(2)‖, ‖d_i^(3)‖, ∠(d_i^(2), d_i^(3)) ]

x_max and x_min denote the farthest and closest points to x_i in N(x_i), respectively. s_i is the farther intersection
between the neighborhood ball and the extension of x_i. x_{i,j+1} is the adjacent neighbor of x_ij in a clockwise
fashion. LRA (Zhang et al. 2022) computes a local reference axis from the neighborhood.
d_ij^(2) = x_ij − s_i, d_ij^(3) = x_{i,j+1} − x_ij, d_i^(1) = N(x_i) × m_i, d_i^(2) = x_max − x_i, d_i^(3) = x_min − x_i, d_i^(4) = m_i − s_i,
α_ij^(0) = ∠(LRA(x_ij), −d_ij^(0)), α_ij^(1) = ∠(LRA(x_i), −d_ij^(0)), α_ij^(2) = ∠(LRA(x_ij), LRA(x_i)),
β_ij^(0) = ∠(LRA(x_ij), d_ij^(3)), β_ij^(1) = ∠(LRA(x_{i,j+1}), d_ij^(3)), β_ij^(2) = ∠(LRA(x_{i,j+1}), LRA(x_ij))


Later work mainly adds more reference points and invariant values to improve performance.
Some representative invariant values are collected in Table 3. Readers may
refer to the original papers for details.

3.5.2 LRF‑based values

LRF-based values are special cases of local values. Specifically, if three orthogonal axes
e_1, e_2, e_3 can be determined in N(x_i), then x_ij ⋅ e_1, x_ij ⋅ e_2, x_ij ⋅ e_3 are relative coordinates in
this LRF. LRFs are adopted in many handcrafted 3D descriptors, like FPFH (Rusu et al.
2009), SHOT (Tombari et al. 2010), and RoPS (Guo et al. 2013). It should be noted that
methods only using principal component analysis (PCA) to define LRFs are discussed sep-
arately in the next section instead of this one. We divide these methods according to the
number of LRFs in each neighborhood.
Some methods define a unique LRF in each neighborhood. Usually, the normal vector is
selected as one axis, a normalized weighted average vector is selected as another, and their
cross product is chosen as the final axis. We summarize these methods in Table 4.
Besides, there are also methods with multiple LRFs in each neighborhood. A common
choice of LRF is the Darboux frame defined as
e_x = n_i,  e_y(x_ij) = N(d_ij^(0) × e_x),  e_z(x_ij) = e_x × e_y(x_ij), (14)

where ey and ez depend on not only xi but also xij . CRIN (Lou et al. 2023) proposes another
LRF by considering the original space basis. Some representative invariant values are
listed in Table 5.

Table 4  Different LRFs adopted by LRF-based values with one LRF

Method | LRF
3DSmoothNet (Gojcic et al. 2019), Poiesi and Boscaini (2023), G3DOA (Zhao et al. 2022b) | e_1 ∼ PCA or e_1 = n_i, e_2 = N(Σ_j (r_i − ‖d_ij^(0)‖)² (d_ij^(0) ⋅ e_1) O(e_1, d_ij^(0)))
Pujol-Miró et al. (2019) | e_z ∼ PCA, e_x = N_O(e_z, Σ_{j=1}^k I_ij x_ij)
GFrames (Melzi et al. 2019) | e_z = n_i, e_x = N(∇f(x_i))
AECNN (Fig. 5b) (Zhang et al. 2020b) | e_z = N(x_i), e_x = N_O(x_i, m_i)
LFNet (Cao et al. 2021) | e_z = n_i, e_x = N(x_i × e_z)
PaRI-Conv (Chen and Cong 2022) | e_x = n_i, e_y = N(d_i^(0))
Sahin et al. (2022) | e_x = N(x_i), e_y = N(x_i × d_i^(0))
Li et al. (2023b) | e_x = N(x_i), e_y = N_O(x_i, n_i)

Only two axes are listed; their cross product determines the final axis. r_i is the radius of N(x_i). I_ij is the intensity of x_ij. f is a scalar function defined on the manifolds of the shapes, and ∇ refers to the intrinsic gradient.

Rotation invariance and equivariance in 3D deep learning: a… Page 13 of 52 168

Table 5  Some representative invariant values with multiple LRFs

Method | f_i
Lin et al. (2021b) | PF(x_ij) = [‖d_ij^(0)‖, e_y(x_ij) ⋅ n_ij, N(d_ij^(0)) ⋅ n_i, atan2(e_z(x_ij) ⋅ n_ij, n_i ⋅ n_ij)]
ERINet (Gu et al. 2021a) | [PF(x_ij), ‖x_i‖, ‖x_ij‖, ‖d_ij^(0)‖, ∠(x_i, x_ij), ∠(d_ij^(1), −d_i), ∠(d_ij^(0), d_i^(0)), ∠(n_i, n_ij), ∠(x_ij, n_ij)]
LGANet (Gu et al. 2021b) | LRL(x_ij) = [∠(n_i, n_ij), ∠(d_ij^(0), −x_i), ∠(d_ij^(0), n_i), ∠(d_ij^(0), n_ij), ∠(n_ij, e_y(x_ij)), ∠(n_ij, e_z(x_ij)), ‖x_i‖, ‖x_ij‖, ‖d_ij^(0)‖]
ELGANet (Gu et al. 2022) | [LRL(x_ij), ∠(d_ij^(0), d_ij^(5)), ∠(d_ij^(0), d_ij^(6)), ‖d_ij^(5)‖, ‖d_ij^(6)‖]
LGR-Net (Zhao et al. 2022a) | [‖d_ij^(0)‖, ∠(d_ij^(0), n_i), ∠(d_ij^(0), n_ij), ∠(n_i, n_ij), ∠(e_y(x_ij), e′_y(x_ij)), ∠(e_z(x_ij), e′_z(x_ij)), ∠(e_y(x_ij), −e′_z(x_ij)), ∠(−e_z(x_ij), e′_y(x_ij))]
IPD-Net (Tabib et al. 2023) | [‖d_ij^(0)‖, d_ij^(0) ⋅ n_ij, d_ij^(0) ⋅ n_i, n_i ⋅ n_ij, e_y(x_ij) ⋅ e′_y(x_ij), e_z(x_ij) ⋅ e′_z(x_ij)]

x′_ij = argmax_{x∈N(x_i)} min(‖x_i − x‖, ‖x_ij − x‖), d_ij^(5) = x′_ij − x_i, d_ij^(6) = x_ij − x′_ij, e′_x(x_ij) = n_ij, e′_y(x_ij) = N(d_ij^(0)) × e′_x(x_ij), e′_z(x_ij) = e′_x(x_ij) × e′_y(x_ij)

3.5.3 PPF‑based values

PPFs (Drost et al. 2010) were initially proposed in a 3D object recognition algorithm; they describe the relative information between two points x_1, x_2 as

PPF(x_1, x_2) = [‖d_12‖, ∠(n_1, d_12), ∠(n_2, d_12), ∠(n_1, n_2)],  (15)

where d_ij = x_i − x_j, as Fig. 5c shows. PPFs are strongly rotation invariant, making them
suitable for invariant feature extraction.
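A direct transcription of Eq. 15 in numpy (the helper names are illustrative) confirms the invariance, and also exposes the reflection ambiguity discussed in Sect. 3.5.6:

```python
import numpy as np

def angle(u, v):
    return np.arccos(np.clip(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)), -1.0, 1.0))

def ppf(x1, n1, x2, n2):
    # Eq. 15: a 4D point pair feature from two oriented points
    d12 = x1 - x2
    return np.array([np.linalg.norm(d12), angle(n1, d12), angle(n2, d12), angle(n1, n2)])

rng = np.random.default_rng(2)
x1, n1, x2, n2 = (rng.normal(size=3) for _ in range(4))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q
assert np.allclose(ppf(x1, n1, x2, n2), ppf(R @ x1, R @ n1, R @ x2, R @ n2))
# ...but a reflection leaves the PPF unchanged too (Sect. 3.5.6):
M = -np.eye(3)  # point reflection, det = -1
assert np.allclose(ppf(x1, n1, x2, n2), ppf(M @ x1, M @ n1, M @ x2, M @ n2))
```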
PPFNet (Deng et al. 2018b) concatenates PPFs with coordinates and normals to improve
the robustness of 3D point matching. PPF-FoldNet (Deng et al. 2018a) combines PPFNet
with FoldingNet (Yang et al. 2018) to learn invariant descriptors, using only PPFs as input
features. Bobkov et al. (2018) slightly modify and apply the PPFs to classification and
retrieval. GMCNet (Pan et al. 2021) combines RRI (Chen et al. 2019b) and PPFs for rig-
orous partial point cloud registration. Using hypergraphs, Triangle-Net (Xiao and Wachs 2021) extends PPFs to three points (triangles). PaRI-Conv (Chen and Cong 2022) augments
PPFs with two azimuth angles and uses them to synthesize pose-aware dynamic kernels.
PPFs have been widely employed in rotation invariant point cloud matching and registra-
tion (Zhao et al. 2021; Yu et al. 2023; Zhang et al. 2023c).


3.5.4 Global values

Some methods do not require local neighborhoods to evaluate invariant values. SRINet (Sun et al. 2019b) defines the point projection mapping (PPM, Fig. 5d) by projecting x_i onto three axes a_1, a_2, a_3 as

PPM(x_i) = [cos ∠(a_1, x_i), cos ∠(a_2, x_i), cos ∠(a_3, x_i), ‖x_i‖],  (16)

where a_1 = argmax_{x∈{x_i}} ‖x‖, a_2 = argmin_{x∈{x_i}} ‖x‖, a_3 = a_1 × a_2. Based on SRINet,
Tao et al. (2021) add attention modules, and SCT (Liu et al. 2022a) adds a quaternion
T-Net for better performances. Sun et al. (2023) apply SRINet on non-rigid point clouds.
Some works (Xu et al. 2021b; Qin et al. 2023a) employ the sorted Gram matrix as invariant values. The Gram matrix for {x_i}_{i=1}^N is computed as (x_i ⋅ x_j)_{N×N}, each row of which is then sorted and fed into point-based networks for permutation and rotation invariance.
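A minimal sketch of the sorted Gram matrix construction:

```python
import numpy as np

def sorted_gram(X):
    # Gram matrix of pairwise inner products; sorting each row makes the
    # representation robust to permutations of the other point index
    return np.sort(X @ X.T, axis=1)

rng = np.random.default_rng(4)
X = rng.normal(size=(16, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q
assert np.allclose(sorted_gram(X), sorted_gram(X @ R.T))   # rotation invariant
P = rng.permutation(16)
assert np.allclose(sorted_gram(X[P]), sorted_gram(X)[P])   # rows follow the points
```

The rotation invariance is exact, since (XRᵀ)(XRᵀ)ᵀ = XXᵀ; sorting only addresses the permutation symmetry.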

3.5.5 Others

In addition to the above invariant values, the other values that are hard to classify are
listed here. SchNet (Schütt et al. 2017, 2018) gains rotation invariance through intera-
tomic distances. SkeletonNet (Ke et al. 2017) uses angles and ratios between distances
as invariant features for human skeletons. Liu et al. (2018) leverage relative distances
on global point cloud registration. 3DTI-Net (Pan et al. 2019) utilizes translation invari-
ant graph filter kernel and employs the norms as invariant features. 3DMol-Net (Li et al.
2021a) extends it to molecular applications. RISA-Net (Fu et al. 2020) employs edge
lengths and dihedral angles on 3D retrieval tasks. RMGNet (Furuya et al. 2020) feeds
several handcrafted descriptors into GCNs for point cloud segmentation. GS-Net (Xu
et al. 2020) uses eigenvalue decomposition on local distance graphs and exploits these
eigenvalues as invariant features. SN-Graph (Zhang et al. 2021b) leverages 15 cosine
values, 7 distances, and 7 radii as invariant values. TinvNN (Zhang et al. 2021c) applies eigenvalue decomposition on the zero-centered distance matrices to get invariant
features. ComENet (Wang et al. 2022b) exploits several rotation angles for global com-
pleteness. DuEqNet (Wang et al. 2023b) builds equivariant networks through relative
distances for object detection. SGPCR (Salihu and Steinbach 2023) explores the rota-
tion invariant convolution between two spherical Gaussians for object registration and
retrieval. RadarGNN (Fent et al. 2023) employs rotation invariant bounding boxes and
representation for radar-based perception. GeoTransformer Qin et al. (2023b) further
applies sinusoidal embedding on distances and angles for robust registration.

3.5.6 Discussion

Unlike the methods above, invariant value methods are strongly rotation invariant, and
their superiority has been demonstrated with many experiments (Xu et al. 2021b; Chen
and Cong 2022; Sahin et al. 2022; Wang et al. 2023b). Nevertheless, there are still several
concerns.
Singularity  Almost every method has singularities that make invariant values meaningless, including coincident points (e.g., x_i = m_i ⇒ d_i^(0) = 0 leads to undefined angles in RIConv (Zhang et al. 2019b)), collinear vectors (e.g., if the cross products in Cao et al. (2021); Chen and Cong (2022); Sahin et al. (2022) give zero output, then their LRFs are not properly defined), and nonunique candidate values (e.g., if two or more points satisfy argmax_{x∈{x_i}} ‖x‖, then a_1 in SRINet (Sun et al. 2019b) is not determined).
Irreversibility  For f_i: X → Z, if there exists f_ri: Z → X satisfying

∀x ∈ X, ∃ g_x ∈ G, f_ri(f_i(x)) = g_x ⋅ x,  (17)
then fi is reversible. Some irreversible invariant values may lose certain structural
information, harming downstream task performances (Zhang et al. 2019b; Sun et al.
2019b).
Discontinuity The base model fb is generally a continuous deep model. So if fi is
discontinuous at x0 , then the model f may also be discontinuous at x0 , making it hard to
train with gradient-based optimization algorithms. For example, fi in SRINet (Sun et al.
2019b) is discontinuous on point clouds whose two longest vectors are close, since it
needs them to define axes.
Reflection Distances, inner products, and angles are invariant to rotations and reflec-
tions. Thus, almost all methods without cross products cannot distinguish rotations from
reflections (Drost et al. 2010; Zhang et al. 2019b; Xu et al. 2021b).

3.6 PCA‑based methods
PCA-based methods construct the model similarly to transformation methods, while the transformation function is an unlearnable PCA alignment, as Algorithm 1 shows. X is usually zero-centered to mitigate the influence of translations, and Σ is called the covariance matrix. PCA alignment can guarantee rotation invariance. For X_R = XR (RRᵀ = I), if

Σ_R = X_Rᵀ X_R = Rᵀ Σ R = (Rᵀ V) Λ (Rᵀ V)ᵀ ⇒ V_R = Rᵀ V,  (18)

Table 6  Different disambiguation rules adopted by PCA-based methods (k = 1, 2, 3 unless otherwise specified)

Method | Rule
DLAN (Furuya and Ohbuchi 2016) | s_k = sgn(v_k ⋅ Σ_j (x_ij − x_i^(c))), k = 1, 3; s_2 = s_1 s_3 det(V)
GCANet (Zhang et al. 2020c) | s_k = sgn(v_k ⋅ Σ_j w_j (x_ij − x_i))
Fan et al. (2020), GPA-Net (Shan et al. 2023) | s_k = sgn(Σ_j sgn(v_k ⋅ (x_ij − x_i)))
Gandikota et al. (2021) | s_k = sgn(U_1k)
R-PointHop (Kadam et al. 2022), S3I-PointHop (Kadam et al. 2023) | s_k = sgn(v_k ⋅ Σ_j (x_ij − x_i^(m)))
LGR-Net (Zhao et al. 2022a) | s_k = sgn(v_k ⋅ x_max)

x_i^(c) is the center of the Spheres-Of-Interest. x_i^(m) is the median point. x_max is the farthest point from the centroid. m_i = max_j ‖x_ij − x_i‖, w_j = (m_i − ‖x_ij − x_i‖) / Σ_k (m_i − ‖x_ik − x_i‖).


then Z_R = X_R V_R = XV = Z. There are two conditions for Eq. 18. First, the eigenvalues must be distinct, i.e., λ_1 > λ_2 > λ_3. As it is rare that two or three eigenvalues are equal, almost all methods assume this to be true. Second, the signs of all columns of V must be identified uniquely: if V = [v_1, v_2, v_3] satisfies the eigendecomposition Σ = VΛVᵀ, then V diag(s) = [s_1 v_1, s_2 v_2, s_3 v_3] (s = [s_1, s_2, s_3]ᵀ ∈ {−1, 1}³) also satisfies it. Some works substitute PCA with eigenvalue decomposition or singular value decomposition (SVD), but there is no substantial difference. In SVD, another matrix U = XVΛ^(−1/2) ∈ ℝ^{N×3} is introduced. In this section, PCA-based methods are classified according to how the ambiguity of signs is handled (Fig. 6).

Algorithm 1  PCA Alignment

Most methods disambiguate signs through handcrafted rules, which generally involve dot products between v_k and other vectors: if v_k → −v_k ⇒ s_k → −s_k, then s_k v_k remains the same. Some representative rules are listed in Table 6.
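Algorithm 1 together with one such rule can be sketched in numpy; the sign rule below follows the spirit of Fan et al. (2020) in Table 6, and everything else is an illustrative assumption:

```python
import numpy as np

def pca_align(X):
    # Algorithm 1 sketch: zero-center, eigendecompose the covariance, fix signs
    Xc = X - X.mean(axis=0)
    lam, V = np.linalg.eigh(Xc.T @ Xc)   # eigenvalues in ascending order
    V = V[:, ::-1]                       # reorder so lambda_1 > lambda_2 > lambda_3
    # sign rule in the spirit of Fan et al. (2020): point each axis toward
    # the majority of the centered points
    s = np.sign(np.sign(Xc @ V).sum(axis=0))
    s[s == 0] = 1.0                      # guard against the degenerate tie case
    return Xc @ (V * s)

rng = np.random.default_rng(5)
X = rng.normal(size=(65, 3)) * np.array([3.0, 2.0, 1.0])  # distinct eigenvalues
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q
assert np.allclose(pca_align(X), pca_align(X @ R.T))      # same canonical pose
```

An odd number of points is used so the majority vote cannot tie; with even counts the rule can hit exactly the singularity and discontinuity issues discussed later in this section.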
Some methods consider combinations of signs instead of just choosing one. Xiao et al.
(2020) fuse all combinations through a self-attention module. OrthographicNet (Kasaei
2021) transforms raw points into canonical poses and generates several projection views
for 3D object recognition. MolNet-3D (Liu et al. 2022c) averages the results from 4 poses to
predict the molecular properties. Puny et al. (2022) convert the group averaging operation
to the subset averaging one with frames, where 4 and 8 frames are exploited for SO(3) and
O(3), respectively. Li et al. (2023a) apply this approach on 3D planar reflective symmetry
detection.

Fig. 6  A pipeline of PCA-based methods. Several pose candidates are first generated from the 3D input,
then they are either disambiguated using handcrafted rules/pose selectors or fused together

13
Table 7  Comparisons of different rotation invariant methods

Method | Data format | Invariance | Limitation
Data augmentation methods | No restriction | Weak | Heavy training burden
Multi-view methods | Images, point clouds | Weak | Heavy computational burden
Ringlike and cylindrical methods | Images, voxels, point clouds | Strong | Principal axes requirement
Transformation methods | Point clouds | Weak | Improper rotation representation; data augmentation requirement
Invariant value methods | Point clouds, meshes | Strong | Singularity; irreversibility; discontinuity; reflection
PCA-based methods | Point clouds, meshes | Strong | Singularity; discontinuity; heavy computational burden; numerical instability


Some works utilize pose selectors to make one pose from multiple candidates. PR-
invNet (Yu et al. 2020a) augments 8 poses with discrete rotation groups and utilizes the
pose selector to choose the final pose. Li et al. (2021b) investigate the inherent ambiguity
of PCA alignment. They argue that the order of e_x, e_y, e_z is also ambiguous, and the total number of ambiguities is 4 (sign) × 6 (order) = 24. All poses are fused through a pose selector to create an optimal one. Besides coordinates, some works apply PCA on network weights (Xie et al. 2023) and the convex hull (Pop et al. 2023).
PCA-based methods are effective with intrinsic strong rotation invariance. Furthermore,
they are often combined with invariant value methods for better performances (Yu et al. 2020a; Zhao et al. 2022a; Chen and Cong 2022). However, sign disambiguation may introduce problems like the singularity and discontinuity discussed in Sect. 3.5.6 (Zhang et al. 2020c; Fan et al. 2020; Gandikota et al. 2021), while considering all combinations increases the computational burden (Xiao et al. 2020; Kasaei 2021). Besides, PCA-based methods are fragile
to inputs with close eigenvalues since their eigenvectors are numerically unstable, which is
an inherent problem of eigenvalue decomposition.

3.7 Summary

In a word, different methods use distinctive ways to obtain rotation invariance. Most rota-
tion invariant methods are applied in 3D general understanding. We compare their differ-
ences in Table 7. Considering this, we summarize several characteristics of existing rota-
tion invariant methods.

• Data augmentation is always integrated with other methods, especially weakly rotation
invariant ones (Fang et al. 2020; Deng et al. 2021b; Le 2021), to improve their invari-
ance.
• Multi-view methods only work with images and do not have advantages on coordinate
inputs, since they are weakly invariant and usually introduce heavy computational bur-
dens (Su et al. 2015; Qi et al. 2016; Zhang et al. 2018).
• Ringlike and cylindrical methods are the best choices in tasks like place recognition
(Sun et al. 2019a; Li et al. 2022b), as achieving 2D invariance is simpler than 3D.

Fig. 7  Milestones of rotation equivariant methods. Best viewed in color


• Weakly rotation invariant transformation methods are less recommended. They can
be replaced by PCA-based methods that have strong invariance and excellent perfor-
mances.
• Until now, strong invariance is only available by applying invariant value methods and
PCA-based methods on coordinate inputs like point clouds and meshes.

4 Rotation equivariant methods

Most of the rotation equivariant methods are equivariant networks on rotation groups. There
are already surveys on geometrically equivariant graph neural networks (Han et al. 2022;
Zhang et al. 2023b), categorizing them according to the way of message passing and aggrega-
tion. We devise a slightly different taxonomy to cover more related methods. Some milestone
methods are listed in Fig. 7.

4.1 G‑CNNs

Group equivariant convolutional neural networks (G-CNNs) are first proposed to address 2D
rotations in images (Cohen and Welling 2016). Moreover, they can be extended to 3D rota-
tions directly. The group convolution for ψ, f: X → ℝ is defined as

[ψ ⋆ f](g) = ∫_X [L_g ψ](x) f(x) dx,  (19)

where [L_g ψ](x) = ψ(g⁻¹ ⋅ x). The output signal is always defined on the rotation group, so X = G in all convolutional layers except the first one. Group convolutions are strongly rotation equivariant, i.e., ψ ⋆ [L_g f] = L_g[ψ ⋆ f].
It is difficult to evaluate the integration directly, so many methods investigate group con-
volutions with finite groups. CubeNet (Worrall and Brostow 2018) focuses on convolutions
on finite groups and reduces rotation equivariance to permutation equivariance. The group
convolution for ψ, f: Ĝ → ℝ satisfies

[ψ ⋆ f](ĝ_j) = [L_{ĝ_i}[ψ ⋆ f]](ĝ_{k(i,j)}) = [ψ ⋆ L_{ĝ_i} f](ĝ_{k(i,j)}),  (20)

Fig. 8  The Cayley table of the tetrahedral group, which satisfies ĝ_{k(i,j)} = ĝ_i ĝ_j. Each color represents a different rotation element. With the help of the Cayley table, it is straightforward to transform discrete rotation equivariance into permutation equivariance. Best viewed in color


where Ĝ is a finite rotation group and ĝ_{k(i,j)} = ĝ_i ĝ_j. Therefore, the rotation f → L_{ĝ_i} f is equivalent to the permutation j → k(i, j) in the group convolution, as Fig. 8 shows. Esteves et al.
(2019b) put multi-view features on vertices of the icosahedron and introduce localized
filters in discrete G-CNNs for efficiency. EPN (Chen et al. 2021) combines point convo-
lutions with group convolutions for SE(3) equivariance, and has been applied on object
detection (Yu et al. 2022) and place recognition (Lin et al. 2022a, 2023a). G-CNNs are
employed in many tasks, like medical image analysis (Winkels and Cohen 2018, 2019;
Andrearczyk and Depeursinge 2018), point cloud segmentation (Meng et al. 2019; Zhu
et al. 2023), pose estimation (Li et al. 2021d), and registration (Wang et al. 2022a, 2023a;
Xu et al. 2023a).
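The reduction of Eq. 20 to permutations can be checked on a small finite group; the sketch below uses the cyclic group C4 (rotations about one axis) instead of the tetrahedral group of Fig. 8 for brevity:

```python
import numpy as np

n = 4                                   # cyclic group C4: element k <-> rot_z(k * 90 deg)
mul = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n   # Cayley table
inv = (-np.arange(n)) % n

def gconv(psi, f):
    # [psi * f](g) = sum_x psi(g^{-1} x) f(x): Eq. 19 restricted to a finite group
    return np.array([sum(psi[mul[inv[g], x]] * f[x] for x in range(n)) for g in range(n)])

def L(g, f):
    # [L_g f](x) = f(g^{-1} x): rotating f is just a permutation of its values
    return f[mul[inv[g]]]

rng = np.random.default_rng(6)
psi, f = rng.normal(size=n), rng.normal(size=n)
for g in range(n):
    assert np.allclose(gconv(psi, L(g, f)), L(g, gconv(psi, f)))  # equivariance
```

Rotating the input by ĝ only permutes the output values, which is why discrete G-CNNs can implement rotation equivariance with index bookkeeping alone.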
Some methods utilize Lie groups to construct equivariant models. LieConv (Finzi et al. 2020) lifts raw inputs x ∈ X to group elements g ∈ G and orbits q ∈ X/G such that g ⋅ o_q = x, where o_q is the origin of q. Thus, the convolution is defined as

[ψ ⋆ f](g, q) = ∫_G ∫_{X/G} ψ(g⁻¹g′, q, q′) f(g′, q′) dq′ dμ(g′),  (21)

where ψ: G × X/G × X/G → ℝ and f: G × X/G → ℝ. LieTransformer (Hutchinson et al. 2021) adds attention mechanisms to LieConv. After lifting, it computes content attention and location attention, both of which are normalized for feature transformation.
G-CNNs are effective tools for handling equivariance for voxels and point clouds (Worrall and Brostow 2018; Finzi et al. 2020; Chen et al. 2021). Nonetheless, it is difficult to balance the computational burden and the approximation error when using sampling to approximate the integration. Moreover, a finite subgroup of SO(3) is one of the following: the cyclic group C_k (|C_k| = k), the dihedral group D_k (|D_k| = 2k), the tetrahedral group T (|T| = 12), the octahedral group O (|O| = 24), and the icosahedral group I (|I| = 60) (Artin 2013). C_k and D_k can be arbitrarily large but are unsuitable for arbitrary 3D rotations, while T, O, and I are applicable but cannot be made arbitrarily large. Therefore, it is hard for methods that depend on finite subgroups to extend to arbitrary rotations, like CubeNet (Worrall and Brostow 2018).

Fig. 9  A pipeline of Spherical CNNs. Most works (Cohen et al. 2018a; Esteves et al. 2018a) employ ten-
sor products to compute spherical/SO(3) convolutions in the spectral domain, while others directly apply
spherical convolutions in the spatial domain


4.2 Spherical CNNs

Spherical CNNs are special cases of G-CNNs, where the inputs are spherical and SO(3) sig-
nals. In this survey, existing spherical CNNs are divided into three categories, i.e., Cohen et al.
(2018a), Esteves et al. (2018a), and the others (Fig. 9).

4.2.1 Cohen et al. (2018a)

Cohen et al. (2018a) directly employ group convolutions in Eq. 19, where X is either S² or SO(3). They use the generalized Fourier transform (GFT) to convert convolutions into matrix multiplications. The GFT and its inverse are computed as

f̃_m^l = ∫_{S²} f(x) Y_m^l(x) dx,  f(x) = Σ_{l=0}^∞ Σ_{m=−l}^{l} f̃_m^l Y_m^l(x),  (22)

f̃_{mn}^l = ∫_{SO(3)} f(g) D_{mn}^l(g) dμ(g),  f(g) = Σ_{l=0}^∞ Σ_{m,n=−l}^{l} f̃_{mn}^l D_{mn}^l(g),  (23)

where l ≥ 0 and −l ≤ m, n ≤ l. It can be proved that (ψ ⋆ f)˜^l = f̃^l (ψ̃^l)^H, where f̃^l, ψ̃^l ∈ ℂ^{2l+1} for spherical signals, f̃^l, ψ̃^l ∈ ℂ^{(2l+1)×(2l+1)} for SO(3) signals, and (ψ ⋆ f)˜^l ∈ ℂ^{(2l+1)×(2l+1)}.

The computation can be further accelerated with the generalized fast Fourier transform.
Clebsch-Gordan Nets (Kondor et al. 2018) exploit the tensor product nonlinearity to avoid repeated transforms, thus improving efficiency. The tensor product between two steerable vectors ũ^{l1} ∈ ℂ^{2l1+1}, ṽ^{l2} ∈ ℂ^{2l2+1} is defined as

(ũ^{l1} ⊗ ṽ^{l2})_m^l = Σ_{m1,m2} C_{l1,m1,l2,m2}^{l,m} ũ_{m1}^{l1} ṽ_{m2}^{l2},  (24)

where C_{l1,m1,l2,m2}^{l,m} is the Clebsch-Gordan coefficient, |l1 − l2| ≤ l ≤ l1 + l2, and −l ≤ m ≤ l. The tensor product is strongly rotation equivariant, i.e., (D^{l1}(g)ũ^{l1} ⊗ D^{l2}(g)ṽ^{l2})^l = D^l(g)(ũ^{l1} ⊗ ṽ^{l2})^l.
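For degree-1 features in real Cartesian form, the degree-0 and degree-1 components of the tensor product reduce (up to constants) to the dot and cross products, which makes the equivariance of Eq. 24 easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(7)
u, v = rng.normal(size=3), rng.normal(size=3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q   # a proper rotation

# degree-0 output of the 1 x 1 tensor product: an invariant scalar (D^0(g) = 1)
assert np.isclose(u @ v, (R @ u) @ (R @ v))
# degree-1 output: transforms with the rotation itself (D^1(g) acts like R
# in this real Cartesian basis, for proper rotations)
assert np.allclose(np.cross(R @ u, R @ v), R @ np.cross(u, v))
```

The full complex spherical-harmonic machinery follows the same pattern for higher degrees, with Clebsch-Gordan coefficients supplying the change of basis.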
Many methods are based on spherical convolutions (Cohen et al. 2018a). a³SCNN (Liu et al. 2019a) proposes the alt-az anisotropic spherical convolution (a³SConv), whose outputs are spherical but not SO(3) signals. a³SConv (⋆₁) is defined as [ψ ⋆₁ f](x) = [ψ ⋆ f](ζ(x, 0)), where ζ: S² × [0, 2π) → SO(3). As ζ(x, 0) cannot represent all SO(3) elements, a³SConv is only equivariant to specific rotations. Esteves et al. (2020b) introduce spin weights and propose the spin-weighted spherical CNN. PRIN (You et al. 2020, 2021) proposes the spherical voxel convolution (SVC) for signals on the unit ball B³. SVC (⋆₂) is defined as [ψ ⋆₂ f](x) = [ψ ⋆ f](ι(x)), where ι: B³ → SO(3). SPRIN (You et al. 2021) abandons the dense grids in PRIN by directly converting point clouds {x_i} into a distribution function f(x) = (1/N) Σ_i δ(x − x_i), where δ is the delta function. Then SVC can be efficiently approximated as an unbiased estimation. Chen et al. (2023) combine spherical CNNs with Capsule Networks (Hinton et al. 2011) for unknown pose recognition.
Most methods use the ray casting to generate spherical signals from 3D shapes. How-
ever, other methods are also applicable. Yang et al. (2019); Yang and Chakraborty (2020)
generate spherical signals by collecting responses from point clouds. Spherical-GMM
(Zhang 2021) represents point clouds with Gaussian mixture models. Besides classi-
fication and segmentation, spherical CNNs are widely used in many tasks, including


omnidirectional localization (Zhang et al. 2021a), place recognition (Yin et al. 2020, 2021,
2022), and self-supervised representation learning (Spezialetti et al. 2019; Marcon et al.
2021; Lohit and Trivedi 2020; Spezialetti et al. 2020).

4.2.2 Esteves et al. (2018a)

Esteves et al. (2018a, 2020a) propose another spherical convolution that only processes spherical signals. The spherical convolution for ψ, f: S² → ℝ is defined as

[ψ ∗ f](x) = ∫_G L_g ψ(x) L_{g⁻¹} f(η) dμ(g),  (25)

where η is the north pole. Such spherical convolutions are strongly rotation equivariant, i.e., ψ ∗ [L_g f] = L_g[ψ ∗ f], and can be converted to multiplications with the GFT as (ψ ∗ f)˜_m^l = 2π √(4π/(2l+1)) ψ̃_0^l f̃_m^l. As only ψ̃_0^l is involved, the only useful part of the filter ψ is its zonal component.
Esteves et al. (2019a) utilize pre-trained spherical CNNs as supervision and learn equiv-
ariant representations for 2D images. Mukhaimar et al. (2022) apply them on concentric
spherical voxels for robust point cloud classification. Esteves et al. (2023) scale up spheri-
cal CNNs and achieve outstanding performances on molecular benchmarks and weather
forecasting tasks.

4.2.3 Others

Some spherical CNNs keep GFT and part of spherical convolutions. Zhang et al. (2019a)
replace the SO(3) convolutional layers with PointNet-like (Qi et al. 2017a) networks.
Almakady et al. (2020) use GFT to decompose the spherical signals, then exploit the
norms of individual components as invariant features for volumetric texture classification.
Lin et al. (2021b) combine these norms with other invariant features to boost the classifica-
tion performance.
Some spherical CNNs handle convolutions in the spatial domain. SFCNN (Rao et al.
2019) apply symmetric convolutions to each point and its neighbors on spherical lattices.
Yang et al. (2020) propose the geodesic icosahedral pixelization to address the irregular-
ity problem. Fox et al. (2022) transform point clouds into concentric spherical signals and
append convolutions along the radial dimension. Shakerinava and Ravanbakhsh (2021)
investigate the pixelizations of platonic solids for spheres and introduce equivariant maps
on them. Xu et al. (2022) exploit global–local attention-based convolutions for spherical
data.

4.2.4 Discussion

Spherical CNNs are effective for spherical signals. They have a solid mathematical foun-
dation and nice properties on equivariance. Notwithstanding, preprocessing is sometimes
problematic. The ray casting technique is commonly adopted to convert 3D shapes into
spherical signals (Cohen et al. 2018a; Esteves et al. 2018a). However, Esteves et al. (2018a)
argue that it is only suitable for star-shaped objects, from whose interior point the whole


Fig. 10  A TFN (Thomas et al. 2018; Thomas 2019) layer. Each point x_i is associated with a tensor field V_i. The output tensor field V′_i is aggregated from the tensor product between the filter features F(x_i − x_j) and the input tensor field V_j. Some superscripts and subscripts are omitted for simplicity

boundary is visible. Besides, projection on spheres would unavoidably distort shapes, and
finer grids lead to less error but a heavier computational burden (Cohen et al. 2018a; Este-
ves et al. 2018a).

4.3 Irreducible representation methods

Irreducible representation methods utilize irreducible representations of SO(3), i.e., Wigner-D matrices D^l, l = 0, 1, ⋯, to achieve rotation equivariance. A degree-l steerable feature ũ^l transforms into D^l(g)ũ^l under g ∈ SO(3). In these methods, the degree-l filter F^l: ℝ³ → ℂ^{2l+1} is constructed as

F^l(x) = φ^l(‖x‖) Y^l(x/‖x‖),  x ≠ 0,  (26)

where φ^l: ℝ_{≥0} → ℝ and Y^l is the spherical harmonic. To guarantee continuity, F^l(0) is determined by lim_{x→0} F^l(x), which is nonzero only when l = 0. F^l is strongly rotation equivariant, i.e., F^l(R(g)x) = D^l(g)F^l(x).
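For l = 1, Y¹(x/‖x‖) is proportional to x/‖x‖ in the real Cartesian basis, so Eq. 26 can be checked directly; the radial profile below is an arbitrary stand-in for the learnable φ^l:

```python
import numpy as np

def radial(r):
    # a hypothetical radial profile phi(r); any scalar function of r works
    return np.exp(-r) * r

def filter_deg1(x):
    # degree-1 filter in real Cartesian form: Y^1(x/||x||) ~ x/||x||
    r = np.linalg.norm(x)
    return radial(r) * x / r

rng = np.random.default_rng(8)
x = rng.normal(size=3)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q
assert np.allclose(filter_deg1(R @ x), R @ filter_deg1(x))  # F(R x) = D^1(g) F(x)
```

The radial part is rotation invariant and the angular part rotates with the input, which is exactly the factorization Eq. 26 exploits for every degree.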
Irreducible representation methods are mostly applied to coordinate inputs like point
clouds. Tensor field networks (TFNs) (Thomas et al. 2018; Thomas 2019) are the pioneer-
ing methods using irreducible representations. All inputs and outputs of the TFN layer are
̃ l ∈ ℝN×Cl ×(2 l+1), where N is the number of points, Cl is the feature dimen-
tensor fields V
sion, and l = 0, ⋯ , L is the rotation degree. They exploit TFN filters to generate steerable
features from coordinates. Then the tensor product between these features and input tensor
fields is computed as the output tensor fields, as shown in Fig. 10. TFNs and Clebsch-
Gordan Nets (Kondor et al. 2018) have many similarities, including steerable features and
tensor products. However, TFNs bind steerable features with points, while Clebsch-Gordan
Nets exploit steerable features to describe spherical signals. N-body Networks (Kondor
2018), designed for many body physical systems, are also based on the irreducible repre-
sentation of SO(3). Cormorant (Anderson et al. 2019) modifies the nonlinearity in Cleb-
sch-Gordan Nets (Kondor et al. 2018) to avoid the blow-up of channels. SE(3)-Transformer
(Fuchs et al. 2020) decomposes the TFN layer into self-interaction and message-passing,
where attention is added to the second part. TF-Onet Chatzipantazis et al. (2023) also
uses equivariant attention modules for shape reconstruction. Poulenard and Guibas (2021)
propose a new nonlinearity for steerable features to improve the expressivity and reduce
the computational burden. TFNs are leveraged in many applications, including 3D shape
analysis (Poulenard et al. 2019), protein structure prediction (Fuchs et al. 2021), molecular


dynamics simulation (Batzner et al. 2022), and self-supervised canonicalization (Sajnani


et al. 2022).
Besides point clouds, irreducible representation methods are also applied to voxels. 3D
Steerable CNNs (Weiler et al. 2018) reduce rotation equivariant linear maps between irre-
ducible features into convolutions with steerable kernels W^{ll′}: ℝ³ → ℝ^{(2l+1)×(2l′+1)} that satisfy

W^{ll′}(R(g)x) = D^l(g) W^{ll′}(x) D^{l′}(g)⁻¹.  (27)

Eq. 27 can be solved analytically with the solution as a TFN-type matrix function.
3D Steerable CNNs are employed in some applications, including 3D texture analysis
(Andrearczyk et al. 2019), partial point cloud classification (Xu et al. 2023b), and mul-
tiphase flow demonstration (Siddani et al. 2021; Lin et al. 2021a). PDO-s3DCNNs (Shen
et al. 2022) derive the general steerable 3D CNNs with partial differential operators.
Irreducible representation methods have intrinsic strong rotation equivariance. Nonetheless, the theory is complex enough to limit its audience (Weiler et al. 2018; Thomas et al. 2018; Thomas 2019). Besides, tensor products may increase the number of rotation degrees and harm efficiency (Thomas et al. 2018; Thomas 2019; Kondor et al. 2018).

4.4 Equivariant value methods

Equivariant value methods are networks constructed by equivariant values, i.e., scalars and
vectors. They are similar to invariant value methods in Sect. 3.5. However, invariant values
are only primitive features, while equivariant values form the basic blocks of equivariant
networks.
EGNNs (Satorras et al. 2021b) add relative distances to graph convolutional layers.
Then the coordinate xi and feature f i are updated as
m_ij = φ_e(f_i, f_j, ‖x_i − x_j‖², a_ij),  (28)

x_i + (1/(N−1)) Σ_{j≠i} (x_i − x_j) φ_x(m_ij) → x_i,  φ_f(f_i, Σ_{j∈N(x_i)} m_ij) → f_i,  (29)

where aij is the edge information, 𝜙e , 𝜙x , 𝜙f are update functions for edges, coordinates,
and node features, respectively. Clearly, the coordinates are strongly rotation equivariant,
while the features are strongly rotation invariant. E-NFs (Satorras et al. 2021a) combine
EGNNs with continuous-time normalizing flows (Chen et al. 2018a) to construct equiv-
ariant generative models. EquiDock (Ganea et al. 2022) and EquiBind (Stärk et al. 2022)
apply graph matching networks (Li et al. 2019b) and EGNNs on rigid body protein-protein
docking and drug binding structure prediction, respectively. Some methods (Hoogeboom
et al. 2022; Schneuing et al. 2022; Igashov et al. 2022; Lin et al. 2022b; Guan et al. 2023)
incorporate diffusion models with EGNNs for molecule generation. SEGNNs (Brandstetter
et al. 2022) extend EGNNs with steerable features.
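One EGNN step (Eqs. 28-29) can be sketched with toy stand-ins for the learned functions φ_e, φ_x, φ_f; the check below confirms that the coordinates are equivariant while the features are invariant:

```python
import numpy as np

def egnn_layer(X, F, phi_e=np.tanh, phi_x=np.mean, phi_f=np.add):
    # one EGNN step (Eqs. 28-29) with toy stand-ins for the learned MLPs
    N = X.shape[0]
    X_new, F_new = X.copy(), F.copy()
    for i in range(N):
        m = np.array([phi_e(np.concatenate([F[i], F[j], [np.sum((X[i] - X[j])**2)]]))
                      for j in range(N) if j != i])          # messages m_ij
        diffs = np.array([X[i] - X[j] for j in range(N) if j != i])
        w = np.array([phi_x(mj) for mj in m])[:, None]       # scalar weights
        X_new[i] = X[i] + (diffs * w).mean(axis=0)           # Eq. 29, coordinates
        F_new[i] = phi_f(F[i], m.sum(axis=0)[:F.shape[1]])   # crude feature update
    return X_new, F_new

rng = np.random.default_rng(9)
X, F = rng.normal(size=(5, 3)), rng.normal(size=(5, 4))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q if np.linalg.det(Q) > 0 else -Q
X1, F1 = egnn_layer(X, F)
X2, F2 = egnn_layer(X @ R.T, F)
assert np.allclose(X2, X1 @ R.T)   # coordinates: strongly rotation equivariant
assert np.allclose(F2, F1)         # features: strongly rotation invariant
```

Since the messages see only squared distances and features, they are invariant; the coordinate update mixes in the relative vectors x_i − x_j, which is the only place rotations can act.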
Vector Neurons (VNs) (Deng et al. 2021a) endow networks with equivariance by replac-
ing scalars with vectors. Take the linear layer as an example: v ∈ ℝ^C is transformed into


Fig. 11  The comparison between linear layers in typical networks (left) and VNs (right) (Deng et al.
2021a). Each solid line represents a weight value. As the vectors are transformed consistently, VNs can
achieve strong rotation equivariance

Wv + b ∈ ℝ^{C′} in classic networks, and V ∈ ℝ^{C×3} is transformed into WV ∈ ℝ^{C′×3} in VNs, where W ∈ ℝ^{C′×C}, b ∈ ℝ^{C′} (Fig. 11). Other layers are modified analogously. VN-Transformer (Assaad et al. 2022) derives equivariant attention mechanisms to enhance effectiveness and efficiency based on VNs. VNs are strongly rotation equivariant and have been
applied in object manipulation (Simeonov et al. 2022), molecule generation (Huang et al.
2022b), point cloud registration (Zhu et al. 2022b; Lin et al. 2023b; Ao et al. 2023b), point
cloud completion (Wu and Miao 2022), unsupervised point cloud segmentation (Lei et al.
2023), and point cloud canonicalization (Katzir et al. 2022; Kaba et al. 2023). Geomet-
ric vector perceptrons (GVPs) (Jing et al. 2021b) similarly operate on geometric vectors.
Jing et al. (2021a) apply GVPs on structural biology tasks and reach several state-of-the-art
results. PaiNN (Schütt et al. 2021) builds efficient equivariant layers to predict molecular
properties. SE(3)-DDM (Liu et al. 2022b) applies PaiNN on the coordinate denoising task.
TorchMD-NET (Thölke and Fabritiis 2022) designs attention-based update rules for fea-
tures of different types. Directed weight neural networks (Li et al. 2022a) generalize VNs
and GVPs with more operators, which can be integrated with existing GNN frameworks.
Chen et al. (2022) build graph implicit functions with equivariant layers to capture geomet-
ric details. Le et al. (2022b) exploit cross products to generate new vectors in the message
function.
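The VN linear layer's equivariance can be checked numerically in a few lines. A minimal numpy sketch (`vn_linear` and `random_rotation` are hypothetical helper names; the bias is dropped because a constant bias on vector features would break equivariance, which is why the VN linear map is WV rather than WV + b):

```python
import numpy as np

def vn_linear(V, W):
    # Vector-neuron linear layer: W mixes the C channels but never the
    # x/y/z axes, so the layer commutes with any 3D rotation.
    # V: (C, 3) vector-valued features, W: (C_out, C) weights, no bias.
    return W @ V

def random_rotation(rng):
    # Random rotation via QR decomposition of a Gaussian matrix.
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q *= np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:   # force det = +1 (proper rotation)
        Q[:, 0] *= -1
    return Q

rng = np.random.default_rng(0)
V = rng.standard_normal((8, 3))    # 8 vector channels
W = rng.standard_normal((16, 8))   # lift to 16 channels
R = random_rotation(rng)

# W(VR) = (WV)R: rotating the input rotates the output identically.
print(np.allclose(vn_linear(V @ R, W), vn_linear(V, W) @ R))  # True
```

The equivariance here is exact (up to floating-point error), not approximate: channel mixing and rotation act on different axes of V and therefore commute.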
Villar et al. (2021) utilize several theorems to construct equivariant functions on groups
including O(n) and SO(n). GMN (Huang et al. 2022a) constructs equivariant networks
similarly and proves their universal approximation. IsoGCNs (Horie et al. 2021) achieve
equivariance by operating on rank-p tensors H^(p) ∈ ℝ^{|V|×C×d^p}. Using a similar approach,
Finkelshtein et al. (2022) define ascending and descending layers for geometric dimension
expansion and contraction, respectively. Suk et al. (2021, 2022) leverage equivariant neural
networks in computational fluid dynamics. EQGAT (Le et al. 2022a) processes coordinates
with attention mechanisms for better performances. Luo et al. (2022) extend message pass-
ing networks with learned orientations. DeepDFT (Jørgensen and Bhowmik 2022) employs
message passing networks on fast electron density estimation.
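The scalar-coefficient construction behind Villar et al. (2021) and GMN can be sketched as a toy instance: the output vector is a linear combination of the input vectors, with coefficients that depend only on the rotation-invariant Gram matrix, so the output rotates exactly like the inputs. The coefficient function below is a stand-in for a learned network, and the helper name is illustrative.

```python
import numpy as np

def equivariant_vector_fn(V, coeff_fn):
    # Output a 3D vector as sum_i g_i(V V^T) v_i: the coefficients depend
    # only on invariant inner products, so the map is O(3)-equivariant.
    G = V @ V.T              # Gram matrix of pairwise inner products
    g = coeff_fn(G)          # (C,) invariant coefficients
    return g @ V             # (3,) equivariant output

coeff_fn = lambda G: np.tanh(G.sum(axis=1))   # toy invariant-based weights

rng = np.random.default_rng(1)
V = rng.standard_normal((5, 3))
Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
Q *= np.sign(np.diag(R))
if np.linalg.det(Q) < 0:     # force det = +1 (proper rotation)
    Q[:, 0] *= -1

lhs = equivariant_vector_fn(V @ Q, coeff_fn)
rhs = equivariant_vector_fn(V, coeff_fn) @ Q
print(np.allclose(lhs, rhs))  # True
```

Since (VQ)(VQ)^T = VV^T for any rotation Q, the coefficients are unchanged under rotation and the output simply rotates with V.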
Compared to previous methods, equivariant value methods do not introduce approxima-
tion error and their theories are relatively simple (Satorras et al. 2021b; Deng et al. 2021a).
Albeit recently emerged, they have shown great potential in many applications (Deng et al.
2021a; Ganea et al. 2022; Stärk et al. 2022; Schütt et al. 2021).

Table 8  Comparisons of different rotation equivariant methods

| Method | Data format | Invariance | Limitation |
|---|---|---|---|
| G-CNNs | Voxels, point clouds, graphs | Weak | Approximation error of integration; problems of finite subgroups |
| Spherical CNNs | Spherical signals | Weak | Approximation error of GFT; problems of preprocessing |
| Irreducible representation methods | Voxels, point clouds, graphs | Strong | Complex theory; inefficient tensor products |
| Equivariant value methods | Point clouds, graphs | Strong | No common weakness |

4.5 Others

Some equivariant networks use quaternions to represent 3D rotations. REQNN (Shen et al. 2020) employs quaternions to revise classic layers into equivariant ones. Zhao et al.
(2020a) propose quaternion equivariant capsule networks to disentangle geometry from
poses. Quaternion CNNs (Jing et al. 2021c) utilize convolutions on quaternion arrays for
gait identification. Qin et al. (2022) present quaternion product units to address rotation
equivariance.
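As background for the quaternion-based layers above, a minimal sketch of rotating a 3D vector by a unit quaternion via the sandwich product q v q* (`qmul` and `qrotate` are illustrative helpers, not any of the cited papers' APIs):

```python
import numpy as np

def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z).
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qrotate(q, v):
    # Rotate 3D vector v by unit quaternion q via q v q*.
    qv = np.concatenate([[0.0], v])
    qc = q * np.array([1.0, -1.0, -1.0, -1.0])  # conjugate
    return qmul(qmul(q, qv), qc)[1:]

# Quaternion for a 90-degree rotation about the z-axis.
t = np.pi / 2
q = np.array([np.cos(t / 2), 0.0, 0.0, np.sin(t / 2)])
print(np.round(qrotate(q, np.array([1.0, 0.0, 0.0])), 6))  # → [0. 1. 0.]
```

Because quaternion multiplication composes rotations, layers built on quaternion products inherit equivariance to the rotation group by construction.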
Some methods turn to gauges for rotation equivariance. Gauge equivariant CNNs
(Cohen et al. 2019a) propose gauge equivariant convolutions based on the gauge theory.
Haan et al. (2021) adapt the above structure to mesh inputs. Gauge equivariant transformer (He et al. 2021) adds attention mechanisms to gauge equivariant CNNs for better performances.
Finzi et al. (2021) derive the equivariant condition like that in 3D Steerable CNNs
(Weiler et al. 2018) with Lie algebra representations. EqDDM (Azari and Erdogmus 2022) leverages these constraints to build an equivariant deep dynamical model for motion prediction. Melnyk et al. (2022) establish steerability constraints for spherical neurons to construct equivariant layers.
Li et al. (2019a) take a similar approach to CubeNet (Worrall and Brostow 2018) but
without group convolutions, where invariance is achieved by eliminating the permutation.
XEdgeConv (Weihsbach et al. 2022) directly explores symmetric kernels for discrete rota-
tion equivariance. Park et al. (2022) design equivariant networks for domains where it is
hard to describe the transformation of inputs explicitly.

4.6 Summary

Rotation equivariant methods have a broader application range compared to rotation invari-
ant ones. The differences of various rotation equivariant methods are listed in Table 8. We
summarize several characteristics of existing rotation equivariant methods.

• The approximation errors of G-CNNs (Finzi et al. 2020; Chen et al. 2021) and spherical CNNs (Cohen et al. 2018a; Esteves et al. 2018a) are inevitable and can only be reduced through fine discretization and cumbersome computation. Therefore, they are less reliable than strongly rotation equivariant methods.
• Albeit strongly rotation equivariant, irreducible representation methods (Thomas et al.
2018; Weiler et al. 2018; Thomas 2019) have a complex theory, which poses great chal-
lenges for fresh users.
• Equivariant value methods (Satorras et al. 2021b; Deng et al. 2021a) achieve great bal-
ance between theoretical properties and experimental performances.
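The distinction between weak and strong equivariance drawn above can be probed empirically by measuring how far a map deviates from commuting with rotations. This is one simple diagnostic in that spirit (not necessarily the survey's own metric); the helper names and the two example maps are illustrative.

```python
import numpy as np

def equivariance_error(f, x, rotations):
    # Mean relative deviation ||f(xR) - f(x)R|| / ||f(x)R|| over rotations.
    errs = []
    for R in rotations:
        out_rot_in = f(x @ R)
        rot_out = f(x) @ R
        errs.append(np.linalg.norm(out_rot_in - rot_out)
                    / np.linalg.norm(rot_out))
    return float(np.mean(errs))

def random_rotation(rng):
    # Random rotation via QR decomposition of a Gaussian matrix.
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q *= np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 3))
Rs = [random_rotation(rng) for _ in range(8)]

centroid_map = lambda p: p - p.mean(axis=0)   # strongly equivariant
broken_map = lambda p: np.maximum(p, 0.0)     # pointwise ReLU: not equivariant

print(equivariance_error(centroid_map, x, Rs))  # ~0 (machine precision)
print(equivariance_error(broken_map, x, Rs))    # clearly nonzero
```

A strongly equivariant map drives this error to floating-point noise for every rotation, whereas weakly equivariant models only keep it small on average for the transformations seen during training or discretization.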

13
Table 9  Tasks and datasets in general 3D understanding

| Task | Dataset |
|---|---|
| Classification | ModelNet (Wu et al. 2015), ShapeNetCore (Chang et al. 2015), ScanObjectNN (Uy et al. 2019), RGB-D Object (Lai et al. 2011), S3DIS (Armeni et al. 2016), ScanNet (Dai et al. 2017), Spherical MNIST (Cohen et al. 2018a), Spherical CIFAR-10 (Yang et al. 2020), RFAI (Paulhac et al. 2009), OASIS (Fotenos et al. 2005) |
| Segmentation | ShapeNetPart (Yi et al. 2016), PartNet (Mo et al. 2019), S3DIS (Armeni et al. 2016), ScanNet (Dai et al. 2017), Semantic3D (Hackel et al. 2017), 2D-3D-S (Armeni et al. 2017), BraTS-2018 (Menze et al. 2015) |
| Detection | ScanNetV2 (Dai et al. 2017), SUN RGB-D (Song et al. 2015), KITTI (Geiger et al. 2012), nuScenes (Caesar et al. 2020), NLST (Team 2011), LIDC/IDRI (McNitt-Gray et al. 2007) |
| Pose estimation | ModelNet (Wu et al. 2015), ShapeNet (Chang et al. 2015), ObjectNet3D (Xiang et al. 2016), NOCS (Wang et al. 2019a), Human3.6M (Ionescu et al. 2014), MPI-INF-3DHP (Mehta et al. 2017), ICVL (Tang et al. 2014), NYU (Tompson et al. 2014), MSRA (Sun et al. 2015) |
| Shape registration | The Stanford 3D Scanning Repository (Curless and Levoy 1996), 7Scenes (Shotton et al. 2013), 3DMatch (Zeng et al. 2017) |
| Place recognition | KITTI (Geiger et al. 2012), ETH (Pomerleau et al. 2012), NCLT (Carlevaris-Bianco et al. 2016), Oxford RobotCar (Maddern et al. 2017), MulRan (Kim et al. 2020a), KITTI-360 (Liao et al. 2022) |

Table 10  ModelNet40 (Wu et al. 2015) classification results (overall accuracy, %) of representative rotation invariant/equivariant methods

| Method | Input type | Input size | z/z | SO(3)/SO(3) | z/SO(3) |
|---|---|---|---|---|---|
| Rotation invariant methods | | | | | |
| MVCNNᵃ (Su et al. 2015) | image | 80 × 224² | 90.2 | 86.0 | 81.5 |
| RIConv (Zhang et al. 2019b) | pc | 1024 × 3 | 86.5 | 86.4 | 86.4 |
| SRINet∗ (Sun et al. 2019b) | pc | 1024 × 3 | – | – | 87.0 |
| ClusterNet (Chen et al. 2019b) | pc | 1024 × 3 | 87.1 | 87.1 | 87.1 |
| GCANet (Zhang et al. 2020c) | pc | 1024 × 3 | 89.0 | 89.2 | 89.1 |
| PR-invNet (Yu et al. 2020a) | pc | 1024 × 3 | 89.2 | 89.2 | 89.2 |
| RI-GCN (Kim et al. 2020b) | pc | 1024 × 3 | 89.5 | 89.5 | 89.5 |
| RI-GCN (Kim et al. 2020b) | pc+n | 1024 × 6 | 91.0 | 91.0 | 91.0 |
| RI-Framework (Li et al. 2021c) | pc | 1024 × 3 | 89.4 | 89.3 | 89.4 |
| RTN+DGCNN (Deng et al. 2021b) | pc | 1024 × 3 | – | 90.2 | – |
| SGMNet (Xu et al. 2021b) | pc | 1024 × 3 | 90.0 | 90.0 | 90.0 |
| Li et al. (2021b) | pc | 1024 × 3 | 90.2 | 90.2 | 90.2 |
| LGR-Net (Zhao et al. 2022a) | pc+n | 1024 × 6 | 90.9 | 91.1 | 90.9 |
| RIConv++ (Zhang et al. 2022) | pc | 1024 × 3 | 91.2 | 91.2 | 91.2 |
| PaRI-Conv (Chen and Cong 2022) | pc | 1024 × 3 | 91.4 | 91.4 | 91.4 |
| PaRI-Conv (Chen and Cong 2022) | pc+n | 1024 × 6 | 92.4 | 92.4 | 92.4 |
| CRIN (Lou et al. 2023) | pc | 1024 × 3 | 91.8 | 91.8 | 91.8 |
| Rotation equivariant methods | | | | | |
| Spherical CNNs (Esteves et al. 2018a) | mesh | 2 × 64² | 88.9 | 86.9 | 78.6 |
| TFNᵇ (Thomas et al. 2018) | pc | 1024 × 3 | 88.5 | 87.6 | 85.3 |
| a3SCNN∗ (Liu et al. 2019a) | mesh | 2 × 165 × 65 | – | 88.7 | 87.9 |
| Esteves et al. (2019b) | image | 80 × 224² | – | 91.1 | – |
| SPHNet (Poulenard et al. 2019) | pc | 1024 × 3 | – | 87.6 | 86.6 |
| Li et al. (2019a) | pc | 1024 × 3 | – | 88.8 | – |
| SFCNN (Rao et al. 2019) | pc | 1024 × 3 | 91.4 | 90.1 | 84.8 |
| SFCNN (Rao et al. 2019) | pc+n | 1024 × 6 | 92.3 | 91.0 | 85.3 |
| PRIN∗ (You et al. 2020, 2021) | pc | 1024 × 3 | – | – | 72.4 |
| SPRIN∗ (You et al. 2021) | pc | 1024 × 3 | – | – | 86.1 |
| VN-DGCNN (Deng et al. 2021a) | pc | 1024 × 3 | 89.5 | 90.2 | 89.5 |
| EPN (Chen et al. 2021) | pc | 1024 × 3 | – | 88.3 | – |
| Poulenard and Guibas (2021) | pc | 1024 × 3 | 89.7 | 89.7 | 89.7 |
| VN-Transformer (Assaad et al. 2022) | pc | 1024 × 3 | – | – | 90.8 |

pc = point cloud, n = normal. ∗: Some works replace azimuthal rotation augmentation (z) with no augmentation (I). ᵃ: Results are from Esteves et al. (2018a). ᵇ: Results are from Deng et al. (2021a)

5 Application and dataset

Rotation invariance and equivariance are seldom separate problems and always depend on task requirements in specific settings. We give a general overview of applications and datasets involved in related works and divide them into 3D semantic understanding and molecule-related applications.

Table 11  ScanObjectNN (Uy et al. 2019) classification results (overall accuracy, %) of representative rotation invariant/equivariant methods; for each variant, columns are z/z, SO(3)/SO(3), z/SO(3)

| Method | OBJ_ONLY | OBJ_BG | PB_T50_RS |
|---|---|---|---|
| Rotation invariant methods | | | |
| RIConvᵃ (Zhang et al. 2019b) | 79.8 / 79.8 / 79.8 | 78.4 / 78.2 / 78.4 | 68.1 / 68.3 / 68.3 |
| GCANetᵃ (Zhang et al. 2020c) | 80.1 / 80.3 / 80.1 | 78.2 / 78.1 / 78.2 | 69.8 / 70.0 / 69.8 |
| RI-Framework (Li et al. 2021c) | – / – / – | – / – / 79.8 | – / – / – |
| Li et al. (2021b)∗ | – / – / – | 84.3 / 84.3 / 84.3 | – / – / – |
| LGR-Net (Zhao et al. 2022a) | – / – / – | 81.2 / 81.4 / 81.2 | 72.7 / 72.9 / 72.7 |
| RIConv++ (Zhang et al. 2022) | 86.2 / 86.2 / 86.2 | 85.6 / 85.6 / 85.6 | 80.3 / 80.3 / 80.3 |
| PaRI-Conv (Chen and Cong 2022) | – / – / – | 83.3 / 83.3 / 83.3 | – / – / – |
| CRIN∗ (Lou et al. 2023) | – / – / – | 84.7 / 84.7 / 84.7 | – / – / – |
| Rotation equivariant methods | | | |
| PRIN∗ (You et al. 2020, 2021) | – / – / – | – / – / – | – / – / 52.1 |
| SPRIN∗ (You et al. 2021) | – / – / – | – / – / – | – / – / 69.8 |

All methods take 2,048 points as input. ∗: Some works replace azimuthal rotation augmentation (z) with no augmentation (I). ᵃ: Results are from Zhang et al. (2022)

5.1 3D semantic understanding

3D semantic understanding tasks, like classification, segmentation, and detection, evaluate the capability of DNNs on 3D shapes and scenes. Here we focus on tasks requiring rotation invariance and equivariance. We summarize these tasks and related datasets in Table 9. For aligned datasets, rotation augmentation is required to pose enough challenges on rotation invariant and equivariant methods. In the following discussions, A/B refers to training with A augmentation and evaluating with B augmentation. We use z and SO(3) to represent azimuthal and random rotation augmentation, respectively.
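The two augmentation protocols can be sketched as follows (function names are illustrative; the SO(3) sampler draws random rotations via a QR decomposition, and the z sampler rotates about the up axis only):

```python
import numpy as np

def z_rotation(rng):
    # Azimuthal augmentation (z): a random rotation about the up axis only.
    t = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def so3_rotation(rng):
    # SO(3) augmentation: a random 3D rotation via QR of a Gaussian matrix.
    Q, R = np.linalg.qr(rng.standard_normal((3, 3)))
    Q *= np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:   # force det = +1 (proper rotation)
        Q[:, 0] *= -1
    return Q

def augment(points, rng, mode):
    # points: (N, 3) cloud; mode: "z" or "so3"; returns a rotated copy.
    R = z_rotation(rng) if mode == "z" else so3_rotation(rng)
    return points @ R.T

rng = np.random.default_rng(0)
pc = rng.standard_normal((1024, 3))
for mode in ("z", "so3"):
    out = augment(pc, rng, mode)
    # Any rotation preserves distances to the origin.
    print(mode, np.allclose(np.linalg.norm(out, axis=1),
                            np.linalg.norm(pc, axis=1)))
```

In the z/SO(3) setting, a model is trained with only the first sampler but evaluated under the second, which is what exposes methods that merely memorize canonical poses.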
Classification. Classification is the most well-studied task in this field. ModelNet
(Wu et al. 2015) is a commonly-used 3D CAD model dataset with two versions, i.e.,
ModelNet10 with 10 categories and ModelNet40 with all 40 ones. We list the experi-
mental results of ModelNet40 classification in Table 10. As the table shows, there is
an input type change from images and meshes to point clouds, which can be attributed
to the fact that point clouds can provide precise coordinates essential for strong rota-
tion invariance. Besides, rotation invariant methods generally perform better than rota-
tion equivariant ones. They are more suitable for tasks that only require prediction of
invariant targets. ShapeNetCore (Chang et al. 2015) is another popular 3D shape dataset
with 55 categories. Unlike previous datasets, ScanObjectNN (Uy et al. 2019) is a real-
world point cloud dataset, adding more challenges to classification. ScanObjectNN has
three popular variants, i.e., OBJ_ONLY, OBJ_BG, PB_T50_RS. Many researchers also
evaluate their methods on these datasets, whose experimental results are summarized in Table 11.

Table 12  ShapeNetPart (Yi et al. 2016) segmentation results of representative rotation invariant/equivariant methods; columns report instance mean IoU (ins) / class mean IoU (cls)

| Method | Normal | z/z (ins / cls) | SO(3)/SO(3) (ins / cls) | z/SO(3) (ins / cls) |
|---|---|---|---|---|
| Rotation invariant methods | | | | |
| RIConv (Zhang et al. 2019b) | × | – / – | 80.2 / 75.5 | 80.2 / 75.3 |
| SRINet∗ (Sun et al. 2019b) | × | – / – | – / – | – / 77.0 |
| GCANet (Zhang et al. 2020c) | × | – / – | – / 77.3 | – / 77.2 |
| PR-invNet (Yu et al. 2020a) | × | – / 79.4 | – / 79.4 | – / 79.4 |
| RI-GCN (Kim et al. 2020b) | × | – / – | – / 77.3 | – / 77.2 |
| RI-Framework (Li et al. 2021c) | × | – / – | 82.3 / 79.4 | 82.0 / 79.2 |
| RTN+DGCNN (Deng et al. 2021b) | × | – / – | 82.8 / – | – / – |
| SGMNet (Xu et al. 2021b) | × | – / 79.3 | – / 79.3 | – / 79.3 |
| Li et al. (2021b) | × | – / – | 81.7 / – | 81.7 / – |
| LGR-Net (Zhao et al. 2022a) | ✓ | – / – | 82.7 / 80.1 | 82.4 / 80.0 |
| RIConv++ (Zhang et al. 2022) | × | – / – | – / 80.3 | – / 80.3 |
| RIConv++ (Zhang et al. 2022) | ✓ | – / – | – / 80.5 | – / 80.5 |
| PaRI-Conv (Chen and Cong 2022) | × | – / – | 83.8 / – | 83.8 / – |
| PaRI-Conv (Chen and Cong 2022) | ✓ | – / – | 84.6 / – | 84.6 / – |
| CRIN (Lou et al. 2023) | × | – / 80.5 | – / 80.5 | – / 80.5 |
| EIPs (Fei and Deng 2024) | × | – / – | – / – | 84.9 / 82.1 |
| Rotation equivariant methods | | | | |
| PRIN∗ (You et al. 2020, 2021) | × | – / – | – / – | 71.2 / 66.8 |
| SPRIN∗ (You et al. 2021) | × | – / – | – / – | 82.7 / 79.5 |
| VN-DGCNN (Deng et al. 2021a) | × | – / – | 81.4 / – | 81.4 / – |
| Poulenard and Guibas (2021) | × | – / – | 81.7 / 78.4 | 81.8 / 78.0 |

All methods take 2,048 points as input, while some employ normals as additional inputs. ∗: Some works replace azimuthal rotation augmentation (z) with no augmentation (I)

Fewer works explore ScanObjectNN compared to ModelNet40 (Wu et al.
2015). Besides, there is no consensus on which variant to evaluate on, and results still leave much room for improvement. Other datasets are used less frequently, like RGB-D Object (Lai
et al. 2011), S3DIS (Armeni et al. 2016), and ScanNet (Dai et al. 2017). Some meth-
ods, especially those processing spherical signals, use Spherical MNIST (Cohen et al.
2018a) to evaluate their performances. Yang et al. (2020) create Spherical CIFAR-10 to
experiment on photorealistic images. Andrearczyk and Depeursinge (2018); Almakady
et al. (2020) exploit RFAI (Paulhac et al. 2009) on 3D texture classification. Yang
and Chakraborty (2020) employ the OASIS (Fotenos et al. 2005) for medical image
classification.
Segmentation. Segmentation is another popular task, aiming to make fine-grained
prediction. In part segmentation for small-scale objects, ShapeNetPart (Yi et al. 2016) is
widely applied as the evaluation dataset, where two common metrics, i.e., instance mean
IoU (ins.) and class mean IoU (cls.) are generally used. As shown in Table 12, RIConv++
(Zhang et al. 2022) and PaRI-Conv (Chen and Cong 2022) set the state-of-the-art results
in class mean IoU and instance mean IoU, respectively. However, we also notice that the

differences in evaluation metrics make direct comparisons of various methods confusing and unfair, which should be avoided in future works. Performance gaps still exist between
rotation invariant methods and rotation equivariant ones, since part segmentation tasks
only require point-wise invariant prediction. Hegde and Gangisetty (2021) employ Part-
Net (Mo et al. 2019) for a more thorough evaluation. Besides, Zhuang et al. (2019); Zhu
et al. (2020) investigate BraTS-2018 (Menze et al. 2015) on brain tumor segmentation. In
semantic segmentation for large-scale scenes, S3DIS (Armeni et al. 2016), ScanNet (Dai
et al. 2017), Semantic3D (Hackel et al. 2017), and 2D-3D-S (Armeni et al. 2017) are com-
monly used.
Detection. Detection is a basic task but remains underexplored from the perspective of rotation invariance and equivariance. Some works (Yu et al. 2022; Wang et al. 2023b)
incorporate equivariant networks with 3D object detectors. These methods are applied on
indoor datasets like ScanNetV2 (Dai et al. 2017), SUN RGB-D (Song et al. 2015) and out-
door datasets like KITTI (Geiger et al. 2012), nuScenes (Caesar et al. 2020). Besides, Win-
kels and Cohen (2018); Andrearczyk et al. (2019, 2020) investigate the pulmonary nodule
detection task with LIDC/IDRI (McNitt-Gray et al. 2007) and NLST (Team 2011).
Pose Estimation. The targets for pose estimation are pose parameters. Many aligned
datasets can be adjusted for pose estimation, including ModelNet (Wu et al. 2015), Shap-
eNet (Chang et al. 2015), and ObjectNet3D (Xiang et al. 2016). Besides general shapes,
some works focus on the pose estimation of specific objects. Xu et al. (2021a) employ
Human3.6M (Ionescu et al. 2014) and MPI-INF-3DHP (Mehta et al. 2017) for human
pose estimation. Chen et al. (2018b) regress hand poses on ICVL (Tang et al. 2014), NYU
(Tompson et al. 2014), and MSRA (Sun et al. 2015).
Shape Registration. Registration is matching among multiple inputs. 3DMatch (Zeng et al. 2017) is a well-known registration benchmark composed of 7Scenes (Shotton et al.
2013) and SUN3D (Xiao et al. 2013). Liu et al. (2018); Melzi et al. (2019) investigate reg-
istration on the Stanford 3D Scanning Repository (Curless and Levoy 1996). Melzi et al.
(2019) exploit TOSCA (Bronstein et al. 2008), FAUST (Bogo et al. 2014), and TOPKIDS
(Lähner et al. 2016) on deformable shape registration.
Table 13  Tasks and datasets in molecule-related applications

| Task | Dataset |
|---|---|
| Prediction | QM9 (Ramakrishnan et al. 2014), ATOM3D (Townshend et al. 2021), QM7 (Blum and Reymond 2009; Rupp et al. 2012), MD17 (Chmiela et al. 2017), ISO17 (Schütt et al. 2017), ESOL (Delaney 2004), BACE (Subramanian et al. 2016), PDB (Berman et al. 2003), OC20 (Zitnick et al. 2020) |
| Generation | QM9 (Ramakrishnan et al. 2014), CATH 4.2 (Ingraham et al. 2019), TS50 (Li et al. 2014), GEOM (Axelrod and Gómez-Bombarelli 2022) |

Table 14  QM9 (Ramakrishnan et al. 2014) prediction mean absolute error of representative rotation invariant/equivariant methods

| Method | α (bohr³) | Δε (meV) | εHOMO (meV) | εLUMO (meV) | μ (D) | Cv (cal/mol K) | G (meV) | H (meV) | R² (bohr²) | U (meV) | U0 (meV) | ZPVE (meV) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NMP (Gilmer et al. 2017) | .092 | 69 | 43 | 38 | .030 | .040 | 19 | 17 | .180 | 20 | 20 | 1.5 |
| SchNet (Schütt et al. 2017) | .235 | 63 | 41 | 34 | .033 | .033 | 14 | 14 | .073 | 19 | 14 | 1.7 |
| TFN (Thomas et al. 2018) | .223 | 58 | 40 | 38 | .064 | .101 | – | – | – | – | – | – |
| Cormorant (Anderson et al. 2019) | .085 | 61 | 34 | 38 | .038 | .026 | 20 | 21 | .961 | 21 | 22 | 2.027 |
| LieConv (Finzi et al. 2020) | .084 | 49 | 30 | 25 | .032 | .038 | 22 | 24 | .800 | 19 | 19 | 2.280 |
| SE(3)-Transformer (Fuchs et al. 2020) | .142 | 53.0 | 33.0 | 35.0 | .051 | .054 | – | – | – | – | – | – |
| PaiNN (Schütt et al. 2021) | .045 | 45.7 | 27.6 | 20.4 | .012 | .024 | 7.35 | 5.98 | .066 | 5.83 | 5.85 | 1.28 |
| LieTransformer (Hutchinson et al. 2021) | .082 | 51 | 33 | 27 | .041 | .035 | 19 | 17 | .448 | 16 | 17 | 2.10 |
| EGNN (Satorras et al. 2021b) | .071 | 48 | 29 | 25 | .029 | .031 | 12 | 12 | .106 | 12 | 11 | 1.55 |
| SEGNN (Brandstetter et al. 2022) | .060 | 42 | 24 | 21 | .023 | .031 | 15 | 16 | .660 | 13 | 15 | 1.62 |
| TorchMD-NET (Thölke and Fabritiis 2022) | .059 | 36.1 | 20.3 | 17.5 | .011 | .026 | 7.62 | 6.16 | .033 | 6.38 | 6.15 | 1.84 |
| Esteves et al. (2023) | .049 | 28.8 | 21.6 | 18.0 | .016 | .022 | 6.54 | 5.69 | .027 | 5.72 | 5.65 | 1.15 |

Place Recognition. Place recognition is a special case of registration through matching with maps. KITTI (Geiger et al. 2012) includes a series of benchmarks of autonomous driving, where the odometry benchmark is generally adopted to evaluate the place recognition performance. Many datasets are also leveraged for a comprehensive evaluation,
including ETH (Pomerleau et al. 2012), NCLT (Carlevaris-Bianco et al. 2016), SceneCity
(Zhang et al. 2016), Oxford RobotCar (Maddern et al. 2017), MulRan (Kim et al. 2020a),
KITTI-360 (Liao et al. 2022).
Reconstruction. Reconstruction is a pre-training task adopted by many self-supervised
methods. Much work (Shen et al. 2020; Deng et al. 2021a; Sun et al. 2021; Zhou et al.
2022b) carries out the reconstruction experiment on ShapeNetCore (Chang et al. 2015). In
addition, Yu et al. (2020b) utilize ModelNet40 (Wu et al. 2015) for point cloud inpainting
and completion.
Retrieval. Retrieval is the task of finding similar objects to the query object. SHREC'17
(Savva et al. 2017) is a famous retrieval challenge based on the ShapeNetCore (Chang et al.
2015). Some methods (Su et al. 2015; Esteves et al. 2019b; Wei et al. 2020) also experi-
ment on ModelNet (Wu et al. 2015).
Others. Ke et al. (2017) use the NTU RGB+D (Shahroudy et al. 2016), the SBU kinect
interaction (Yun et al. 2012), and the CMU dataset (CMU 2002) for skeleton action rec-
ognition. Qin et al. (2022) apply FPHA (Garcia-Hernando et al. 2018) on hand action rec-
ognition. Besides, some methods (Liu et al. 2019b; Zhang et al. 2020c; Yang et al. 2021)
exploit ModelNet40 (Wu et al. 2015) on normal estimation. Esteves et al. (2023) employ
the WeatherBench (Rasp et al. 2020) to evaluate large spherical CNNs (Esteves et al.
2018a) on weather forecasting.

5.2 Molecule‑related application

Recently, the number of papers that employ rotation equivariant networks on molecular data has grown explosively. The physical and chemical laws determine the relative but not
absolute positions of atoms. Therefore, rotation invariance and equivariance are inherently
needed in molecule-related applications. As related work goes further, many new tasks
are investigated, and we only summarize some representative ones. Tasks and datasets are
listed in Table 13.
Prediction. Prediction is to predict molecular properties given molecular structures.
QM7 (Blum and Reymond 2009; Rupp et al. 2012) is a small and pioneering dataset used
by some works (Liu et al. 2022c; Kondor et al. 2018). QM9 (Ramakrishnan et al. 2014) is
a commonly-used dataset, including 134k molecules with geometric, energetic, electronic,
and thermodynamic properties. As shown in Table 14, there are more rotation equivari-
ant methods than rotation invariant ones in this prediction task. As related research dives
further, novel methods with powerful and sophisticated structures show great potential in
decreasing the mean absolute error of molecular property prediction. ATOM3D (Town-
shend et al. 2021) is a set of benchmarks including various tasks. Other datasets, including
MD17 (Chmiela et al. 2017), ISO17 (Schütt et al. 2017), ESOL (Delaney 2004), BACE
(Subramanian et al. 2016), PDB (Berman et al. 2003), and OC20 (Zitnick et al. 2020), are
also applied in different prediction tasks.
Generation. In generation, the model is required to generate molecules according to
certain requirements. Thomas et al. (2018) employ random deletion on QM9 (Ram-
akrishnan et al. 2014) and validate the model with an inpainting task. Jing et al. (2021b);
Li et al. (2022a) exploit CATH 4.2 (Ingraham et al. 2019) and TS50 (Li et al. 2014) on
computational protein design. Du et al. (2021) employ subsets of GEOM (Axelrod and


Gómez-Bombarelli 2022) on conformation generation tasks. Satorras et al. (2021a) utilize


LJ-13 (Köhler et al. 2020) on 3D states generation.
Others. Jing et al. (2021b); Li et al. (2022a) apply CASP (Cheng et al. 2019) on model
quality assessment. Poulenard et al. (2019) leverage PDB (Berman et al. 2000) on RNA
segmentation. Ganea et al. (2022) exploit DB5.5 (Vreven et al. 2015) and DIPS (Town-
shend et al. 2019) on rigid protein docking.

6 Future direction

Here we point out several future research directions inspired by unsolved problems in present methods and tasks.

6.1 Method

The pros and cons of existing methods have been summarized in Sects. 3 and 4. Therefore, future methods should perform better and avoid previous drawbacks by possessing the following properties.

• Strong rotation invariance and equivariance. This survey includes weakly invariant and equivariant methods in the discussion of rotation invariance and equivariance for the first time. Nonetheless, we argue that these methods should be used only when necessary. They involve redundant uncertainties and cannot deliver consistent results for the same inputs under different poses.
• Concise mathematical background. The theory of many existing methods is too verbose and complicated. It should be simplified, especially when it has little connection with the implementation. Any novel method should avoid exploring general but unrelated theories.
• High computational efficiency. Due to the high latency, many well-performed methods
cannot be employed in practical applications. As the research progresses to large-scale
and complex data, the latest work should consider such application scenarios and be as
efficient as possible.
• Reliable integrability. Many successful DNNs have been developed for numerous appli-
cations where rotation invariance and equivariance are not considered. Therefore, they
are only suitable for aligned data. If lately-developed methods can be integrated with
these models straightforwardly, then the composite models would benefit from both.

6.2 Theoretical analysis

Most of the existing theoretical analysis addresses strong invariance and equivariance.
Some methods propose mathematical frameworks to construct equivariant networks (Kon-
dor and Trivedi 2018; Cohen et al. 2018b, 2019b; Esteves 2020; Aronsson 2021; Gerken
et al. 2021; Winter et al. 2022). However, the discussion on universal approximation is
quite limited (Dym and Maron 2021), and most equivariant networks do not have solid
mathematical foundations.


6.3 Benchmark

The research on rotation invariance and equivariance is still immature and lacks reliable
and comprehensive benchmarks. Except for some well-studied tasks, most applications
have yet to be intensively investigated. The evaluation metric (Eq. 3) has yet to be com-
monly adopted, especially for weakly rotation invariant and equivariant methods. Existing
metrics cannot reflect the strength of invariance and equivariance.

7 Conclusion

In this survey, we give a comprehensive overview of rotation invariant and equivariant


methods in 3D deep learning. We first discuss the limitation of DNNs trained with canoni-
cal poses, which motivates the research of rotation invariant and equivariant methods.
Then, we define weak/strong invariance and equivariance and provide a unified theoretical
framework for analysis. Overall, all the existing methods are divided into rotation invariant ones and equivariant ones, which are further subclassified in terms of their principles. At this level, representative works are reviewed and discussed, and both applications and datasets are sorted out. Finally, we pose some open problems and deliver future
research directions based on challenges and difficulties in the current research. We hope
this survey can serve as an effective tool for future research on rotation invariant and equiv-
ariant methods.

Author contributions Jiajun Fei wrote the main manuscript text. Zhidong Deng served as the scientific advi-
sor and led this research project. Deng chose this topic and provided valuable suggestions throughout the
whole research process. After the initial draft was finished, Deng made many thorough and comprehensive
revisions to correct the mistakes and improve the readability. Both authors reviewed the manuscript.

Funding This work was supported in part by the National Science Foundation of China (NSFC) under
Grant No. 62176134. The authors have no relevant financial or non-financial interests to disclose.

Declarations
Conflict of interest The authors declare no competing interests.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com-
mons licence, and indicate if changes were made. The images or other third party material in this article
are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
Almakady Y, Mahmoodi S, Conway J et al (2020) Rotation invariant features based on three dimensional Gaussian Markov random fields for volumetric texture classification. Comput Vis Image Underst 194:102931. https://doi.org/10.1016/j.cviu.2020.102931

Anderson B, Hy TS, Kondor R (2019) Cormorant: covariant molecular neural networks. In: Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates, Inc
Andrearczyk V, Depeursinge A (2018) Rotational 3d texture classification using group equivariant cnns. arXiv preprint arXiv:1810.06889
Andrearczyk V, Fageot J, Oreiller V, et al (2019) Exploring local rotation invariance in 3d cnns with steerable filters. In: Proceedings of the 2nd international conference on medical imaging with deep learning, proceedings of machine learning research, vol 102. PMLR, pp 15–26
Andrearczyk V, Fageot J, Oreiller V et al (2020) Local rotation invariance in 3d cnns. Med Image Anal 65:101756. https://doi.org/10.1016/j.media.2020.101756
Ao S, Hu Q, Yang B, et al (2021) Spinnet: learning a general surface descriptor for 3d point cloud registration. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11,748–11,757. https://doi.org/10.1109/CVPR46437.2021.01158
Ao S, Guo Y, Hu Q et al (2023) You only train once: learning general and distinctive 3d local descriptors. IEEE Trans Pattern Anal Mach Intell 45(3):3949–3967. https://doi.org/10.1109/TPAMI.2022.3180341
Ao S, Hu Q, Wang H, et al (2023b) Buffer: balancing accuracy, efficiency, and generalizability in point cloud registration. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1255–1264. https://doi.org/10.1109/CVPR52729.2023.00127
Armeni I, Sener O, Zamir AR, et al (2016) 3d semantic parsing of large-scale indoor spaces. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1534–1543. https://doi.org/10.1109/CVPR.2016.170
Armeni I, Sax S, Zamir AR, et al (2017) Joint 2d-3d-semantic data for indoor scene understanding. https://doi.org/10.48550/ARXIV.1702.01105
Aronsson J (2021) Homogeneous vector bundles and g-equivariant convolutional neural networks. PhD thesis, Chalmers Tekniska Hogskola
Artin M (2013) Algebra. Pearson Education, London
Assaad S, Downey C, Al-Rfou' R, et al (2022) VN-transformer: rotation-equivariant attention for vector neurons. arXiv:2206.04176
Axelrod S, Gómez-Bombarelli R (2022) Geom, energy-annotated molecular conformations for property prediction and molecular generation. Sci Data 9(1):185. https://doi.org/10.1038/s41597-022-01288-4
Azari B, Erdogmus D (2022) Equivariant deep dynamical model for motion prediction. In: Proceedings of the 25th international conference on artificial intelligence and statistics, proceedings of machine learning research, vol 151. PMLR, pp 11,655–11,668
Bai X, Luo Z, Zhou L, et al (2020) D3feat: joint learning of dense detection and description of 3d local features. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 6358–6366. https://doi.org/10.1109/CVPR42600.2020.00639
Batzner S, Musaelian A, Sun L et al (2022) E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat Commun 13(1):2453. https://doi.org/10.1038/s41467-022-29939-5
Bergmann P, Sattlegger D (2023) Anomaly detection in 3d point clouds using deep geometric descriptors. In: 2023 IEEE/CVF winter conference on applications of computer vision (WACV), pp 2612–2622. https://doi.org/10.1109/WACV56688.2023.00264
Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide protein data bank. Nat Struct Mol Biol 10(12):980. https://doi.org/10.1038/nsb1203-980
Berman HM, Westbrook J, Feng Z et al (2000) The protein data bank. Nucleic Acids Res 28(1):235–242. https://doi.org/10.1093/nar/28.1.235
Blum LC, Reymond JL (2009) 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 131(25):8732–8733. https://doi.org/10.1021/ja902302h
Bobkov D, Chen S, Jian R et al (2018) Noise-resistant deep learning for object classification in three-dimensional point clouds using a point pair descriptor. IEEE Robot Autom Lett 3(2):865–872. https://doi.org/10.1109/LRA.2018.2792681
Bogo F, Romero J, Loper M, et al (2014) Faust: dataset and evaluation for 3d mesh registration. In: 2014 IEEE conference on computer vision and pattern recognition, pp 3794–3801. https://doi.org/10.1109/CVPR.2014.491
Brandstetter J, Hesselink R, van der Pol E, et al (2022) Geometric and physical quantities improve e(3) equivariant message passing. In: International conference on learning representations (ICLR)
Bronstein AM, Bronstein MM, Kimmel R (2008) Numerical geometry of non-rigid shapes. Springer, Berlin
Caesar H, Bankiti V, Lang AH, et al (2020) nuscenes: a multimodal dataset for autonomous driving. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 11,618–11,628. https://doi.org/10.1109/CVPR42600.2020.01164

Cao H, Zhan R, Ma Y et al (2021) Lfnet: local rotation invariant coordinate frame for robust point cloud analysis. IEEE Signal Process Lett 28:209–213. https://doi.org/10.1109/LSP.2020.3048605
Cao Z, Huang Q, Karthik R (2017) 3d object classification via spherical projections. In: 2017 international conference on 3D vision (3DV), pp 566–574, https://doi.org/10.1109/3DV.2017.00070
Carlevaris-Bianco N, Ushani AK, Eustice RM (2016) University of Michigan North campus long-term vision and lidar dataset. Int J Robot Res 35(9):1023–1035. https://doi.org/10.1177/0278364915614638
Chang AX, Funkhouser T, Guibas L, et al (2015) Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012
Chatzipantazis E, Pertigkiozoglou S, Dobriban E, et al (2023) SE(3)-equivariant attention networks for shape reconstruction in function space. In: International conference on learning representations (ICLR)
Chen C, Li C, Chen L, et al (2018a) Continuous-time flows for efficient inference and density estimation. In: Proceedings of the 35th international conference on machine learning (ICML), proceedings of machine learning research, vol 80. PMLR, pp 824–833
Chen C, Fragonara LZ, Tsourdos A (2019a) Gapnet: graph attention based point neural network for exploiting local feature of point cloud. https://doi.org/10.48550/ARXIV.1905.08705
Chen C, Li G, Xu R, et al (2019b) Clusternet: deep hierarchical cluster network with rigorously rotation-invariant representation for point cloud analysis. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4989–4997, https://doi.org/10.1109/CVPR.2019.00513
Chen H, Liu S, Chen W, et al (2021) Equivariant point network for 3d point cloud analysis. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14,509–14,518, https://doi.org/10.1109/CVPR46437.2021.01428
Chen H, Zhao J, Zhang Q (2023) Rotation-equivariant spherical vector networks for objects recognition with unknown poses. Vis Comput. https://doi.org/10.1007/s00371-023-02904-z
Chen Q, Chen Y (2022) Multi-view 3d model retrieval based on enhanced detail features with contrastive center loss. Multimed Tools Appl. https://doi.org/10.1007/s11042-022-12281-9
Chen R, Cong Y (2022) The devil is in the pose: ambiguity-free 3d rotation-invariant learning via pose-aware convolution. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 7462–7471, https://doi.org/10.1109/CVPR52688.2022.00732
Chen X, Wang G, Zhang C et al (2018) Shpr-net: deep semantic hand pose regression from point clouds. IEEE Access 6:43425–43439. https://doi.org/10.1109/ACCESS.2018.2863540
Chen Y, Fernando B, Bilen H et al (2022) 3d equivariant graph implicit functions. Computer Vision–ECCV 2022. Springer, Cham, pp 485–502. https://doi.org/10.1007/978-3-031-20062-5_28
Cheng J, Choe MH, Elofsson A et al (2019) Estimation of model accuracy in casp13. Proteins Struct Funct Bioinf 87(12):1361–1377. https://doi.org/10.1002/prot.25767
Chmiela S, Tkatchenko A, Sauceda HE et al (2017) Machine learning of accurate energy-conserving molecular force fields. Sci Adv 3(5):e1603015. https://doi.org/10.1126/sciadv.1603015
Chou YC, Lin YP, Yeh YM, et al (2021) 3d-gfe: a three-dimensional geometric-feature extractor for point cloud data. In: 2021 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC), pp 2013–2017
Choy C, Park J, Koltun V (2019) Fully convolutional geometric features. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 8957–8965, https://doi.org/10.1109/ICCV.2019.00905
CMU (2002) Cmu graphics lab motion capture database. http://mocap.cs.cmu.edu/
Cohen T, Welling M (2016) Group equivariant convolutional networks. In: Balcan MF, Weinberger KQ (eds) Proceedings of the 33rd international conference on machine learning (ICML), proceedings of machine learning research, vol 48. PMLR, New York, NY, USA, pp 2990–2999
Cohen T, Weiler M, Kicanaoglu B, et al (2019a) Gauge equivariant convolutional networks and the icosahedral CNN. In: Proceedings of the 36th international conference on machine learning (ICML), proceedings of machine learning research, vol 97. PMLR, pp 1321–1330
Cohen TS, Geiger M, Köhler J, et al (2018a) Spherical CNNs. In: International conference on learning representations (ICLR)
Cohen TS, Geiger M, Weiler M (2018b) Intertwiners between induced representations (with applications to the theory of equivariant neural networks). https://doi.org/10.48550/ARXIV.1803.10743
Cohen TS, Geiger M, Weiler M (2019b) A general theory of equivariant cnns on homogeneous spaces. In: Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates, Inc.
Cornwell JF (1997) Group theory in physics: an introduction. Academic Press, San Diego
Curless B, Levoy M (1996) A volumetric method for building complex models from range images. In: Proceedings of the 23rd annual conference on computer graphics and interactive techniques. Association for Computing Machinery, New York, NY, USA, SIGGRAPH '96, pp 303–312, https://doi.org/10.1145/237170.237269
Dai A, Chang AX, Savva M, et al (2017) Scannet: richly-annotated 3d reconstructions of indoor scenes. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2432–2443, https://doi.org/10.1109/CVPR.2017.261
Delaney JS (2004) Esol: estimating aqueous solubility directly from molecular structure. J Chem Inf Comput Sci 44(3):1000–1005. https://doi.org/10.1021/ci034243x
Deng C, Litany O, Duan Y, et al (2021a) Vector neurons: a general framework for so(3)-equivariant networks. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 12,180–12,189, https://doi.org/10.1109/ICCV48922.2021.01198
Deng H, Birdal T, Ilic S (2018a) Ppf-foldnet: unsupervised learning of rotation invariant 3d local descriptors. Computer Vision–ECCV 2018. Springer, Cham, pp 620–638. https://doi.org/10.1007/978-3-030-01228-1_37
Deng H, Birdal T, Ilic S (2018b) Ppfnet: global context aware local features for robust 3d point matching. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 195–205, https://doi.org/10.1109/CVPR.2018.00028
Deng S, Liu B, Dong Q, et al (2021b) Rotation transformation network: learning view-invariant point cloud for classification and segmentation. In: 2021 IEEE international conference on multimedia and expo (ICME), pp 1–6, https://doi.org/10.1109/ICME51207.2021.9428265
Drost B, Ulrich M, Navab N, et al (2010) Model globally, match locally: efficient and robust 3d object recognition. In: 2010 IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 998–1005, https://doi.org/10.1109/CVPR.2010.5540108
Du W, Zhang H, Du Y, et al (2021) Equivariant vector field network for many-body system modeling. https://doi.org/10.48550/ARXIV.2110.14811
Dym N, Maron H (2021) On the universality of rotation equivariant point cloud networks. In: International conference on learning representations (ICLR)
Esteves C (2020) Theoretical aspects of group equivariant neural networks. arXiv preprint arXiv:2004.05154
Esteves C, Allen-Blanchette C, Makadia A et al (2018a) Learning so(3) equivariant representations with spherical CNNs. Computer Vision–ECCV 2018. Springer International Publishing, Cham, pp 54–70. https://doi.org/10.1007/978-3-030-01261-8_4
Esteves C, Allen-Blanchette C, Zhou X, et al (2018b) Polar transformer networks. In: International conference on learning representations (ICLR)
Esteves C, Sud A, Luo Z, et al (2019a) Cross-domain 3d equivariant image embeddings. In: Proceedings of the 36th international conference on machine learning (ICML), proceedings of machine learning research, vol 97. PMLR, pp 1812–1822
Esteves C, Xu Y, Allen-Blanchette C, et al (2019b) Equivariant multi-view networks. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 1568–1577, https://doi.org/10.1109/ICCV.2019.00165
Esteves C, Allen-Blanchette C, Makadia A et al (2020a) Learning so(3) equivariant representations with spherical CNNs. Int J Comput Vis 128:588–600. https://doi.org/10.1007/s11263-019-01220-1
Esteves C, Makadia A, Daniilidis K (2020b) Spin-weighted spherical CNNs. In: Advances in neural information processing systems (NeurIPS), vol 33. Curran Associates, Inc., pp 8614–8625
Esteves C, Slotine JJ, Makadia A (2023) Scaling spherical CNNs. In: Krause A, Brunskill E, Cho K, et al (eds) Proceedings of the 40th international conference on machine learning (ICML), proceedings of machine learning research, vol 202. PMLR, pp 9396–9411
Fan S, Dong Q, Zhu F, et al (2021) Scf-net: learning spatial contextual features for large-scale point cloud segmentation. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 14,499–14,508, https://doi.org/10.1109/CVPR46437.2021.01427
Fan Y, He Y, Tan UX (2020) Seed: a segmentation-based egocentric 3d point cloud descriptor for loop closure detection. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 5158–5163, https://doi.org/10.1109/IROS45743.2020.9341517
Fan Z, Song Z, Zhang W et al (2023) Rpr-net: a point cloud-based rotation-aware large scale place recognition network. Computer Vision–ECCV 2022 Workshops. Springer Nature Switzerland, Cham, pp 709–725. https://doi.org/10.1007/978-3-031-25056-9_45
Fang J, Zhou D, Song X, et al (2020) Rotpredictor: unsupervised canonical viewpoint learning for point cloud classification. In: 2020 international conference on 3D vision (3DV), pp 987–996, https://doi.org/10.1109/3DV50981.2020.00109
Fei J, Deng Z (2024) Incorporating rotation invariance with non-invariant networks for point clouds. In: 2024 international conference on 3D vision (3DV)
Fei J, Zhu Z, Liu W et al (2022) Dumlp-pin: a dual-mlp-dot-product permutation-invariant network for set feature extraction. Proceedings of the AAAI conference on artificial intelligence (AAAI) 36(1):598–606. https://doi.org/10.1609/aaai.v36i1.19939
Fent F, Bauerschmidt P, Lienkamp M (2023) Radargnn: transformation invariant graph neural network for radar-based perception. In: 2023 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 182–191, https://doi.org/10.1109/CVPRW59228.2023.00023
Finkelshtein B, Baskin C, Maron H, et al (2022) A simple and universal rotation equivariant point-cloud network. In: Proceedings of topological, algebraic, and geometric learning workshops 2022, proceedings of machine learning research, vol 196. PMLR, pp 107–115
Finzi M, Stanton S, Izmailov P, et al (2020) Generalizing convolutional neural networks for equivariance to lie groups on arbitrary continuous data. In: Proceedings of the 37th international conference on machine learning (ICML), proceedings of machine learning research, vol 119. PMLR, pp 3165–3176
Finzi M, Welling M, Wilson AGG (2021) A practical method for constructing equivariant multilayer perceptrons for arbitrary matrix groups. In: Proceedings of the 38th international conference on machine learning (ICML), proceedings of machine learning research, vol 139. PMLR, pp 3318–3328
Fotenos AF, Snyder AZ, Girton LE et al (2005) Normative estimates of cross-sectional and longitudinal brain volume decline in aging and ad. Neurology 64(6):1032–1039. https://doi.org/10.1212/01.WNL.0000154530.72969.11
Fox J, Zhao B, del Rio BG, et al (2022) Concentric spherical neural network for 3d representation learning. In: 2022 international joint conference on neural networks (IJCNN), pp 1–8, https://doi.org/10.1109/IJCNN55064.2022.9892358
Fu R, Yang J, Sun J, et al (2020) Risa-net: rotation-invariant structure-aware network for fine-grained 3d shape retrieval. https://doi.org/10.48550/ARXIV.2010.00973
Fuchs F, Worrall D, Fischer V, et al (2020) Se(3)-transformers: 3d roto-translation equivariant attention networks. In: Advances in neural information processing systems (NeurIPS), vol 33. Curran Associates, Inc., pp 1970–1981
Fuchs FB, Wagstaff E, Dauparas J et al (2021) Iterative se(3)-transformers. Geometric science of information. Springer International Publishing, Cham, pp 585–595. https://doi.org/10.1007/978-3-030-80209-7_63
Furuya T, Ohbuchi R (2016) Deep aggregation of local 3d geometric features for 3d model retrieval. In: Wilson RC, Hancock ER, Smith WAP (eds) Proceedings of the British machine vision conference (BMVC). BMVA Press, pp 121.1–121.12, https://doi.org/10.5244/C.30.121
Furuya T, Hang X, Ohbuchi R et al (2020) Convolution on rotation-invariant and multi-scale feature graph for 3d point set segmentation. IEEE Access 8:140250–140260. https://doi.org/10.1109/ACCESS.2020.3012613
Gandikota KV, Geiping J, Lähner Z, et al (2021) Training or architecture? how to incorporate invariance in neural networks. arXiv preprint arXiv:2106.10044
Ganea OE, Huang X, Bunne C, et al (2022) Independent SE(3)-equivariant models for end-to-end rigid protein docking. In: International conference on learning representations (ICLR)
Garcia-Hernando G, Yuan S, Baek S, et al (2018) First-person hand action benchmark with RGB-D videos and 3d hand pose annotations. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 409–419, https://doi.org/10.1109/CVPR.2018.00050
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3354–3361, https://doi.org/10.1109/CVPR.2012.6248074
Gerken JE, Aronsson J, Carlsson O et al (2021) Geometric deep learning and equivariant neural networks. Artif Intell Rev 56(12):14605–14662
Gilmer J, Schoenholz SS, Riley PF, et al (2017) Neural message passing for quantum chemistry. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning (ICML), proceedings of machine learning research, vol 70. PMLR, pp 1263–1272
Gojcic Z, Zhou C, Wegner JD, et al (2019) The perfect match: 3d point cloud matching with smoothed densities. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5540–5549, https://doi.org/10.1109/CVPR.2019.00569
Gu R, Wu Q, Ng WW et al (2021a) Erinet: enhanced rotation-invariant network for point cloud classification. Pattern Recogn Lett 151:180–186. https://doi.org/10.1016/j.patrec.2021.08.010
Gu R, Wu Q, Xu H, et al (2021b) Learning efficient rotation representation for point cloud via local-global aggregation. In: 2021 IEEE international conference on multimedia and expo (ICME), pp 1–6, https://doi.org/10.1109/ICME51207.2021.9428170

Gu R, Wu Q, Li Y et al (2022) Enhanced local and global learning for rotation-invariant point cloud representation. IEEE Multimed. https://doi.org/10.1109/MMUL.2022.3151906
Guan J, Qian WW, Peng X, et al (2023) 3d equivariant diffusion for target-aware molecule generation and affinity prediction. In: International conference on learning representations (ICLR)
Guerrero P, Kleiman Y, Ovsjanikov M et al (2018) Pcpnet learning local shape properties from raw point clouds. Comput Graph Forum 37(2):75–85. https://doi.org/10.1111/cgf.13343
Guo Y, Sohel F, Bennamoun M et al (2013) Rotational projection statistics for 3d local surface description and object recognition. Int J Comput Vis 105(1):63–86. https://doi.org/10.1007/s11263-013-0627-y
Haan PD, Weiler M, Cohen T, et al (2021) Gauge equivariant mesh cnns: anisotropic convolutions on geometric graphs. In: International conference on learning representations (ICLR)
Hackel T, Savinov N, Ladicky L, et al (2017) Semantic3d.net: a new large-scale point cloud classification benchmark. In: ISPRS annals of the photogrammetry, remote sensing and spatial information sciences, pp 91–98
Han J, Rong Y, Xu T, et al (2022) Geometrically equivariant graph neural networks: a survey. arXiv preprint arXiv:2202.07230
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778, https://doi.org/10.1109/CVPR.2016.90
He L, Dong Y, Wang Y, et al (2021) Gauge equivariant transformer. In: Ranzato M, Beygelzimer A, Dauphin Y, et al (eds) Advances in neural information processing systems (NeurIPS), vol 34. Curran Associates, Inc., pp 27,331–27,343
Hegde S, Gangisetty S (2021) Pig-net: inception based deep learning architecture for 3d point cloud segmentation. Comput Graph 95:13–22. https://doi.org/10.1016/j.cag.2021.01.004
Hinton GE, Krizhevsky A, Wang SD (2011) Transforming auto-encoders. Artificial neural networks and machine learning (ICANN). Springer, Heidelberg, pp 44–51. https://doi.org/10.1007/978-3-642-21735-7_6
Hoogeboom E, Satorras VG, Vignac C, et al (2022) Equivariant diffusion for molecule generation in 3D. In: Proceedings of the 39th international conference on machine learning (ICML), proceedings of machine learning research, vol 162. PMLR, pp 8867–8887
Horie M, Morita N, Hishinuma T, et al (2021) Isometric transformation invariant and equivariant graph convolutional networks. In: International conference on learning representations (ICLR)
Horwitz E, Hoshen Y (2023) Back to the feature: classical 3d features are (almost) all you need for 3d anomaly detection. In: 2023 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 2968–2977, https://doi.org/10.1109/CVPRW59228.2023.00298
Huang W, Han J, Rong Y, et al (2022a) Equivariant graph mechanics networks with constraints. In: International conference on learning representations (ICLR)
Huang Y, Peng X, Ma J, et al (2022b) 3DLinker: an e(3) equivariant variational autoencoder for molecular linker design. In: Proceedings of the 39th international conference on machine learning (ICML), proceedings of machine learning research, vol 162. PMLR, pp 9280–9294
Hutchinson MJ, Lan CL, Zaidi S, et al (2021) Lietransformer: equivariant self-attention for lie groups. In: Proceedings of the 38th international conference on machine learning (ICML), proceedings of machine learning research, vol 139. PMLR, pp 4533–4543
Igashov I, Stärk H, Vignac C, et al (2022) Equivariant 3d-conditional diffusion models for molecular linker design. arXiv preprint arXiv:2210.05274
Ingraham J, Garg V, Barzilay R, et al (2019) Generative models for graph-based protein design. In: Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates, Inc.
Ionescu C, Papava D, Olaru V et al (2014) Human3.6m: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7):1325–1339. https://doi.org/10.1109/TPAMI.2013.248
Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. In: Cortes C, Lawrence N, Lee D et al (eds) Advances in neural information processing systems (NIPS), vol 28. Curran Associates, Inc.
Jing B, Eismann S, Soni PN, et al (2021a) Equivariant graph neural networks for 3d macromolecular structure. arXiv preprint arXiv:2106.03843
Jing B, Eismann S, Suriana P, et al (2021b) Learning from protein structure with geometric vector perceptrons. In: International conference on learning representations (ICLR)
Jing B, Prabhu V, Gu A et al (2021) Rotation-invariant gait identification with quaternion convolutional neural networks (student abstract). Proc AAAI Conf Artif Intell (AAAI) 35(18):15805–15806. https://doi.org/10.1609/aaai.v35i18.17899

Joseph-Rivlin M, Zvirin A, Kimmel R (2019) Momen(e)t: flavor the moments in learning to classify shapes. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW), pp 4085–4094, https://doi.org/10.1109/ICCVW.2019.00503
Jørgensen PB, Bhowmik A (2022) Equivariant graph neural networks for fast electron density estimation of molecules, liquids, and solids. NPJ Comput Mater 8:183. https://doi.org/10.1038/s41524-022-00863-y
Kaba SO, Mondal AK, Zhang Y, et al (2023) Equivariance with learned canonicalization functions. In: Krause A, Brunskill E, Cho K, et al (eds) Proceedings of the 40th international conference on machine learning, proceedings of machine learning research, vol 202. PMLR, pp 15,546–15,566
Kadam P, Zhang M, Liu S et al (2022) R-pointhop: a green, accurate, and unsupervised point cloud registration method. IEEE Trans Image Process 31:2710–2725. https://doi.org/10.1109/TIP.2022.3160609
Kadam P, Prajapati H, Zhang M, et al (2023) S3i-pointhop: so(3)-invariant pointhop for 3d point cloud classification. In: ICASSP 2023 - 2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1–5, https://doi.org/10.1109/ICASSP49357.2023.10095473
Kajita S, Ohba N, Jinnouchi R et al (2017) A universal 3d voxel descriptor for solid-state material informatics with deep convolutional neural networks. Sci Rep 7:16991. https://doi.org/10.1038/s41598-017-17299-w
Kasaei SH (2021) Orthographicnet: a deep transfer learning approach for 3-d object recognition in open-ended domains. IEEE/ASME Trans Mechatron 26(6):2910–2921. https://doi.org/10.1109/TMECH.2020.3048433
Katzir O, Lischinski D, Cohen-Or D (2022) Shape-pose disentanglement using se(3)-equivariant vector neurons. Computer Vision–ECCV 2022. Springer Nature Switzerland, Cham, pp 468–484
Ke Q, An S, Bennamoun M et al (2017) Skeletonnet: mining deep part features for 3-d action recognition. IEEE Signal Process Lett 24(6):731–735. https://doi.org/10.1109/LSP.2017.2690339
Kim G, Park YS, Cho Y, et al (2020a) Mulran: multimodal range dataset for urban place recognition. In: 2020 IEEE international conference on robotics and automation (ICRA), pp 6246–6253, https://doi.org/10.1109/ICRA40945.2020.9197298
Kim S, Park J, Han B (2020b) Rotation-invariant local-to-global representation learning for 3d point cloud. In: Larochelle H, Ranzato M, Hadsell R, et al (eds) Advances in neural information processing systems (NeurIPS), vol 33. Curran Associates, Inc., pp 8174–8185
Kim S, Jeong Y, Park C, et al (2022) SeLCA: self-supervised learning of canonical axis. In: NeurIPS 2022 workshop on symmetry and geometry in neural representations
Köhler J, Klein L, Noe F (2020) Equivariant flows: exact likelihood generative learning for symmetric densities. In: Proceedings of the 37th international conference on machine learning (ICML), proceedings of machine learning research, vol 119. PMLR, pp 5361–5370
Kondor R (2018) N-body networks: a covariant hierarchical neural network architecture for learning atomic potentials. arXiv preprint arXiv:1803.01588
Kondor R, Trivedi S (2018) On the generalization of equivariance and convolution in neural networks to the action of compact groups. In: Proceedings of the 35th international conference on machine learning (ICML), proceedings of machine learning research, vol 80. PMLR, pp 2747–2755
Kondor R, Lin Z, Trivedi S (2018) Clebsch-gordan nets: a fully fourier space spherical convolutional neural network. In: Bengio S, Wallach H, Larochelle H et al (eds) Advances in neural information processing systems (NeurIPS), vol 31. Curran Associates, Inc.
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges C, Bottou L et al (eds) Advances in neural information processing systems (NIPS), vol 25. Curran Associates, Inc.
Lähner Z, Rodola E, Bronstein MM, et al (2016) Shrec'16: matching of deformable shapes with topological noise. In: Proceedings of the eurographics workshop on 3D object retrieval (3DOR)
Lai K, Bo L, Ren X, et al (2011) A large-scale hierarchical multi-view rgb-d object dataset. In: 2011 IEEE international conference on robotics and automation (ICRA), pp 1817–1824, https://doi.org/10.1109/ICRA.2011.5980382
Landrieu L, Simonovsky M (2018) Large-scale point cloud semantic segmentation with superpoint graphs. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4558–4567, https://doi.org/10.1109/CVPR.2018.00479
Le H (2021) Geometric invariance of pointnet. Bachelor's thesis, Tampere University, Tampere, Finland
Le T, Noé F, Clevert DA (2022a) Equivariant graph attention networks for molecular property prediction. arXiv preprint arXiv:2202.09891
Le T, Noe F, Clevert DA (2022b) Representation learning on biomolecular structures using equivariant graph attention. In: Rieck B, Pascanu R (eds) Proceedings of the first learning on graphs conference, proceedings of machine learning research, vol 198. PMLR, pp 30:1–30:17

Lei J, Deng C, Schmeckpeper K, et al (2023) Efem: equivariant neural field expectation maximization for 3d object segmentation without scene supervision. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4902–4912, https://doi.org/10.1109/CVPR52729.2023.00475
Li C, Wei W, Li J et al (2021) 3dmol-net: learn 3d molecular representation using adaptive graph convolutional network based on rotation invariance. IEEE J Biomed Health Inform. https://doi.org/10.1109/JBHI.2021.3089162
Li F, Fujiwara K, Okura F, et al (2021b) A closer look at rotation-invariant deep point cloud analysis. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 16,198–16,207, https://doi.org/10.1109/ICCV48922.2021.01591
Li J, Bi Y, Lee GH (2019a) Discrete rotation equivariance for point cloud recognition. In: 2019 international conference on robotics and automation (ICRA), pp 7269–7275, https://doi.org/10.1109/ICRA.2019.8793983
Li J, Luo S, Deng C, et al (2022a) Directed weight neural networks for protein structure representation learning. https://doi.org/10.48550/ARXIV.2201.13299
Li L, Zhu S, Fu H, et al (2020) End-to-end learning local multi-view descriptors for 3d point clouds. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1916–1925, https://doi.org/10.1109/CVPR42600.2020.00199
Li L, Kong X, Zhao X et al (2022) Rinet: efficient 3d lidar-based place recognition using rotation invariant neural network. IEEE Robot Autom Lett 7(2):4321–4328. https://doi.org/10.1109/LRA.2022.3150499
Li RW, Zhang LX, Li C, et al (2023a) E3sym: leveraging e(3) invariance for unsupervised 3d planar reflective symmetry detection. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 14,543–14,553
Li X, Li R, Chen G et al (2021) A rotation-invariant framework for deep point cloud analysis. IEEE Trans Visual Comput Graph. https://doi.org/10.1109/TVCG.2021.3092570
Li X, Weng Y, Yi L et al (2021) Leveraging se(3) equivariance for self-supervised category-level object pose estimation from point clouds. In: Ranzato M, Beygelzimer A, Dauphin Y et al (eds) Advances in neural information processing systems (NeurIPS), vol 34. Curran Associates, Inc., pp 15370–15381
Li X, Wu W, Fern XZ, et al (2023b) Improving the robustness of point convolution on k-nearest neighbor neighborhoods with a viewpoint-invariant coordinate transform. In: 2023 IEEE/CVF winter conference on applications of computer vision (WACV), pp 1287–1297, https://doi.org/10.1109/WACV56688.2023.00134
Li Y, Gu C, Dullien T, et al (2019b) Graph matching networks for learning the similarity of graph structured objects. In: Proceedings of the 36th international conference on machine learning (ICML), proceedings of machine learning research, vol 97. PMLR, pp 3835–3845
Li Z, Yang Y, Faraggi E et al (2014) Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles. Proteins Struct Funct Bioinf 82(10):2565–2573. https://doi.org/10.1002/prot.24620
Liao Y, Xie J, Geiger A (2022) Kitti-360: a novel dataset and benchmarks for urban scene understanding in 2d and 3d. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2022.3179507
Lin CE, Song J, Zhang R, et al (2022a) SE(3)-equivariant point cloud-based place recognition. In: 6th annual conference on robot learning
Lin CE, Song J, Zhang R, et al (2023a) Se(3)-equivariant point cloud-based place recognition. In: Liu K, Kulic D, Ichnowski J (eds) Proceedings of the 6th conference on robot learning, proceedings of machine learning research, vol 205. PMLR, pp 1520–1530
Lin CW, Chen TI, Lee HY, et al (2023b) Coarse-to-fine point cloud registration with se(3)-equivariant representations. In: 2023 IEEE international conference on robotics and automation (ICRA), pp 2833–2840, https://doi.org/10.1109/ICRA48891.2023.10161141
Lin H, Huang Y, Liu M, et al (2022b) Diffbp: generative diffusion of 3d molecules for target protein binding. arXiv preprint arXiv:2211.11214
Lin J, Li H, Chen K et al (2021a) Sparse steerable convolutions: an efficient learning of se(3)-equivariant features for estimation and tracking of object poses in 3d space. Advances in neural information processing systems (NeurIPS), vol 34. Curran Associates, Inc., pp 16779–16790
Lin J, Rickert M, Knoll A (2021b) Deep hierarchical rotation invariance learning with exact geometry feature representation for point cloud classification. In: 2021 IEEE international conference on robotics and automation (ICRA), pp 9529–9535, https://doi.org/10.1109/ICRA48506.2021.9561307
Liu D, Chen C, Xu C et al (2022) A robust and reliable point cloud recognition network under rigid transformation. IEEE Trans Instrum Meas 71:1–13. https://doi.org/10.1109/TIM.2022.3142077

Liu M, Yao F, Choi C, et al (2019a) Deep learning 3d shapes using alt-az anisotropic 2-sphere convolu-
tion. In: International conference on learning representations (ICLR)
Liu S, Guo H, Tang J (2022b) Molecular geometry pretraining with se(3)-invariant denoising distance
matching. https://​doi.​org/​10.​48550/​ARXIV.​2206.​13602
Liu Y, Wang C, Song Z et al (2018) Efficient global point cloud registration by matching rotation invari-
ant features through translation search. Computer Vision–ECCV 2018. Springer International
Publishing, Cham, pp 460–474. https://​doi.​org/​10.​1007/​978-3-​030-​01258-8_​28
Liu Y, Fan B, Xiang S, et al (2019b) Relation-shape convolutional neural network for point cloud analy-
sis. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 8887–
8896, https://​doi.​org/​10.​1109/​CVPR.​2019.​00910
Liu Y, Hong W, Cao B (2022c) Molnet-3d: deep learning of molecular representations and properties
from 3d topography. Adv Theory Simul 5(6):2200037. https://​doi.​org/​10.​1002/​adts.​20220​0037
Liu Z, Zhou S, Suo C, et al (2019c) Lpd-net: 3d point cloud learning for large-scale place recogni-
tion and environment analysis. In: 2019 IEEE/CVF international conference on computer vision
(ICCV), pp 2831–2840, https://​doi.​org/​10.​1109/​ICCV.​2019.​00292
Lohit S, Trivedi S (2020) Rotation-invariant autoencoders for signals on spheres. https://​doi.​org/​10.​
48550/​ARXIV.​2012.​04474
Lou Y, Ye Z, You Y et al (2023) Crin: rotation-invariant point cloud analysis and rotation estimation via
centrifugal reference frame. Proc AAAI Conf Artif Intell (AAAI) 37(2):1817–1825. https://​doi.​
org/​10.​1609/​aaai.​v37i2.​25271
Luo S, Li J, Guan J, et al (2022) Equivariant point cloud analysis via learning orientations for message
passing. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp
18910–18919, https://doi.org/10.1109/CVPR52688.2022.01836
Maddern W, Pascoe G, Linegar C et al (2017) 1 year, 1000 km: the oxford robotcar dataset. Int J Robot
Res 36(1):3–15. https://​doi.​org/​10.​1177/​02783​64916​679498
Marcon M, Spezialetti R, Salti S et al (2021) Unsupervised learning of local equivariant descriptors for
point clouds. IEEE Trans Pattern Anal Mach Intell. https://​doi.​org/​10.​1109/​TPAMI.​2021.​31267​13
Maturana D, Scherer S (2015) Voxnet: a 3d convolutional neural network for real-time object recogni-
tion. In: 2015 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp
922–928, https://​doi.​org/​10.​1109/​IROS.​2015.​73534​81
McNitt-Gray MF, Armato SG, Meyer CR et al (2007) The lung image database consortium (lidc) data
collection process for nodule detection and annotation. Acad Radiol 14(12):1464–1474. https://​
doi.​org/​10.​1016/j.​acra.​2007.​07.​021
Mehta D, Rhodin H, Casas D, et al (2017) Monocular 3d human pose estimation in the wild using
improved cnn supervision. In: 2017 International conference on 3D Vision (3DV), pp 506–516,
https://​doi.​org/​10.​1109/​3DV.​2017.​00064
Mei G, Tang H, Huang X, et al (2023) Unsupervised deep probabilistic approach for partial point cloud
registration. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR),
pp 13611–13620, https://doi.org/10.1109/CVPR52729.2023.01308
Melnyk P, Felsberg M, Wadenbäck M (2022) Steerable 3D spherical neurons. In: Proceedings of the
39th international conference on machine learning (ICML), proceedings of machine learning
research, vol 162. PMLR, pp 15330–15339
Melzi S, Spezialetti R, Tombari F, et al (2019) Gframes: gradient-based local reference frame for 3d
shape matching. In: 2019 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 4624–4633, https://​doi.​org/​10.​1109/​CVPR.​2019.​00476
Meng HY, Gao L, Lai YK, et al (2019) Vv-net: Voxel vae net with group convolutions for point cloud seg-
mentation. In: 2019 IEEE/CVF International conference on computer vision (ICCV), pp 8499–8507,
https://​doi.​org/​10.​1109/​ICCV.​2019.​00859
Menze BH, Jakab A, Bauer S et al (2015) The multimodal brain tumor image segmentation benchmark
(brats). IEEE Trans Med Imaging 34(10):1993–2024. https://​doi.​org/​10.​1109/​TMI.​2014.​23776​94
Mo K, Zhu S, Chang AX, et al (2019) Partnet: a large-scale benchmark for fine-grained and hierarchical
part-level 3d object understanding. In: 2019 IEEE/CVF conference on computer vision and pattern
recognition (CVPR), pp 909–918, https://​doi.​org/​10.​1109/​CVPR.​2019.​00100
Moon J, Kim H, Lee B (2018) View-point invariant 3d classification for mobile robots using a con-
volutional neural network. Int J Control Autom Syst 16(6):2888–2895. https://​doi.​org/​10.​1007/​
s12555-​018-​0182-y
Mukhaimar A, Tennakoon R, Lai CY et al (2022) Robust object classification approach using spherical har-
monics. IEEE Access 10:21541–21553. https://​doi.​org/​10.​1109/​ACCESS.​2022.​31513​50

Novotny D, Ravi N, Graham B, et al (2019) C3dpo: canonical 3d pose networks for non-rigid structure from
motion. In: 2019 IEEE/CVF International conference on computer vision (ICCV), pp 7687–7696,
https://​doi.​org/​10.​1109/​ICCV.​2019.​00778
Pan G, Liu P, Wang J et al (2019) 3dti-net: learn 3d transform-invariant feature using hierarchical graph cnn.
PRICAI 2019: trends in artificial intelligence. Springer International Publishing, Cham, pp 37–51.
https://​doi.​org/​10.​1007/​978-3-​030-​29911-8_4
Pan L, Cai Z, Liu Z (2021) Robust partial-to-partial point cloud registration in a full range. https://​doi.​org/​
10.​48550/​ARXIV.​2111.​15606
Park JY, Biza O, Zhao L, et al (2022) Learning symmetric embeddings for equivariant world models.
In: Proceedings of the 39th international conference on machine learning (ICML), proceedings of
machine learning research, vol 162. PMLR, pp 17372–17389
Paulhac L, Makris P, Ramel JY, et al (2009) A solid texture database for segmentation and classification
experiments. In: VISAPP (2), pp 135–141
Poiesi F, Boscaini D (2021) Distinctive 3d local deep descriptors. In: 2020 25th international conference on
pattern recognition (ICPR), pp 5720–5727, https://​doi.​org/​10.​1109/​ICPR4​8806.​2021.​94119​78
Poiesi F, Boscaini D (2023) Learning general and distinctive 3d local deep descriptors for point cloud regis-
tration. IEEE Trans Pattern Anal Mach Intell 45(3):3979–3985. https://​doi.​org/​10.​1109/​TPAMI.​2022.​
31753​71
Pomerleau F, Liu M, Colas F et al (2012) Challenging data sets for point cloud registration algorithms. Int J
Robot Res 31(14):1705–1711. https://​doi.​org/​10.​1177/​02783​64912​458814
Pop A, Domşa V, Tamas L (2023) Rotation invariant graph neural network for 3d point clouds. Remote
Sens. https://​doi.​org/​10.​3390/​rs150​51437
Poulenard A, Guibas LJ (2021) A functional approach to rotation equivariant non-linearities for tensor field
networks. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp
13169–13178, https://doi.org/10.1109/CVPR46437.2021.01297
Poulenard A, Rakotosaona MJ, Ponty Y, et al (2019) Effective rotation-invariant point cnn with spherical
harmonics kernels. In: 2019 International conference on 3D vision (3DV), pp 47–56, https://​doi.​org/​
10.​1109/​3DV.​2019.​00015
Pujol-Miró A, Casas JR, Ruiz-Hidalgo J (2019) Correspondence matching in unorganized 3d point clouds
using convolutional neural networks. Image Vis Comput 83–84:51–60. https://doi.org/10.1016/j.imavis.2019.02.013
Puny O, Atzmon M, Smith EJ, et al (2022) Frame averaging for invariant and equivariant network design.
In: International conference on learning representations (ICLR)
Qi CR, Su H, Nießner M, et al (2016) Volumetric and multi-view cnns for object classification on 3d data.
In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 5648–5656,
https://​doi.​org/​10.​1109/​CVPR.​2016.​609
Qi CR, Su H, Mo K, et al (2017a) Pointnet: deep learning on point sets for 3d classification and seg-
mentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 77–85,
https://​doi.​org/​10.​1109/​CVPR.​2017.​16
Qi CR, Yi L, Su H et al (2017b) Pointnet++: deep hierarchical feature learning on point sets in a metric
space. Advances in neural information processing systems (NIPS), vol 30. Curran Associates Inc,
New York
Qin S, Zhang X, Xu H et al (2022) Fast quaternion product units for learning disentangled representations in
SO(3). IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2022.3202217
Qin S, Li Z, Liu L (2023a) Robust 3d shape classification via non-local graph attention network. In: 2023
IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5374–5383, https://​
doi.​org/​10.​1109/​CVPR5​2729.​2023.​00520
Qin Z, Yu H, Wang C et al (2023b) Geotransformer: fast and robust point cloud registration with geomet-
ric transformer. IEEE Trans Pattern Anal Mach Intell 45(8):9806–9821. https://​doi.​org/​10.​1109/​
TPAMI.​2023.​32590​38
Qiu Z, Li Y, Wang Y et al (2022) Spe-net: boosting point cloud analysis via rotation robustness enhance-
ment. Computer Vision–ECCV 2022. Springer Nature Switzerland, Cham, pp 593–609
Ramakrishnan R, Dral PO, Rupp M et al (2014) Quantum chemistry structures and properties of 134
kilo molecules. Sci Data 1(1):140022. https://doi.org/10.1038/sdata.2014.22
Rao Y, Lu J, Zhou J (2019) Spherical fractal convolutional neural networks for point cloud recognition.
In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 452–460,
https://​doi.​org/​10.​1109/​CVPR.​2019.​00054
Rasp S, Dueben PD, Scher S et al (2020) Weatherbench: a benchmark data set for data-driven weather
forecasting. J Adv Model Earth Syst 12(11):e2020MS002203. https://doi.org/10.1029/2020MS002203

Roveri R, Rahmann L, Öztireli AC, et al (2018) A network architecture for point cloud classification via
automatic depth images generation. In: 2018 IEEE/CVF conference on computer vision and pat-
tern recognition, pp 4176–4184, https://​doi.​org/​10.​1109/​CVPR.​2018.​00439
Rupp M, Tkatchenko A, Müller KR et al (2012) Fast and accurate modeling of molecular atomiza-
tion energies with machine learning. Phys Rev Lett 108:058301. https://doi.org/10.1103/PhysRevLett.108.058301
Rusu RB, Blodow N, Beetz M (2009) Fast point feature histograms (fpfh) for 3d registration. In: 2009
IEEE international conference on robotics and automation (ICRA), pp 3212–3217, https://​doi.​org/​
10.​1109/​ROBOT.​2009.​51524​73
Sahin YH, Mertan A, Unal G (2022) Odfnet: using orientation distribution functions to characterize 3d
point clouds. Comput Graph 102:610–618. https://​doi.​org/​10.​1016/j.​cag.​2021.​08.​016
Sajnani R, Poulenard A, Jain J, et al (2022) Condor: self-supervised canonicalization of 3d pose for par-
tial shapes. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR),
pp 16948–16958, https://doi.org/10.1109/CVPR52688.2022.01646
Salihu D, Steinbach E (2023) Sgpcr: spherical gaussian point cloud representation and its application to
object registration and retrieval. In: 2023 IEEE/CVF winter conference on applications of com-
puter vision (WACV), pp 572–581, https://​doi.​org/​10.​1109/​WACV5​6688.​2023.​00064
Satorras VG, Hoogeboom E, Fuchs F et al (2021a) E(n) equivariant normalizing flows. Advances in
neural information processing systems (NeurIPS), vol 34. Curran Associates Inc, New York, pp
4181–4192
Satorras VG, Hoogeboom E, Welling M (2021b) E(n) equivariant graph neural networks. In: Proceed-
ings of the 38th international conference on machine learning (ICML), proceedings of machine
learning research, vol 139. PMLR, pp 9323–9332
Savva M, Yu F, Su H, et al (2017) Large-scale 3d shape retrieval from shapenet core55: Shrec’17 track.
In: Proceedings of the workshop on 3D object retrieval. Eurographics Association, Goslar, DEU,
3Dor ’17, pp 39–50, https://​doi.​org/​10.​2312/​3dor.​20171​050
Schneuing A, Du Y, Harris C, et al (2022) Structure-based drug design with equivariant diffusion mod-
els. https://​doi.​org/​10.​48550/​ARXIV.​2210.​13695
Schütt K, Kindermans PJ, Sauceda Felix HE et al (2017) Schnet: a continuous-filter convolutional neu-
ral network for modeling quantum interactions. In: Guyon I, Luxburg UV, Bengio S et al (eds)
Advances in neural information processing systems (NIPS), vol 30. Curran Associates Inc
Schütt K, Unke O, Gastegger M (2021) Equivariant message passing for the prediction of tensorial prop-
erties and molecular spectra. In: Proceedings of the 38th international conference on machine
learning (ICML), proceedings of machine learning research, vol 139. PMLR, pp 9377–9388
Schütt KT, Sauceda HE, Kindermans PJ et al (2018) Schnet–A deep learning architecture for molecules
and materials. J Chem Phys 148(24):241722. https://​doi.​org/​10.​1063/1.​50197​79
Shahroudy A, Liu J, Ng TT, et al (2016) Ntu rgb+d: a large scale dataset for 3d human activity analysis.
In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1010–1019,
https://​doi.​org/​10.​1109/​CVPR.​2016.​115
Shakerinava M, Ravanbakhsh S (2021) Equivariant networks for pixelized spheres. In: Proceedings of
the 38th international conference on machine learning (ICML), proceedings of machine learning
research, vol 139. PMLR, pp 9477–9488
Shan Z, Yang Q, Ye R, et al (2023) Gpa-net: no-reference point cloud quality assessment with multi-
task graph convolutional network. IEEE Trans Vis Comput Graph. https://​doi.​org/​10.​1109/​TVCG.​
2023.​32828​02
Shen W, Zhang B, Huang S et al (2020) 3d-rotation-equivariant quaternion neural networks. Computer
Vision–ECCV 2020. Springer International Publishing, Cham, pp 531–547. https://​doi.​org/​10.​
1007/​978-3-​030-​58565-5_​32
Shen Z, Hong T, She Q, et al (2022) PDO-s3DCNNs: partial differential operator based steerable 3D
CNNs. In: Proceedings of the 39th international conference on machine learning (ICML), pro-
ceedings of machine learning research, vol 162. PMLR, pp 19827–19846
Shi B, Bai S, Zhou Z et al (2015) Deeppano: deep panoramic representation for 3-d shape recognition.
IEEE Signal Process Lett 22(12):2339–2343. https://doi.org/10.1109/LSP.2015.2480802
Shi S, Wang X, Li H (2019) Pointrcnn: 3d object proposal generation and detection from point cloud.
In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 770–779,
https://​doi.​org/​10.​1109/​CVPR.​2019.​00086
Shotton J, Glocker B, Zach C, et al (2013) Scene coordinate regression forests for camera relocaliza-
tion in rgb-d images. In: 2013 IEEE conference on computer vision and pattern recognition, pp
2930–2937, https://​doi.​org/​10.​1109/​CVPR.​2013.​377

Siddani B, Balachandar S, Fang R (2021) Rotational and reflectional equivariant convolutional neural
network for data-limited applications: multiphase flow demonstration. Phys Fluids 33(10):103323.
https://​doi.​org/​10.​1063/5.​00660​49
Simeonov A, Du Y, Tagliasacchi A, et al (2022) Neural descriptor fields: Se(3)-equivariant object repre-
sentations for manipulation. In: 2022 international conference on robotics and automation (ICRA),
pp 6394–6400, https://​doi.​org/​10.​1109/​ICRA4​6639.​2022.​98121​46
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition.
https://​doi.​org/​10.​48550/​ARXIV.​1409.​1556
Song S, Lichtenberg SP, Xiao J (2015) Sun rgb-d: a rgb-d scene understanding benchmark suite. In:
2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 567–576, https://​
doi.​org/​10.​1109/​CVPR.​2015.​72986​55
Spezialetti R, Salti S, Di Stefano L (2019) Learning an effective equivariant 3d descriptor without super-
vision. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 6400–6409,
https://​doi.​org/​10.​1109/​ICCV.​2019.​00650
Spezialetti R, Stella F, Marcon M et al (2020) Learning to orient surfaces by self-supervised spherical
cnns. In: Larochelle H, Ranzato M, Hadsell R et al (eds) Advances in neural information process-
ing systems (NeurIPS), vol 33. Curran Associates Inc, New York, pp 5381–5392
Stärk H, Ganea OE, Pattanaik L, et al (2022) Equibind: geometric deep learning for drug binding struc-
ture prediction. In: ICLR 2022 workshop on geometrical and topological representation learning
Su H, Maji S, Kalogerakis E, et al (2015) Multi-view convolutional neural networks for 3d shape recog-
nition. In: 2015 IEEE international conference on computer vision (ICCV), pp 945–953, https://​
doi.​org/​10.​1109/​ICCV.​2015.​114
Subramanian G, Ramsundar B, Pande V et al (2016) Computational modeling of β-secretase 1 (bace-1)
inhibitors using ligand based approaches. J Chem Inf Model 56(10):1936–1949. https://​doi.​org/​10.​
1021/​acs.​jcim.​6b002​90
Suk J, de Haan P, Lippe P, et al (2021) Equivariant graph neural networks as surrogate for computational
fluid dynamics in 3d artery models. In: Fourth workshop on machine learning and the physical sci-
ences (NeurIPS 2021)
Suk J, de Haan P, Lippe P et al (2022) Mesh convolutional neural networks for wall shear stress estima-
tion in 3d artery models. Statistical atlases and computational models of the heart. Multi-disease,
multi-view, and multi-center right ventricular segmentation in cardiac MRI challenge. Springer,
Cham, pp 93–102. https://​doi.​org/​10.​1007/​978-3-​030-​93722-5_​11
Sun T, Liu M, Ye H et al (2019a) Point-cloud-based place recognition using CNN feature extraction.
IEEE Sens J 19(24):12175–12186. https://​doi.​org/​10.​1109/​JSEN.​2019.​29377​40
Sun W, Tagliasacchi A, Deng B et al (2021) Canonical capsules: self-supervised capsules in canonical
pose. In: Ranzato M, Beygelzimer A, Dauphin Y et al (eds) Advances in neural information pro-
cessing systems (NeurIPS), vol 34. Curran Associates Inc, NewYork, pp 24993–25005
Sun X, Wei Y, Liang S, et al (2015) Cascaded hand pose regression. In: 2015 IEEE conference on com-
puter vision and pattern recognition (CVPR), pp 824–832, https://​doi.​org/​10.​1109/​CVPR.​2015.​
72986​83
Sun X, Lian Z, Xiao J (2019b) Srinet: learning strictly rotation-invariant representations for point cloud
classification and segmentation. In: Proceedings of the 27th ACM international conference on
multimedia (ACM MM). Association for computing machinery, New York, MM ’19, pp 980–988,
https://​doi.​org/​10.​1145/​33430​31.​33510​42
Sun X, Huang Y, Lian Z (2023) Learning isometry-invariant representations for point cloud analysis.
Pattern Recogn 134:109087. https://doi.org/10.1016/j.patcog.2022.109087
Szegedy C, Liu W, Jia Y, et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on
computer vision and pattern recognition (CVPR), pp 1–9, https://​doi.​org/​10.​1109/​CVPR.​2015.​
72985​94
Tabib RA, Upasi N, Anvekar T, et al (2023) Ipd-net: SO(3) invariant primitive decompositional network for
3d point clouds. In: 2023 IEEE/CVF conference on computer vision and pattern recognition work-
shops (CVPRW), pp 2736–2744, https://​doi.​org/​10.​1109/​CVPRW​59228.​2023.​00274
Tang D, Chang HJ, Tejani A, et al (2014) Latent regression forest: structured estimation of 3d articulated
hand posture. In: 2014 IEEE conference on computer vision and pattern recognition (CVPR), pp
3786–3793, https://​doi.​org/​10.​1109/​CVPR.​2014.​490
Tao Z, Zhu Y, Wei T et al (2021) Multi-head attentional point cloud classification and segmentation using
strictly rotation-invariant representations. IEEE Access 9:71133–71144. https://doi.org/10.1109/ACCESS.2021.3079295
National Lung Screening Trial Research Team (2011) Reduced lung-cancer mortality with low-dose computed tomographic screening. N
Engl J Med 365(5):395–409

Thölke P, De Fabritiis G (2022) Equivariant transformers for neural network based molecular potentials. In:
International conference on learning representations (ICLR)
Thomas NC (2019) Euclidean-equivariant functions on three-dimensional point clouds. PhD thesis, Stan-
ford University
Thomas NC, Smidt T, Kearnes S, et al (2018) Tensor field networks: rotation- and translation-equivariant
neural networks for 3d point clouds. https://​doi.​org/​10.​48550/​ARXIV.​1802.​08219
Tombari F, Salti S, Di Stefano L (2010) Unique signatures of histograms for local surface description. Com-
puter Vision–ECCV 2010. Springer, Berlin, pp 356–369. https://​doi.​org/​10.​1007/​978-3-​642-​15558-
1_​26
Tompson J, Stein M, Lecun Y, et al (2014) Real-time continuous pose recovery of human hands using con-
volutional networks. ACM Trans Graph 33(5). https://​doi.​org/​10.​1145/​26295​00
Townshend R, Bedi R, Suriana P et al (2019) End-to-end learning on 3d protein structure for interface pre-
diction. Advances in neural information processing systems (NeurIPS), vol 32. Curran Associates Inc,
NewYork
Townshend RJL, Vögele M, Suriana PA, et al (2021) Atom3d: tasks on molecules in three dimensions. In:
Thirty-fifth conference on neural information processing systems datasets and benchmarks track
Uy MA, Pham QH, Hua BS, et al (2019) Revisiting point cloud classification: a new benchmark dataset and
classification model on real-world data. In: 2019 IEEE/CVF international conference on computer
vision (ICCV), pp 1588–1597, https://​doi.​org/​10.​1109/​ICCV.​2019.​00167
Villar S, Hogg DW, Storey-Fisher K et al (2021) Scalars are universal: equivariant machine learning, struc-
tured like classical physics. Advances in neural information processing systems (NeurIPS), vol 34.
Curran Associates Inc, NewYork, pp 28848–28863
Vreven T, Moal IH, Vangone A et al (2015) Updates to the integrated protein-protein interaction bench-
marks: docking benchmark version 5 and affinity benchmark version 2. J Mol Biol 427(19):3031–
3041. https://​doi.​org/​10.​1016/j.​jmb.​2015.​07.​016
Wang C, Pelillo M, Siddiqi K (2017) Dominant set clustering and pooling for multi-view 3d object rec-
ognition. In: Kim TK, Zafeiriou S, Brostow G, Mikolajczyk K (eds) Proceedings of the British
Machine Vision Conference (BMVC). BMVA Press, pp 64.1–64.12, https://​doi.​org/​10.​5244/C.​31.​64
Wang H, Sridhar S, Huang J, et al (2019a) Normalized object coordinate space for category-level 6d object
pose and size estimation. In: 2019 IEEE/CVF conference on computer vision and pattern recognition
(CVPR), pp 2637–2646, https://​doi.​org/​10.​1109/​CVPR.​2019.​00275
Wang H, Liu Y, Dong Z, et al (2022a) You only hypothesize once: point cloud registration with rotation-
equivariant descriptors. In: Proceedings of the 30th ACM international conference on multimedia
(ACM MM). Association for Computing Machinery, New York, NY, USA, MM ’22, pp 1630–1641,
https://​doi.​org/​10.​1145/​35031​61.​35480​23
Wang H, Liu Y, Hu Q et al (2023a) Roreg: pairwise point cloud registration with oriented descriptors and
local rotations. IEEE Trans Pattern Anal Mach Intell 45(8):10376–10393. https://​doi.​org/​10.​1109/​
TPAMI.​2023.​32449​51
Wang J, Chakraborty R, Yu SX (2021) Spatial transformer for 3d point clouds. IEEE Trans Pattern Anal
Mach Intell. https://​doi.​org/​10.​1109/​TPAMI.​2021.​30703​41
Wang L, Liu Y, Lin Y, et al (2022b) ComENet: towards complete and efficient message passing for 3d
molecular graphs. In: Advances in neural information processing systems (NeurIPS)
Wang X, Lei J, Lan H, et al (2023b) Dueqnet: dual-equivariance network in outdoor 3d object detection for
autonomous driving. In: 2023 IEEE International conference on robotics and automation (ICRA), pp
6951–6957, https://​doi.​org/​10.​1109/​ICRA4​8891.​2023.​10161​353
Wang Y, Sun Y, Liu Z et al (2019b) Dynamic graph CNN for learning on point clouds. ACM Trans Graph.
https://​doi.​org/​10.​1145/​33263​62
Wang Y, Zhao Y, Ying S et al (2022c) Rotation-invariant point cloud representation for 3-d model recogni-
tion. IEEE Trans Cybern. https://​doi.​org/​10.​1109/​TCYB.​2022.​31575​93
Wang Y, Wang J, Qu Y, et al (2023c) Rip-nerf: learning rotation-invariant point-based neural radiance field
for fine-grained editing and compositing. In: Proceedings of the 2023 ACM international conference
on multimedia retrieval. Association for computing machinery, New York, NY, USA, ICMR ’23, p
125-134, https://​doi.​org/​10.​1145/​35911​06.​35922​76
Wang Z, Rosen D (2023d) Manufacturing process classification based on distance rotationally invariant con-
volutions. J Comput Inf Sci Eng 23(5):051004. https://doi.org/10.1115/1.4056806
Wei X, Yu R, Sun J (2020) View-gcn: view-based graph convolutional network for 3d shape analysis. In:
2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1847–1856,
https://​doi.​org/​10.​1109/​CVPR4​2600.​2020.​00192
Wei X, Yu R, Sun J (2022) Learning view-based graph convolutional network for multi-view 3d shape anal-
ysis. IEEE Trans Pattern Anal Mach Intell 25:1–17. https://​doi.​org/​10.​1109/​TPAMI.​2022.​32217​85

Weihsbach C, Hansen L, Heinrich M (2022) Xedgeconv: leveraging graph convolutions for efficient, permu-
tation- and rotation-invariant dense 3d medical image segmentation. In: Proceedings of the first inter-
national workshop on geometric deep learning in medical image analysis, Proceedings of machine
learning research, vol 194. PMLR, pp 61–71
Weiler M, Geiger M, Welling M et al (2018) 3d steerable CNNs: learning rotationally equivariant features
in volumetric data. Advances in neural information processing systems (NeurIPS), vol 31. Curran
Associates Inc, NewYork
Winkels M, Cohen TS (2018) 3d g-cnns for pulmonary nodule detection. In: Medical imaging with deep
learning (MIDL)
Winkels M, Cohen TS (2019) Pulmonary nodule detection in CT scans with equivariant CNNs. Med Image
Anal 55:15–26. https://​doi.​org/​10.​1016/j.​media.​2019.​03.​010
Winter R, Bertolini M, Le T, et al (2022) Unsupervised learning of group invariant and equivariant repre-
sentations. In: Advances in neural information processing systems (NeurIPS)
Worrall D, Brostow G (2018) Cubenet: equivariance to 3d rotation and translation. Computer Vision–ECCV
2018. Springer International Publishing, Cham, pp 585–602. https://​doi.​org/​10.​1007/​978-3-​030-​
01228-1_​35
Wu H, Miao Y (2022) SO(3) rotation equivariant point cloud completion using attention-based vector neu-
rons. In: 2022 International conference on 3D vision (3DV), pp 280–290, https://doi.org/10.1109/3DV57658.2022.00040
Wu W, Qi Z, Fuxin L (2019) Pointconv: deep convolutional networks on 3d point clouds. In: 2019 IEEE/
CVF conference on computer vision and pattern recognition (CVPR), pp 9613–9622, https://​doi.​org/​
10.​1109/​CVPR.​2019.​00985
Wu Z, Song S, Khosla A, et al (2015) 3d shapenets: a deep representation for volumetric shapes. In: 2015
IEEE conference on computer vision and pattern recognition (CVPR), pp 1912–1920, https://​doi.​org/​
10.​1109/​CVPR.​2015.​72988​01
Xiang Y, Kim W, Chen W et al (2016) Objectnet3d: a large scale database for 3d object recognition. Com-
puter Vision–ECCV 2016. Springer International Publishing, Cham, pp 160–176. https://​doi.​org/​10.​
1007/​978-3-​319-​46484-8_​10
Xiao C, Wachs J (2021) Triangle-net: towards robustness in point cloud learning. In: 2021 IEEE winter con-
ference on applications of computer vision (WACV), pp 826–835, https://​doi.​org/​10.​1109/​WACV4​
8630.​2021.​00087
Xiao J, Owens A, Torralba A (2013) Sun3d: a database of big spaces reconstructed using sfm and object
labels. In: 2013 IEEE international conference on computer vision (ICCV), pp 1625–1632, https://​
doi.​org/​10.​1109/​ICCV.​2013.​458
Xiao Z, Lin H, Li R, et al (2020) Endowing deep 3d models with rotation invariance based on principal
component analysis. In: 2020 IEEE international conference on multimedia and expo (ICME), pp
1–6, https://​doi.​org/​10.​1109/​ICME4​6284.​2020.​91029​47
Xie L, Yang Y, Wang W, et al (2023) General rotation invariance learning for point clouds via weight-
feature alignment. https://​doi.​org/​10.​48550/​arXiv.​2302.​09907
Xu C, Chen S, Li M et al (2021a) Invariant teacher and equivariant student for unsupervised 3d human pose
estimation. Proc AAAI Conf Artif Intell (AAAI) 35(4):3013–3021. https://​doi.​org/​10.​1609/​aaai.​
v35i4.​16409
Xu J, Tang X, Zhu Y, et al (2021b) Sgmnet: Learning rotation-invariant point cloud representations via
sorted gram matrix. In: 2021 IEEE/CVF International conference on computer vision (ICCV), pp
10448–10457, https://doi.org/10.1109/ICCV48922.2021.01030
Xu J, Yang Q, Li C, et al (2022) Rotation-equivariant graph convolutional networks for spherical data via
global-local attention. In: 2022 IEEE International conference on image processing (ICIP), pp 2501–
2505, https://​doi.​org/​10.​1109/​ICIP4​6576.​2022.​98975​10
Xu M, Zhou Z, Qiao Y (2020) Geometry sharing network for 3d point cloud classification and segmen-
tation. Proc AAAI Conf Artif Intell (AAAI) 34(07):12500–12507. https://​doi.​org/​10.​1609/​aaai.​
v34i07.​6938
Xu X, Yin H, Chen Z et al (2021c) Disco: differentiable scan context with orientation. IEEE Robot
Autom Lett 6(2):2791–2798. https://doi.org/10.1109/LRA.2021.3060741
Xu X, Lu S, Wu J et al (2023) Ring++: Roto-translation invariant gram for global localization on a
sparse scan map. IEEE Trans Rob 39(6):4616–4635. https://​doi.​org/​10.​1109/​TRO.​2023.​33030​35
Xu Z, Liu K, Chen K et al (2023) Classification of single-view object point clouds. Pattern Recogn
135:109137. https://doi.org/10.1016/j.patcog.2022.109137
Yang F, Wang H, Jin Z (2021) Adaptive gmm convolution for point cloud learning. In: Proceedings of
the British machine vision conference (BMVC), BMVA Press

Yang L, Chakraborty R (2020) An “augmentation-free” rotation invariant classification scheme on
point-cloud and its application to neuroimaging. In: 2020 IEEE 17th international symposium on
biomedical imaging (ISBI), pp 713–716, https://doi.org/10.1109/ISBI45749.2020.9098670
Yang L, Chakraborty R, Yu SX (2019) Poirot: a rotation invariant omni-directional pointnet. https://​doi.​
org/​10.​48550/​ARXIV.​1910.​13050
Yang Q, Li C, Dai W, et al (2020) Rotation equivariant graph convolutional network for spherical
image classification. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4302–4311, https://doi.org/10.1109/CVPR42600.2020.00436
Yang Y, Feng C, Shen Y, et al (2018) Foldingnet: point cloud auto-encoder via deep grid deformation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 206–215, https://doi.org/10.1109/CVPR.2018.00029
Yi L, Kim VG, Ceylan D et al (2016) A scalable active framework for region annotation in 3d shape collections. ACM Trans Graph. https://doi.org/10.1145/2980179.2980238
Yin P, Wang F, Egorov A, et al (2020) Seqspherevlad: sequence matching enhanced orientation-invariant place recognition. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 5024–5029, https://doi.org/10.1109/IROS45743.2020.9341727
Yin P, Xu L, Feng Z et al (2021) Pse-match: a viewpoint-free place recognition method with parallel semantic embedding. IEEE Trans Intell Transp Syst. https://doi.org/10.1109/TITS.2021.3102429
Yin P, Wang F, Egorov A et al (2022) Fast sequence-matching enhanced viewpoint-invariant 3-d place recognition. IEEE Trans Industr Electron 69(2):2127–2135. https://doi.org/10.1109/TIE.2021.3057025
You H, Feng Y, Ji R, et al (2018) Pvnet: a joint convolutional network of point cloud and multi-view for 3d shape recognition. In: Proceedings of the 26th ACM international conference on multimedia (ACM MM). Association for Computing Machinery, New York, NY, USA, MM '18, pp 1310–1318, https://doi.org/10.1145/3240508.3240702
You Y, Lou Y, Liu Q et al (2020) Pointwise rotation-invariant network with adaptive sampling and 3d spherical voxel convolution. Proc AAAI Conf Artif Intell (AAAI) 34(07):12717–12724. https://doi.org/10.1609/aaai.v34i07.6965
You Y, Lou Y, Shi R et al (2021) Prin/sprin: on extracting point-wise rotation invariant features. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.2021.3130590
Yu H, Qin Z, Hou J, et al (2023) Rotation-invariant transformer for point cloud matching. In: 2023 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 5384–5393, https://doi.org/10.1109/CVPR52729.2023.00521
Yu HX, Wu J, Yi L (2022) Rotationally equivariant 3d object detection. In: 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1446–1454, https://doi.org/10.1109/CVPR52688.2022.00151
Yu R, Wei X, Tombari F et al (2020) Deep positional and relational feature learning for rotation-invariant point cloud analysis. Computer Vision–ECCV 2020. Springer International Publishing, Cham, pp 217–233. https://doi.org/10.1007/978-3-030-58607-2_13
Yu T, Meng J, Yuan J (2018) Multi-view harmonized bilinear network for 3d object recognition. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 186–194, https://doi.org/10.1109/CVPR.2018.00027
Yu Y, Huang Z, Li F et al (2020) Point encoder GAN: a deep learning model for 3d point cloud inpainting. Neurocomputing 384:192–199. https://doi.org/10.1016/j.neucom.2019.12.032
Yuan W, Held D, Mertz C, et al (2018) Iterative transformer network for 3d point cloud. https://doi.org/10.48550/ARXIV.1811.11209
Yun K, Honorio J, Chattopadhyay D, et al (2012) Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops, pp 28–35, https://doi.org/10.1109/CVPRW.2012.6239234
Zeng A, Song S, Nießner M, et al (2017) 3dmatch: learning local geometric descriptors from rgb-d reconstructions. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 199–208, https://doi.org/10.1109/CVPR.2017.29
Zhang C, Budvytis I, Liwicki S et al (2021) Rotation equivariant orientation estimation for omnidirectional localization. Computer Vision–ACCV 2020. Springer International Publishing, Cham, pp 334–350. https://doi.org/10.1007/978-3-030-69538-5_21
Zhang D, He F, Tu Z et al (2020) Pointwise geometric and semantic learning network on 3d point clouds. Integr Comput-Aided Eng 27:57–75. https://doi.org/10.3233/ICA-190608
Zhang D, Yu J, Zhang C et al (2023) Parot: patch-wise rotation-invariant network via feature disentanglement and pose restoration. Proc AAAI Conf Artif Intell (AAAI) 37(3):3418–3426. https://doi.org/10.1609/aaai.v37i3.25450

Zhang J, Yu MY, Vasudevan R, et al (2020b) Learning rotation-invariant representations of point clouds using aligned edge convolutional neural networks. In: 2020 international conference on 3D vision (3DV), pp 200–209, https://doi.org/10.1109/3DV50981.2020.00030
Zhang L, Sun J, Zheng Q (2018) 3d point cloud recognition based on a multi-view convolutional neural network. Sensors. https://doi.org/10.3390/s18113681
Zhang S, Cao H, Liu Y, et al (2021b) Sn-graph: a minimalist 3d object representation for classification. In: 2021 IEEE international conference on multimedia and expo (ICME), pp 1–6, https://doi.org/10.1109/ICME51207.2021.9428449
Zhang T (2021) Spherical-gmm: a rotation and scale invariant method for point cloud classification. In: 2021 2nd international conference on intelligent computing and human-computer interaction (ICHCI), pp 156–161, https://doi.org/10.1109/ICHCI54629.2021.00040
Zhang X, Wang L, Helwig J, et al (2023b) Artificial intelligence for science in quantum, atomistic, and continuum systems. https://doi.org/10.48550/arXiv.2307.08423
Zhang Y, Lu Z, Xue JH, et al (2019a) A new rotation-invariant deep network for 3d object recognition. In: 2019 IEEE international conference on multimedia and expo (ICME), pp 1606–1611, https://doi.org/10.1109/ICME.2019.00277
Zhang Y, Zhang W, Li J (2023) Partial-to-partial point cloud registration by rotation invariant features and spatial geometric consistency. Remote Sens. https://doi.org/10.3390/rs15123054
Zhang Z, Rebecq H, Forster C, et al (2016) Benefit of large field-of-view cameras for visual odometry. In: 2016 IEEE international conference on robotics and automation (ICRA), pp 801–808, https://doi.org/10.1109/ICRA.2016.7487210
Zhang Z, Hua BS, Rosen DW, et al (2019b) Rotation invariant convolutions for 3d point clouds deep learning. In: 2019 international conference on 3D vision (3DV), pp 204–213, https://doi.org/10.1109/3DV.2019.00031
Zhang Z, Hua BS, Chen W, et al (2020c) Global context aware convolutions for 3d point cloud understanding. In: 2020 international conference on 3D vision (3DV), pp 210–219, https://doi.org/10.1109/3DV50981.2020.00031
Zhang Z, Wang X, Zhang Z, et al (2021c) Revisiting transformation invariant geometric deep learning: are initial representations all you need? https://doi.org/10.48550/ARXIV.2112.12345
Zhang Z, Hua BS, Yeung SK (2022) Riconv++: effective rotation invariant convolutions for 3d point clouds deep learning. Int J Comput Vis. https://doi.org/10.1007/s11263-022-01601-z
Zhao C, Yang J, Xiong X et al (2022) Rotation invariant point cloud analysis: where local geometry meets global topology. Pattern Recogn 127:108626. https://doi.org/10.1016/j.patcog.2022.108626
Zhao H, Liang Z, Wang C et al (2021) Centroidreg: a global-to-local framework for partial point cloud registration. IEEE Robot Autom Lett 6(2):2533–2540. https://doi.org/10.1109/LRA.2021.3061369
Zhao H, Zhuang H, Wang C et al (2022) G3doa: generalizable 3d descriptor with overlap attention for point cloud registration. IEEE Robot Autom Lett 7(2):2541–2548. https://doi.org/10.1109/LRA.2022.3142733
Zhao Y, Birdal T, Lenssen JE et al (2020) Quaternion equivariant capsule networks for 3d point clouds. Computer Vision–ECCV 2020. Springer International Publishing, Cham, pp 1–19. https://doi.org/10.1007/978-3-030-58452-8_1
Zhao Y, Wu Y, Chen C, et al (2020b) On isometry robustness of deep 3d point cloud models under adversarial attacks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 1198–1207, https://doi.org/10.1109/CVPR42600.2020.00128
Zhou C, Dong Z, Lin H (2022) Learning persistent homology of 3d point clouds. Comput Graph 102:269–279. https://doi.org/10.1016/j.cag.2021.10.022
Zhou K, Bhatnagar BL, Schiele B, et al (2022b) Adjoint rigid transform network: task-conditioned alignment of 3d shapes. In: 2022 international conference on 3D vision (3DV), pp 1–11, https://doi.org/10.1109/3DV57658.2022.00019
Zhou Y, Tuzel O (2018) Voxelnet: end-to-end learning for point cloud based 3d object detection. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4490–4499, https://doi.org/10.1109/CVPR.2018.00472
Zhu G, Zhou Y, Zhao J et al (2022) Point cloud recognition based on lightweight embeddable attention module. Neurocomputing 472:138–148. https://doi.org/10.1016/j.neucom.2021.10.098
Zhu J, Li Y, Hu Y et al (2020) Rubik's cube+: a self-supervised feature learning framework for 3d medical image analysis. Med Image Anal 64:101746. https://doi.org/10.1016/j.media.2020.101746
Zhu M, Ghaffari M, Peng H (2022b) Correspondence-free point cloud registration with so(3)-equivariant implicit shape representations. In: Proceedings of the 5th conference on robot learning (CoRL), Proceedings of machine learning research, vol 164. PMLR, pp 1412–1422
Zhu M, Han S, Cai H, et al (2023) 4d panoptic segmentation as invariant and equivariant field prediction. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 22488–22498
Zhuang X, Li Y, Hu Y et al (2019) Self-supervised feature learning for 3d medical images by playing Rubik's cube. In: Shen D, Liu T, Peters TM et al (eds) Medical image computing and computer assisted intervention (MICCAI). Springer International Publishing, Cham, pp 420–428. https://doi.org/10.1007/978-3-030-32251-9_46
Zitnick CL, Chanussot L, Das A, et al (2020) An introduction to electrocatalyst design using machine learning for renewable energy storage. https://doi.org/10.48550/ARXIV.2010.09435

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
