CVPR'17 PointNet Slides

PointNet is a deep learning model that can directly take unordered point clouds as input and output classifications or segmentations. It achieves permutation invariance through symmetric functions and learns an embedding that aligns different shapes. PointNet achieves state-of-the-art results on 3D classification and segmentation benchmarks while being robust to missing data. It provides a unified framework for various 3D tasks like classification, segmentation, and scene parsing.


PointNet: Deep Learning on Point Sets for

3D Classification and Segmentation

Charles R. Qi*
Hao Su*
Kaichun Mo
Leonidas J. Guibas
Big Data + Deep Representation Learning

Robot Perception (source: Scott J Grunewald) · Augmented Reality (source: Google Tango) · Shape Design (source: solidsolutions)

Emerging 3D Applications
Need for 3D Deep Learning!


3D Representations

Point Cloud · Mesh · Volumetric · Projected View · RGB(D)
3D Representation: Point Cloud

Point cloud is close to raw sensor data (LiDAR, depth sensors).

Point cloud is canonical.

[figure: a point cloud alongside mesh, volumetric, and depth-map representations]
Previous Works
Most existing point cloud features are handcrafted for specific tasks.

Source: https://github.com/PointCloudLibrary/pcl/wiki/Overview-and-Comparison-of-Features
Previous Works

Point cloud is converted to other representations before it's fed to a deep neural network.

Conversion            → Deep Net
Voxelization          → 3D CNN
Projection/Rendering  → 2D CNN
Feature extraction    → Fully Connected


Research Question:

Can we achieve effective feature learning directly on point clouds?
Our Work: PointNet

End-to-end learning for scattered, unordered point data

Unified framework for various tasks:
  Object Classification
  Object Part Segmentation
  Semantic Scene Parsing
  ...
Challenges

Unordered point set as input


Model needs to be invariant to N! permutations.

Invariance under geometric transformations


Point cloud rotations should not alter classification results.
Unordered Input

Point cloud: N orderless points, each represented by a D-dim vector.

[figure: two N×D arrays with permuted rows represent the same point set]

Model needs to be invariant to N! permutations.


Permutation Invariance: Symmetric Function

f(x_1, x_2, …, x_n) ≡ f(x_{π(1)}, x_{π(2)}, …, x_{π(n)}),  x_i ∈ ℝ^D
Examples:

f(x_1, x_2, …, x_n) = max{x_1, x_2, …, x_n}
f(x_1, x_2, …, x_n) = x_1 + x_2 + … + x_n

How can we construct a family of symmetric functions by neural networks?
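The two example functions can be checked numerically. A minimal sketch, using the four example points that recur throughout these slides:

```python
import numpy as np

# The four example points used throughout these slides (n = 4, D = 3).
points = np.array([[1, 2, 3], [1, 1, 1], [2, 3, 2], [2, 3, 4]], dtype=float)
perm = np.random.default_rng(0).permutation(len(points))
shuffled = points[perm]

# Element-wise max and sum over the point axis are symmetric functions:
# reordering the rows cannot change the output.
assert np.array_equal(points.max(axis=0), shuffled.max(axis=0))
assert np.array_equal(points.sum(axis=0), shuffled.sum(axis=0))
print(points.max(axis=0))  # -> [2. 3. 4.]
```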
Permutation Invariance: Symmetric Function

Observe:
f(x_1, x_2, …, x_n) = γ ∘ g(h(x_1), …, h(x_n)) is symmetric if g is symmetric.

[figure: points (1,2,3), (1,1,1), (2,3,2), (2,3,4) pass through a shared h; a simple symmetric function g aggregates them; γ maps the result to the output — PointNet (vanilla)]

Permutation Invariance: Symmetric Function

What symmetric functions can be constructed by PointNet?

[figure: the functions realizable by PointNet (vanilla) shown within the set of symmetric functions]
Universal Set Function Approximator

Theorem:
A Hausdorff continuous symmetric function f : 2^X → ℝ can be arbitrarily approximated by PointNet.

[figure: a point set S ⊆ ℝ^d fed into PointNet (vanilla)]
Basic PointNet Architecture

Empirically, we use multi-layer perceptron (MLP) and max pooling:

[figure: each point (1,2,3), (1,1,1), (2,3,2), (2,3,4) goes through a shared MLP (h); max pooling serves as g; a final MLP serves as γ — PointNet (vanilla)]
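The vanilla architecture above can be sketched in a few lines. Randomly initialized weights stand in for trained ones here, and the layer sizes (3→64→1024 for h, 1024→256→40 for γ) are illustrative rather than the exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    # Fully connected layers with ReLU in between, shared across points.
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)
    return x @ weights[-1]

# Illustrative, randomly initialized weights (not the paper's trained model).
h_weights = [rng.normal(size=s) for s in [(3, 64), (64, 1024)]]
g_weights = [rng.normal(size=s) for s in [(1024, 256), (256, 40)]]

def pointnet_vanilla(points):
    per_point = mlp(points, h_weights)   # h: shared per-point embedding
    global_feat = per_point.max(axis=0)  # g: max pooling (the symmetric part)
    return mlp(global_feat, g_weights)   # gamma: global feature -> class scores

pts = rng.normal(size=(128, 3))
scores = pointnet_vanilla(pts)
shuffled_scores = pointnet_vanilla(pts[rng.permutation(len(pts))])
assert np.allclose(scores, shuffled_scores)  # invariant to point order
```

Because the only interaction across points is the max pooling, permuting the input rows provably cannot change the output.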


Challenges

Unordered point set as input


Model needs to be invariant to N! permutations.

Invariance under geometric transformations


Point cloud rotations should not alter classification results.
Input Alignment by Transformer Network

Idea: data-dependent transformation for automatic alignment

[figure: a T-Net predicts transform params from the N×3 input; a transform module applies them, yielding transformed N×3 data]
Input Alignment by Transformer Network

The transformation is just matrix multiplication!

[figure: the T-Net outputs 3×3 transform params; the N×3 input is matrix-multiplied by them to give the transformed data]
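A sketch of the transform step. In PointNet the 3×3 matrix is predicted from the input by the T-Net; a fixed random matrix stands in for that prediction here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the T-Net output: in PointNet this 3x3 matrix is predicted
# from the input cloud itself by a small network.
transform = rng.normal(size=(3, 3))

points = rng.normal(size=(1024, 3))  # N x 3 input cloud
transformed = points @ transform     # the whole alignment step is one matmul
assert transformed.shape == (1024, 3)
```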
Embedding Space Alignment

The same idea applies in feature space: a T-Net predicts a 64×64 transform, and the input embeddings (N×64) are matrix-multiplied by it to give transformed embeddings (N×64).

Regularization: keep the 64×64 transform matrix A close to orthogonal:

L_reg = ‖I − A Aᵀ‖²_F
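The orthogonality regularizer can be sketched directly from its formula (shown on 2×2 matrices for brevity):

```python
import numpy as np

def orthogonality_penalty(A):
    # Frobenius-norm regularizer ||I - A A^T||_F^2 pushing A toward orthogonality.
    diff = np.eye(A.shape[0]) - A @ A.T
    return float((diff * diff).sum())

# A rotation (orthogonal) incurs essentially zero penalty...
theta = 0.3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
assert orthogonality_penalty(rotation) < 1e-12
# ...while a non-orthogonal scaling is penalized: ||I - 4I||_F^2 = 18.
assert orthogonality_penalty(2 * np.eye(2)) == 18.0
```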
PointNet Classification Network

[figure: full classification architecture — input transform, shared MLPs, feature transform, max pooling into a global feature, and a final MLP producing k class scores]
Extension to PointNet Segmentation Network

[figure: per-point local embeddings are concatenated with the global feature; shared MLPs then produce per-point scores]
Results
Results on Object Classification

[figure: accuracy comparison with prior methods, including 3D CNNs]

dataset: ModelNet40; metric: 40-class classification accuracy (%)

Results on Object Part Segmentation

dataset: ShapeNetPart; metric: mean IoU (%)


Results on Semantic Scene Parsing

[figure: input point clouds and output semantic labels]

dataset: Stanford 2D-3D-S (Matterport scans)


Robustness to Data Corruption

Less than 2% accuracy drop with 50% missing data.

dataset: ModelNet40; metric: 40-class classification accuracy (%)

[figure: accuracy vs. amount of missing data, compared with a 3D CNN]

Why is PointNet so robust to missing data?
Visualizing Global Point Cloud Features

[figure: n×3 input → shared MLP → n×1024 per-point features → max pool → 1024-dim global feature]

Which input points are contributing to the global feature? (critical points)
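Because max pooling picks one contributing point per feature dimension, the critical point set can be read off with an argmax. A sketch with a randomly initialized shared layer standing in for the trained MLP (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-point features: a random shared layer stands in for the
# trained MLP; 512 points, 64 feature dims (illustrative sizes).
points = rng.normal(size=(512, 3))
W = rng.normal(size=(3, 64))
features = np.maximum(points @ W, 0.0)  # ReLU features, 512 x 64

# Max pooling selects, per feature dim, a single contributing point.
global_feature = features.max(axis=0)
critical_idx = np.unique(features.argmax(axis=0))  # the critical point set

# Removing a non-critical point leaves the global feature unchanged.
noncritical = np.setdiff1d(np.arange(len(points)), critical_idx)
mask = np.ones(len(points), dtype=bool)
mask[noncritical[0]] = False
assert np.allclose(features[mask].max(axis=0), global_feature)
```

This is why dropping points rarely hurts: only the (small) critical set determines the global feature.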
Visualizing Global Point Cloud Features

[figure: original shapes and their critical point sets]
Visualizing Global Point Cloud Features

Which points won't affect the global feature?


Visualizing Global Point Cloud Features

[figure: original shapes, their critical point sets, and their upper-bound sets]
Visualizing Global Point Cloud Features (OOS)

[figure: further shapes with their critical point sets and upper-bound sets]

Conclusion
• PointNet is a novel deep neural network that directly consumes point clouds.
• A unified approach to various 3D recognition tasks.
• Rich theoretical analysis and experimental results.

Code & Data Available!


http://stanford.edu/~rqi/pointnet

See you at Poster 9!


Thank you!
THE END
Speed and Model Size

Inference time: 11.6 ms / 25.3 ms (GTX 1080, batch size 8)


Permutation Invariance: How about Sorting?

"Sort" the points before feeding them into a network.

Unfortunately, there is no canonical order in high-dimensional space.

Example: lexsorting (1,2,3), (1,1,1), (2,3,2), (2,3,4) gives (1,1,1), (1,2,3), (2,3,2), (2,3,4) before the MLP.

Multi-Layer Perceptron (ModelNet shape classification):
  Unordered Input      12% accuracy
  Lexsorted Input      40% accuracy
  PointNet (vanilla)   87% accuracy
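The lexsort in the example above can be reproduced with numpy (keys are given least-significant first):

```python
import numpy as np

points = np.array([[1, 2, 3], [1, 1, 1], [2, 3, 2], [2, 3, 4]])

# Lexicographic sort: order rows by the first coordinate, breaking ties with
# later coordinates. np.lexsort takes keys from least to most significant.
order = np.lexsort((points[:, 2], points[:, 1], points[:, 0]))
print(points[order].tolist())
# -> [[1, 1, 1], [1, 2, 3], [2, 3, 2], [2, 3, 4]]
```

Sorting yields *some* canonical order in this toy case, but as the accuracy table shows, it is a poor substitute for a symmetric architecture.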
Permutation Invariance: How about RNNs?

Train an RNN with permutation augmentation.

However, the RNN forgets and order matters.

[figure: an LSTM consuming shared-MLP embeddings of (1,2,3), (1,1,1), (2,3,2), (2,3,4) in sequence]

LSTM Network (ModelNet shape classification):
  LSTM                 75% accuracy
  PointNet (vanilla)   87% accuracy
PointNet Classification Network

ModelNet40 Accuracy:
  PointNet (vanilla)        87.1%
  + input 3x3               87.9%
  + feature 64x64           86.9%
  + feature 64x64 + reg     87.4%
  + both                    89.2%
Visualizing Point Functions

Compact view: a point function maps a 1×3 point through FCs to a 1×1024 feature.
Expanded view: FC(64) → FC(64) → FC(64) → FC(128) → FC(1024)

Which input point will activate neuron X?
Find the top-K points in a dense volumetric grid that activate neuron X.

[figure: activation regions of individual point-function neurons]
