📌 This is the official PyTorch implementation of the work:
VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
Wuyang Li 1 , Zhu Yu 2 , Alexandre Alahi 1
1 École Polytechnique Fédérale de Lausanne (EPFL); 2 Zhejiang University
Code is coming soon! We’re currently cleaning up the code and unifying the camera- and LiDAR-based implementations into a single project, which serves as a powerful, clean, and extensible baseline model for the community. If you can’t wait for the official release, feel free to contact me for the individual implementations.
Contact: wuyang.li@epfl.ch
VoxDet addresses semantic occupancy prediction with an instance-centric formulation inspired by dense object detection. It uses a Voxel-to-Instance (VoxNT) trick that freely transfers voxel-level class labels to instance-level offset labels.
- Versatile: Adaptable to various voxel-based scenarios, such as camera and LiDAR settings.
- Powerful: Achieves joint state-of-the-art results on both camera-based and LiDAR-based SSC benchmarks.
- Efficient: Fast (~1.3× speed-up) and lightweight (~57.9% fewer parameters).
- Leaderboard Topper: Achieves 63.0 IoU (single-frame model), securing 1st place on the SemanticKITTI leaderboard.
Note that VoxDet is a single-frame, single-model method that uses no extra data or labels.
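To make the voxel-to-instance label transfer mentioned above more concrete, here is a minimal sketch of one way such offset labels could be derived from class labels alone. It is not the official implementation (which is not yet released): it assumes that an "instance" extends along each axis for as long as the class label stays the same, and that the offset label of a voxel is its distance to that boundary in each of the six axis directions. The function names `run_offsets_1d` and `voxel_to_offsets` and the channel ordering are hypothetical.

```python
import numpy as np


def run_offsets_1d(line):
    """For each cell of a 1D label line, count the same-class cells
    behind it and ahead of it within its contiguous run."""
    n = len(line)
    back = np.zeros(n, dtype=np.int64)
    fwd = np.zeros(n, dtype=np.int64)
    for i in range(1, n):
        if line[i] == line[i - 1]:
            back[i] = back[i - 1] + 1
    for i in range(n - 2, -1, -1):
        if line[i] == line[i + 1]:
            fwd[i] = fwd[i + 1] + 1
    return back, fwd


def voxel_to_offsets(labels):
    """Map an (X, Y, Z) integer class grid to a (6, X, Y, Z) offset grid.

    Assumed channel order: [-x, +x, -y, +y, -z, +z]. Every class,
    including the empty class, is treated the same way in this sketch.
    """
    offsets = np.zeros((6,) + labels.shape, dtype=np.int64)
    for axis in range(3):
        # Move the scanned axis last so each index selects one 1D line.
        moved = np.moveaxis(labels, axis, -1)
        back = np.moveaxis(offsets[2 * axis], axis, -1)      # view into offsets
        fwd = np.moveaxis(offsets[2 * axis + 1], axis, -1)   # view into offsets
        for idx in np.ndindex(moved.shape[:-1]):
            back[idx], fwd[idx] = run_offsets_1d(moved[idx])
    return offsets


# Toy example: a single line of voxels along z with two classes.
demo = np.array([1, 1, 1, 2, 2]).reshape(1, 1, 5)
demo_offsets = voxel_to_offsets(demo)
print(demo_offsets[4, 0, 0])  # distance to the run boundary in -z
print(demo_offsets[5, 0, 0])  # distance to the run boundary in +z
```

The explicit Python loops keep the logic easy to follow on small grids; a practical version would likely vectorize the scan or run it on the GPU.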
VoxDet (blue curve) is significantly more efficient and effective than the previous state-of-the-art method, CGFormer (gray curve).
We greatly appreciate the tremendous effort behind the following projects!
- FCOS: Fully Convolutional One-Stage Object Detection
- Context and Geometry Aware Voxel Transformer for Semantic Scene Completion
- SIGMA: Semantic-complete Graph Matching For Domain Adaptive Object Detection
- Revisiting the Sibling Head in Object Detector
- VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction
- Release the arXiv paper
- Release the unified codebase, including both camera-based and LiDAR-based implementation
- Release all models
If you think our work is helpful for your project, I would greatly appreciate it if you could consider citing our work:
@article{li2025voxdet,
title={VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection},
author={Li, Wuyang and Yu, Zhu and Alahi, Alexandre},
journal={arXiv preprint arXiv:2506.04623},
year={2025}
}