This repo contains the official implementation of the solvers and estimators proposed in the paper "Relative Pose Estimation through Affine Corrections of Monocular Depth Priors". The solvers and estimators are implemented in C++, and we provide easy-to-use Python bindings.
Note: "MAD" is an acronym for "Monocular Affine Depth".
Overview: Our method takes a pair of images as input, runs off-the-shelf feature matching and monocular depth estimation, and then jointly estimates the relative pose, the scale and shift parameters of the two depth maps, and optionally the focal lengths.
We develop three solvers for relative pose estimation that explicitly account for independent affine (scale and shift) ambiguities, tailored to three camera setups: calibrated, shared-focal, and unknown focal lengths (two-focal). The solvers are further combined with classic point-based solvers and epipolar constraints in our hybrid RANSAC estimators. Our estimators show consistent improvements over classic point-based methods (e.g. PoseLib) and the recent two-view reconstruction methods DUSt3R and MASt3R, across different datasets, feature matchers, and monocular depth estimation models.
Some highlight results (probably best viewed in light theme on GitHub!):
Shared-focal setting:

| Matches | Method | MD Model | AUC@5° | AUC@10° | AUC@20° |
| --- | --- | --- | --- | --- | --- |
| SP+SG | PoseLib-6pt | - | 12.84 | 28.13 | 45.64 |
|  | Ours-sf | DA-met. | 18.35 | 37.54 | 57.58 |
| RoMa | PoseLib-6pt | - | 27.17 | 49.24 | 67.42 |
|  | Ours-sf | DA-met. | 29.81 | 53.11 | 71.15 |
| MASt3R | PoseLib-6pt | - | 30.28 | 54.16 | 72.87 |
|  | Ours-sf | DA-met. | 31.87 | 56.20 | 74.51 |
|  | Ours-sf | MASt3R | 32.58 | 56.99 | 74.91 |
| - | DUSt3R (reference) | - | 25.90 | 48.45 | 68.03 |
| - | MASt3R (reference) | - | 23.94 | 46.44 | 66.18 |
Two-focal setting:

| Matches | Method | MD Model | AUC@2° | AUC@5° | AUC@10° | AUC@20° |
| --- | --- | --- | --- | --- | --- | --- |
| SP+LG | PoseLib-7pt | - | 5.85 | 13.95 | 21.94 | 30.71 |
|  | Ours-tf | DAv2-met. | 9.15 | 22.22 | 32.80 | 43.26 |
| RoMa | PoseLib-7pt | - | 8.73 | 20.31 | 30.45 | 41.48 |
|  | Ours-tf | DAv2-met. | 13.50 | 29.19 | 42.18 | 54.42 |
| MASt3R | PoseLib-7pt | - | 12.58 | 30.27 | 45.57 | 59.85 |
|  | Ours-tf | DAv2-met. | 18.05 | 39.92 | 56.64 | 70.86 |
|  | Ours-tf | MASt3R | 22.44 | 48.02 | 64.79 | 76.55 |
| - | DUSt3R (reference) | - | 6.43 | 24.47 | 42.39 | 58.36 |
| - | MASt3R (reference) | - | 13.39 | 38.41 | 57.92 | 71.91 |
We are working on setting up wheels for easy installation via PyPI. For now, please install from source as follows.

First, install the required dependencies:

```bash
sudo apt-get install libeigen3-dev libceres-dev libopencv-dev
```
Note: The two-focal estimator currently relies on `cv::recoverPose` from OpenCV; we plan to remove the OpenCV dependency in future updates.
Then clone the repository and install with pip:

```bash
git clone --recursive https://github.com/MarkYu98/madpose
cd madpose
pip install .
```

If you would like to see the build process (e.g. CMake logs), add the `-v` option to the `pip install` command.
Note: MADPose also depends on PoseLib. By default, CMake will automatically fetch and build PoseLib using `FetchContent`. Set the `FETCH_POSELIB` CMake option to `OFF` if you prefer to use a self-installed version.
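If you build the C++ library with CMake directly, the option can be set at configure time; this is a hypothetical invocation (the pip build drives CMake internally):

```bash
# hypothetical direct CMake configure with a self-installed PoseLib
cmake -B build -DFETCH_POSELIB=OFF
cmake --build build -j8
```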
To verify the installation:

```bash
python -c "import madpose"
```

You should not see any errors if MADPose is installed successfully.
We provide Python bindings of our three hybrid estimators for image pairs with calibrated cameras, shared-focal cameras, and cameras with unknown focal lengths (two-focal).
The estimators take `HybridLORansacOptions` and `EstimatorConfig` for their settings; some useful options are shown below:
```python
import madpose

options = madpose.HybridLORansacOptions()
options.min_num_iterations = 1000
options.max_num_iterations = 10000
options.success_probability = 0.9999
options.random_seed = 0  # for reproducibility
options.final_least_squares = True
options.threshold_multiplier = 5.0
options.num_lo_steps = 4
# placeholder pixel thresholds and weight; tune them for your data
reproj_pix_thres, epipolar_pix_thres = 8.0, 2.0
epipolar_weight = 1.0
# squared px thresholds for the reprojection error and the epipolar error
options.squared_inlier_thresholds = [reproj_pix_thres ** 2, epipolar_pix_thres ** 2]
# weights used when scoring the two types of errors
options.data_type_weights = [1.0, epipolar_weight]

est_config = madpose.EstimatorConfig()
# if enabled, the estimated depth offsets (shifts) keep the input min_depth values positive; default: True
est_config.min_depth_constraint = True
# if disabled, the depth is modeled with scale only (applicable only to the calibrated camera case)
est_config.use_shift = True
```
We provide example image pairs and code snippets in `examples/` to test the hybrid estimators. More demos and evaluations will be added in the future.
The examples use `numpy` and `cv2` to handle pre-computed data in Python; you can install both with pip:

```bash
pip install numpy opencv-python
```
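For instance, one way to prepare the estimator inputs is to sample the depth priors at the matched keypoints; the file names and array shapes below are illustrative (the example scripts ship their own pre-computed data):

```python
import numpy as np

# hypothetical pre-computed inputs: Nx2 pixel keypoint matches from any
# feature matcher, and HxW depth priors from any monocular depth model
mkpts0, mkpts1 = np.load("mkpts0.npy"), np.load("mkpts1.npy")
depth_map0, depth_map1 = np.load("depth_map0.npy"), np.load("depth_map1.npy")

# sample each depth prior at the (rounded) keypoint locations: x -> column, y -> row
u0, v0 = np.round(mkpts0[:, 0]).astype(int), np.round(mkpts0[:, 1]).astype(int)
u1, v1 = np.round(mkpts1[:, 0]).astype(int), np.round(mkpts1[:, 1]).astype(int)
depth0, depth1 = depth_map0[v0, u0], depth_map1[v1, u1]
```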
To run the calibrated estimator:

```python
pose, stats = madpose.HybridEstimatePoseScaleOffset(
    mkpts0, mkpts1,
    depth0, depth1,
    [depth_map0.min(), depth_map1.min()],
    K0, K1, options, est_config
)
# rotation and translation of the estimated pose
R_est, t_est = pose.R(), pose.t()
# scale and offsets of the affine-corrected depth maps
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
```
The parameters are: keypoint matches (`mkpts0`, `mkpts1`), their corresponding depth prior values (`depth0`, `depth1`), the minimum depth values of both views (used when `est_config.min_depth_constraint` is `True`), the camera intrinsics (`K0`, `K1`), `options`, and `est_config`.
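If you want affine-corrected depth maps, here is a minimal sketch, assuming the corrections follow the same `a * d + b` form as the solver outputs described further below; the exact convention is best verified against `examples/calibrated.py`:

```python
# assumption: view 0 keeps unit scale, view 1 carries the relative scale
depth_map0_corrected = depth_map0 + o0_est
depth_map1_corrected = s_est * depth_map1 + o1_est
```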
See `examples/calibrated.py` for a complete code example, evaluation, and comparison with point-based estimation using PoseLib.
To run the shared-focal estimator:

```python
pose, stats = madpose.HybridEstimatePoseScaleOffsetSharedFocal(
    mkpts0, mkpts1,
    depth0, depth1,
    [depth_map0.min(), depth_map1.min()],
    pp0, pp1, options, est_config
)
# rotation and translation of the estimated pose
R_est, t_est = pose.R(), pose.t()
# scale and offsets of the affine-corrected depth maps
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
# estimated shared focal length
f_est = pose.focal
```
Different from the calibrated estimator, the shared-focal estimator takes the principal points (`pp0`, `pp1`) as input instead of the full intrinsics (`K0`, `K1`).
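The principal points can be taken from known intrinsics; if they are unknown, assuming the image center is a common heuristic (our illustration, not part of the API):

```python
import numpy as np

# w0, h0, w1, h1 are the image widths and heights (illustrative names)
pp0 = np.array([w0 / 2.0, h0 / 2.0])
pp1 = np.array([w1 / 2.0, h1 / 2.0])
```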
See `examples/shared_focal.py` for a complete example.
To run the two-focal estimator:

```python
pose, stats = madpose.HybridEstimatePoseScaleOffsetTwoFocal(
    mkpts0, mkpts1,
    depth0, depth1,
    [depth_map0.min(), depth_map1.min()],
    pp0, pp1, options, est_config
)
# rotation and translation of the estimated pose
R_est, t_est = pose.R(), pose.t()
# scale and offsets of the affine-corrected depth maps
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
# the two estimated focal lengths
f0_est, f1_est = pose.focal0, pose.focal1
```
The parameters are the same as for the shared-focal estimator, but the estimator now recovers two independent focal lengths.
See `examples/two_focal.py` for a complete example.
You can compare with the point-based estimators from PoseLib; to do so, install PoseLib's Python bindings.
The corresponding point-based estimation is included in each of the three example scripts above.
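As a sketch of such a comparison (the call below reflects our understanding of PoseLib's Python API and the option values are illustrative; check PoseLib's documentation for the exact signature):

```python
import poselib

# COLMAP-style camera dictionaries built from the intrinsics used above
cam0 = {"model": "PINHOLE", "width": w0, "height": h0,
        "params": [K0[0, 0], K0[1, 1], K0[0, 2], K0[1, 2]]}
cam1 = {"model": "PINHOLE", "width": w1, "height": h1,
        "params": [K1[0, 0], K1[1, 1], K1[0, 2], K1[1, 2]]}

# point-based relative pose with RANSAC (threshold value is illustrative)
pose_pt, info = poselib.estimate_relative_pose(
    mkpts0, mkpts1, cam0, cam1, {"max_epipolar_error": 2.0}, {}
)
R_pt, t_pt = pose_pt.R, pose_pt.t
```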
Our hybrid estimators combine our newly proposed depth-aware solvers with point-based solvers. Python implementations of the solvers can be found under `solver_py/`, which also includes calls to the C++ solvers through the Python bindings for comparison.
As described in the paper, the proposed solvers all work in two steps: first solve for the depth scales and shifts (affine parameters) of the two depth maps, then recover the pose by finding the rigid transformation between the back-projected points. We provide Python bindings for the first step (solving scales and shifts) as well as combined wrappers that return both the pose and the affine parameters.
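For intuition, the second step amounts to a Procrustes/Kabsch alignment of the back-projected point sets; here is a minimal NumPy sketch (our illustration, not the C++ implementation):

```python
import numpy as np

def rigid_from_backprojection(x1, x2, d1_corr, d2_corr):
    """Kabsch alignment: find (R, t) such that X1 ~= R @ X2 + t.

    x1, x2: Nx3 homogeneous normalized image coordinates [x, y, 1];
    d1_corr, d2_corr: affine-corrected depths for each point.
    """
    # back-project: a 3D point is its normalized ray scaled by its depth
    X1 = x1 * d1_corr[:, None]
    X2 = x2 * d2_corr[:, None]
    c1, c2 = X1.mean(axis=0), X2.mean(axis=0)
    # covariance of the centered point sets
    H = (X2 - c2).T @ (X1 - c1)
    U, _, Vt = np.linalg.svd(H)
    # reflection guard keeps R a proper rotation (det = +1)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = c1 - R @ c2
    return R, t
```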
The bindings for solving the affine parameters:

```python
# Calibrated case
affine_params = madpose.solve_scale_and_shift(x1.T, x2.T, d1, d2)
# Shared-focal
affine_params = madpose.solve_scale_and_shift_shared_focal(x1.T, x2.T, d1, d2)
# Two-focal
affine_params = madpose.solve_scale_and_shift_two_focal(x1.T, x2.T, d1, d2)
```
The inputs `x1` and `x2` are `np.array`s (or lists) of homogeneous normalized image coordinates (`[x, y, 1]`) of the minimal set of point matches; `d1` and `d2` are `np.array`s (or lists) of the corresponding depth values. The returned `affine_params` is a list of candidate solutions `(a1, a2, b1, b2)`, with `a1` always equal to 1; `a2`, `b1`, and `b2` therefore correspond to the relative scale and the two shift values.
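For instance, assuming `d1` and `d2` are NumPy arrays and each candidate unpacks into four scalars, a candidate can be applied as:

```python
# each candidate is (a1, a2, b1, b2) with a1 fixed to 1
for a1, a2, b1, b2 in affine_params:
    d1_corrected = a1 * d1 + b1  # equals d1 + b1 since a1 == 1
    d2_corrected = a2 * d2 + b2
```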
Alternatively, the combined wrappers solve for the affine parameters and also recover the candidate poses:

```python
# Calibrated case
poses = madpose.solve_scale_shift_pose(x1.T, x2.T, d1, d2)
# Shared-focal
poses = madpose.solve_scale_shift_pose_shared_focal(x1.T, x2.T, d1, d2)
# Two-focal
poses = madpose.solve_scale_shift_pose_two_focal(x1.T, x2.T, d1, d2)
```
Note that for simplicity the returned solutions are all named `poses`, but the list contains objects of different classes for the three cases (`madpose.PoseScaleOffset`, `madpose.PoseScaleOffsetSharedFocal`, and `madpose.PoseScaleOffsetTwoFocal`). Retrieve the pose, focal lengths, and affine parameters with:

```python
R_est, t_est = pose.R(), pose.t()
s_est, o0_est, o1_est = pose.scale, pose.offset0, pose.offset1
# for PoseScaleOffsetSharedFocal
f_est = pose.focal
# for PoseScaleOffsetTwoFocal
f0_est, f1_est = pose.focal0, pose.focal1
```
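Minimal solvers return multiple candidates. One way to disambiguate (our illustration, using hypothetical held-out matches `x1_val`, `x2_val` in normalized homogeneous coordinates) is to score each candidate's essential matrix with the Sampson error:

```python
import numpy as np

def sampson_error(R, t, pts1, pts2):
    """Mean Sampson error of x2^T E x1 with E = [t]_x R; pts are Nx3 normalized."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    E = tx @ R
    Ex1 = pts1 @ E.T   # E @ x1 for each row
    Etx2 = pts2 @ E    # E^T @ x2 for each row
    num = np.sum(pts2 * Ex1, axis=1) ** 2
    den = Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2
    return np.mean(num / den)

best = min(poses, key=lambda p: sampson_error(p.R(), p.t(), x1_val, x2_val))
```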
- Remove the dependency on OpenCV.
- Set up wheels for PyPI.
- Add experiment scripts for the datasets.
This codebase is inspired by and built upon many research works and open-source projects; we thank the authors and contributors for their work.
If you find our work useful in your research, please consider citing our paper:
```bibtex
@misc{yu2025madpose,
      title={Relative Pose Estimation through Affine Corrections of Monocular Depth Priors},
      author={Yifan Yu and Shaohui Liu and Rémi Pautrat and Marc Pollefeys and Viktor Larsson},
      year={2025},
      eprint={2501.05446},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.05446},
}
```