The pose computation problem @cite Marchand16 consists in solving for the rotation and translation that minimize the reprojection error from 3D-2D point correspondences.
The cv::solvePnP() and related functions estimate the object pose given a set of object points, their corresponding image projections, the camera intrinsic matrix and the distortion coefficients (see the figure below). In the camera frame, the X-axis points to the right, the Y-axis downward and the Z-axis forward.
Points expressed in the world frame \f$ \bf{X}_w \f$ are projected into the image plane \f$ \left[ u, v \right] \f$ using the perspective projection model \f$ \Pi \f$ and the camera intrinsic parameters matrix \f$ \bf{A} \f$ (also denoted \f$ \bf{K} \f$ in the literature):
\f[
\begin{align}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &=
\bf{A} \hspace{0.1em} \Pi \hspace{0.2em} ^{c}\bf{T}_w
\begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix} \\
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} &=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix}
\end{align}
\f]
The estimated pose is thus the rotation (`rvec`) and the translation (`tvec`) vectors that allow transforming a 3D point expressed in the world frame into the camera frame:
\f[
\begin{align}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} &=
\hspace{0.2em} ^{c}\bf{T}_w
\begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix} \\
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} &=
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_{w} \\ Y_{w} \\ Z_{w} \\ 1 \end{bmatrix}
\end{align}
\f]
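As an illustration of how a rotation vector and a translation vector define this transformation, the sketch below expands an axis-angle rotation vector into a 3x3 matrix with Rodrigues' formula and applies it to a point. It uses only the standard library; `rotationFromRvec` and `worldToCamera` are hypothetical names for this example, and in OpenCV code the conversion would normally be done with cv::Rodrigues().

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Converts a rotation vector (unit axis k scaled by the angle theta, in radians)
// into a rotation matrix: R = cos(theta) I + sin(theta) [k]x + (1 - cos(theta)) k k^T.
Mat3 rotationFromRvec(const Vec3& rvec)
{
    const double theta = std::sqrt(rvec[0] * rvec[0] + rvec[1] * rvec[1] + rvec[2] * rvec[2]);
    Mat3 R{};
    R[0][0] = R[1][1] = R[2][2] = 1.0;
    if (theta < 1e-12)
        return R;  // negligible rotation: return the identity

    const double kx = rvec[0] / theta, ky = rvec[1] / theta, kz = rvec[2] / theta;
    const double s = std::sin(theta), c = std::cos(theta), v = 1.0 - c;
    R[0] = { c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s };
    R[1] = { ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s };
    R[2] = { kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v };
    return R;
}

// Transforms a point from the world frame to the camera frame: Xc = R(rvec) * Xw + tvec.
Vec3 worldToCamera(const Vec3& rvec, const Vec3& tvec, const Vec3& Xw)
{
    const Mat3 R = rotationFromRvec(rvec);
    Vec3 Xc = tvec;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            Xc[i] += R[i][j] * Xw[j];
    return Xc;
}
```

For example, a rotation of 90 degrees about the Z-axis combined with a translation of (1, 2, 3) maps the world point (1, 0, 0) to (1, 3, 3) in the camera frame.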
@anchor calib3d_solvePnP_flags
Refer to the cv::SolvePnPMethod enum documentation for the list of possible values. Some details about each method are described below:
The cv::solveP3P() computes an object pose from exactly three 3D-2D point correspondences. A P3P problem has up to 4 solutions.
@note The solutions are sorted by reprojection errors (lowest to highest).
The cv::solvePnP() returns the rotation and the translation vectors that transform a 3D point expressed in the object coordinate frame to the camera coordinate frame, using different methods:
The cv::solvePnPGeneric() allows retrieving all the possible solutions.
Currently, only cv::SOLVEPNP_P3P, cv::SOLVEPNP_AP3P, cv::SOLVEPNP_IPPE, cv::SOLVEPNP_IPPE_SQUARE and cv::SOLVEPNP_SQPNP can return multiple solutions.
The cv::solvePnPRansac() computes the object pose with respect to the camera frame using a RANSAC scheme to deal with outliers.
More information can be found in @cite Zuliani2014RANSACFD.
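To make the notion of outliers concrete, the sketch below shows the consensus step that a RANSAC scheme typically repeats for each candidate pose: every correspondence is reprojected and counted as an inlier when its reprojection error is below a threshold. This is a simplified illustration, not the OpenCV implementation: `Camera`, `project` and `countInliers` are hypothetical names, distortion is ignored, and the two point lists are assumed to have the same length.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

// Hypothetical container for the intrinsic parameters (not an OpenCV type).
struct Camera { double fx, fy, cx, cy; };

// Pinhole projection of an object point for the pose (R, t); distortion is ignored.
Vec2 project(const Camera& cam, const Mat3& R, const Vec3& t, const Vec3& X)
{
    Vec3 Xc = t;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            Xc[i] += R[i][j] * X[j];
    return { cam.fx * Xc[0] / Xc[2] + cam.cx, cam.fy * Xc[1] / Xc[2] + cam.cy };
}

// Counts the correspondences whose reprojection error (in pixels) does not exceed
// maxReprojError. Assumes objectPoints and imagePoints have the same size.
std::size_t countInliers(const Camera& cam, const Mat3& R, const Vec3& t,
                         const std::vector<Vec3>& objectPoints,
                         const std::vector<Vec2>& imagePoints,
                         double maxReprojError)
{
    std::size_t inliers = 0;
    for (std::size_t i = 0; i < objectPoints.size(); ++i) {
        const Vec2 p = project(cam, R, t, objectPoints[i]);
        const double err = std::hypot(p[0] - imagePoints[i][0], p[1] - imagePoints[i][1]);
        if (err <= maxReprojError)
            ++inliers;
    }
    return inliers;
}
```

In a full RANSAC loop, candidate poses would be estimated from random minimal subsets of the correspondences, and the candidate with the largest inlier count would be retained.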
Pose refinement consists in estimating the rotation and translation that minimize the reprojection error using a non-linear minimization method and starting from an initial estimate of the solution. OpenCV provides cv::solvePnPRefineLM() and cv::solvePnPRefineVVS() for this problem.
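Written as a cost function, the refinement problem can be sketched as follows. The notation is an assumption made for this illustration: \f$ \bf{x}_i \f$ denotes the observed image point of the \f$ i \f$-th correspondence, \f$ \bf{X}_{w,i} \f$ the matching 3D point, \f$ N \f$ the number of correspondences, and \f$ \pi \f$ the full projection function (perspective division, distortion and intrinsic parameters \f$ \bf{A} \f$):

\f[
\left( \hat{\bf{R}}, \hat{\bf{t}} \right) = \underset{\bf{R}, \bf{t}}{\arg\min} \sum_{i=1}^{N} \left\| \bf{x}_i - \pi \left( \bf{R} \hspace{0.1em} \bf{X}_{w,i} + \bf{t} \right) \right\|^2
\f]

Both refinement functions minimize a cost of this kind iteratively; they differ in the minimization scheme and in how the rotation is updated at each iteration.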
cv::solvePnPRefineLM() uses a non-linear Levenberg-Marquardt minimization scheme @cite Madsen04 @cite Eade13. The current implementation computes the rotation update as a perturbation and not on SO(3).
cv::solvePnPRefineVVS() uses a Gauss-Newton non-linear minimization scheme @cite Marchand16, with the update of the rotation part computed using the exponential map.
@note At least three 3D-2D point correspondences are necessary.