The intrinsic and extrinsic properties of a camera can be estimated by comparing how a known 3D structure projects into image space. Assuming an ideal pinhole model, we can use the Direct Linear Transform (DLT) coupled with Zhang's method to estimate the parameters of the camera calibration (intrinsics) matrix, given 3D world-space points $\mathbf{X}$ and their corresponding projected locations $\mathbf{x}$ in the camera image space:

$$\lambda\mathbf{x} = P\mathbf{X} = K\begin{bmatrix}R & \mathbf{t}\end{bmatrix}\mathbf{X}$$

The direct linear transform helps us solve for $P$, given that we know the structure of $\mathbf{X}$, which here comes from a checkerboard pattern.
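For reference, the intrinsics matrix $K$ we are after has the usual upper-triangular form (written with the same symbols that appear in the closed-form extraction at the end of this report):

$$K=\begin{bmatrix}\alpha & \gamma & u_0\\ 0 & \beta & v_0\\ 0 & 0 & 1\end{bmatrix}$$

where $\alpha$ and $\beta$ are the focal lengths in pixels, $\gamma$ is the skew, and $(u_0, v_0)$ is the principal point.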
import Pkg
Pkg.add(["Plots", "Statistics", "LinearAlgebra", "JLD"])
using Plots, Statistics, LinearAlgebra, JLD
Pkg.add(["SymEngine", "Latexify"])
using SymEngine, Latexify
Julia does not yet have a readily available checkerboard-detection routine, so we'll load points precomputed with OpenCV for this example:
checkerboard_data = readlines("calibration/checkerboard.txt")
const num_images = parse(Int, checkerboard_data[1])
const board_dims = map(d -> parse(Int, d), split(checkerboard_data[2], " "))
const board_size = board_dims[1]*board_dims[2]
const grid_resolution = parse(Float64, checkerboard_data[3])
const img_width = 640
const img_height = 480
# maps pixel coordinates ([0, img_width]×[0, img_height]) onto the [-1, 1] square,
# which keeps the upcoming linear systems better conditioned
normalization_matrix = [
2/img_width 0 -1
0 2/img_height -1
0 0 1
]
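As a quick sanity check (using the image dimensions assumed above), the normalization maps the corners of the image to the corners of the $[-1, 1]$ square:

normalization_matrix*[0; 0; 1]                    # ≈ [-1, -1, 1]
normalization_matrix*[img_width; img_height; 1]   # ≈ [1, 1, 1]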
measured_points = Array{Float64}(undef, num_images, board_size, 3)
img = 1
# each image block in the file is one header line followed by board_size lines of "x y" pixel coordinates
for i=4:board_size+1:size(checkerboard_data, 1)
println("Loading observed (measured) points from $(checkerboard_data[i])")
for j=1:board_size
measured_points[img, j, :] = normalization_matrix*[map(p -> parse(Float64, p), split(checkerboard_data[i+j], " ")); 1]
end
global img += 1
end
Next, we'll set up the 3D world structure of the checkerboard: all the points lie on the $Z=0$ plane and the world origin is the center of the checkerboard. Notice that we omit the $Z$ coordinate from the board structure because it is always 0 (the third entry in each row below is the homogeneous 1, not $Z$). This simplifies our problem later on.
board_structure = Array{Float64}(undef, board_dims[1]*board_dims[2], 3)
xoffset = 0.5*(board_dims[2]-1)*grid_resolution
yoffset = 0.5*(board_dims[1]-1)*grid_resolution
for c=1:board_dims[2]
for r=1:board_dims[1]
board_structure[(c-1)*board_dims[1]+r, :] = [(c-1)*grid_resolution-xoffset; -(r-1)*grid_resolution+yoffset; 1]
end
end
scatter(board_structure[:, 1], board_structure[:, 2], legend=false, title="3D Checkerboard Structure", aspectratio=:equal)
To recap, our projection equation looks like this:

$$\lambda\begin{pmatrix}x\\y\\1\end{pmatrix} = K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{r}_3 & \mathbf{t}\end{bmatrix}\begin{pmatrix}X\\Y\\Z\\1\end{pmatrix}$$

We realize that the $Z$ coordinate is always zero, which means we can eliminate the third column of the rotation matrix:

$$\lambda\begin{pmatrix}x\\y\\1\end{pmatrix} = K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}\begin{pmatrix}X\\Y\\1\end{pmatrix}$$

So grouping the $K$ and rotation/translation matrices together, we form the homography $H$:

$$H = K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix},\qquad \lambda\begin{pmatrix}x\\y\\1\end{pmatrix} = H\begin{pmatrix}X\\Y\\1\end{pmatrix}$$
Where each point generates the equation:
@vars X Y x y
H = [symbols("H_$i$j") for i in 1:3, j in 1:3]
eqns = H*[X; Y; 1]
latexify(map(expand, eqns))
Where $x$ equals:
latexify(eqns[1]/eqns[3])
Multiplying both sides of the equation $x=\frac{H_{13} + X \cdot H_{11} + Y \cdot H_{12}}{H_{33} + X \cdot H_{31} + Y \cdot H_{32}}$ by the denominator gives $x(H_{33} + X \cdot H_{31} + Y \cdot H_{32})=H_{13} + X \cdot H_{11} + Y \cdot H_{12}$; moving the right side to the left makes the expression equal zero:
latexify(expand(x*eqns[3]-eqns[1]))
Applying similar logic to $y$:
latexify(expand(y*eqns[3]-eqns[2]))
We notice a pattern: the terms of the equations can be grouped by the entry of $H$ they multiply. I like to lay it out in a table:
$H_{11}$ | $H_{21}$ | $H_{31}$ | $H_{12}$ | $H_{22}$ | $H_{32}$ | $H_{13}$ | $H_{23}$ | $H_{33}$ |
---|---|---|---|---|---|---|---|---|
$-X$ | $0$ | $Xx$ | $-Y$ | $0$ | $Yx$ | $-1$ | $0$ | $x$ |
$0$ | $-X$ | $Xy$ | $0$ | $-Y$ | $Yy$ | $0$ | $-1$ | $y$ |
So we can encode the first equation as a row vector $\mathbf{a}_x=\begin{pmatrix}-X & 0 & Xx & -Y & 0 & Yx & -1 & 0 & x\end{pmatrix}$ times the (column-major) vectorized form $\mathbf{h}=\operatorname{reshape}(H, 9, 1)$, so that $\mathbf{a}_x\mathbf{h}=0$:
latexify([-X 0 X*x -Y 0 Y*x -1 0 x]*reshape(H, 9, 1))
Same goes for $y$:
latexify([0 -X X*y 0 -Y Y*y 0 -1 y]*reshape(H, 9, 1))
We then stack these row vectors into one large $2N\times9$ matrix $A$, where $N$ is the number of checkerboard points, and solve the homogeneous system $A\mathbf{h}=\mathbf{0}$ for $\mathbf{h}$ by finding the right singular vector of $A$ corresponding to the smallest singular value.
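Why the smallest singular value? With noisy measurements there is no exact nonzero solution, so we minimize the residual under a unit-norm constraint. A sketch of the argument, using the SVD $A=U\Sigma V^\top$ and the fact that orthogonal matrices preserve norms:

$$\hat{\mathbf{h}}=\arg\min_{\|\mathbf{h}\|=1}\|A\mathbf{h}\|^2=\arg\min_{\|\mathbf{h}\|=1}\|\Sigma V^\top\mathbf{h}\|^2=\mathbf{v}_9,$$

the column of $V$ belonging to the smallest singular value. The function below builds $A$ and extracts exactly that column: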
function solve_H(board_structure, measured_points, board_size)
# two rows of the design matrix per point correspondence
A = Array{Float64}(undef, 2*board_size, 9)
for i=1:board_size
bstruct = board_structure[i, :]
proj_pt = measured_points[i, :]
A[2*(i-1)+1, :] = [-bstruct[1] 0 bstruct[1]*proj_pt[1] -bstruct[2] 0 bstruct[2]*proj_pt[1] -1 0 proj_pt[1]]
A[2*(i-1)+2, :] = [0 -bstruct[1] bstruct[1]*proj_pt[2] 0 -bstruct[2] bstruct[2]*proj_pt[2] 0 -1 proj_pt[2]]
end
U, Σ, V = svd(A)
# right singular vector of the smallest singular value, reshaped (column-major) into H
return reshape(V[:, 9], 3, 3)
end
P = solve_H(board_structure, measured_points[1, :, :], board_size)
And we can check how well our projection matrix works by measuring the reprojection error, which should ideally show a cluster of points close to zero:
reprojection_error = Array{Float64}(undef, board_dims[1]*board_dims[2], 2)
for i=1:board_dims[1]*board_dims[2]
reprojected_pt = P*board_structure[i, :]
reprojected_pt /= reprojected_pt[3]
reprojection_error[i, :] = reprojected_pt[1:2] - measured_points[1, i, 1:2]
end
scatter(reprojection_error[:, 1], reprojection_error[:, 2], legend=false, title="Reprojection Error from Estimated P")
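As a single summary number (in the normalized image coordinates used above), we can also report the RMS reprojection error; a small sketch:

# root-mean-square reprojection error over all board points of the first image
rms_error = sqrt(mean(sum(reprojection_error.^2, dims=2)))
println("RMS reprojection error (normalized coordinates): $(rms_error)")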
Returning to the original problem: now that we know $H$, we want to extract the calibration matrix $K$ from $H=K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$. How do we obtain this matrix decomposition?
We know constraints on the form of $K$, $\mathbf{r}_1$, and $\mathbf{r}_2$ that we can exploit for our custom decomposition:

$K$ is invertible, so we can left-multiply both sides of the equation by $K^{-1}$: $K^{-1}\begin{bmatrix}\mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3\end{bmatrix}=\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$, i.e. $\mathbf{r}_1=K^{-1}\mathbf{h}_1$ and $\mathbf{r}_2=K^{-1}\mathbf{h}_2$.

$\mathbf{r}_1$, $\mathbf{r}_2$, $\mathbf{r}_3$ form an orthonormal basis as they come from a rotation matrix ($\mathbf{r}_3$ is not relevant here), so $\mathbf{r}_1\cdot\mathbf{r}_2=0$ and $\|\mathbf{r}_1\|=\|\mathbf{r}_2\|=1$; since $H$ is only known up to scale, we use the equal-norm form $\|\mathbf{r}_1\|=\|\mathbf{r}_2\|$. Substituting, this gives two equations per homography:

$$\mathbf{h}_1^\top K^{-\top}K^{-1}\mathbf{h}_2 = 0,\qquad \mathbf{h}_1^\top K^{-\top}K^{-1}\mathbf{h}_1 - \mathbf{h}_2^\top K^{-\top}K^{-1}\mathbf{h}_2 = 0$$

Now we substitute $B=K^{-\top}K^{-1}$ (a symmetric, positive-definite matrix) into the expressions and exploit the resulting system of equations:
@vars b_11 b_12 b_13 b_22 b_23 b_33
B = [
b_11 b_12 b_13
b_12 b_22 b_23
b_13 b_23 b_33
]
h_1 = [symbols("h_$(i)1") for i in 1:3]
h_2 = [symbols("h_$(i)2") for i in 1:3]
latexify(map(expand, transpose(h_1)*B*h_2))
latexify(map(expand, transpose(h_1)*B*h_1-transpose(h_2)*B*h_2))
$b_{11}$ | $b_{12}$ | $b_{13}$ | $b_{22}$ | $b_{23}$ | $b_{33}$ |
---|---|---|---|---|---|
$h_{12}h_{11}$ | $h_{21}h_{12}+h_{22}h_{11}$ | $h_{11}h_{32}+h_{12}h_{31}$ | $h_{21}h_{22}$ | $h_{21}h_{32}+h_{22}h_{31}$ | $h_{31}h_{32}$ |
$h_{11}^2-h_{12}^2$ | $2(h_{21}h_{11}-h_{22}h_{12})$ | $2(h_{11}h_{31}-h_{12}h_{32})$ | $h_{21}^2-h_{22}^2$ | $2(h_{21}h_{31}-h_{22}h_{32})$ | $h_{31}^2-h_{32}^2$ |
Since one $H$ is estimated per image, each calibration image yields one pair of these equations. $K$ is upper triangular, and $B=K^{-\top}K^{-1}$ is symmetric with 6 unique entries, so we'll need a minimum of $\frac{6}{2}=3$ different camera views of the checkerboard to estimate $K$. We apply the same algorithm here: we stack the rows laid out in the table and form a matrix $\bigvee$ with dimensions $2M\times6$, where $M$ is the number of images.
Since $B$ is symmetric, we parameterize its upper triangle as the vector $\mathbf{b}=\begin{pmatrix}b_{11} & b_{12} & b_{13} & b_{22} & b_{23} & b_{33}\end{pmatrix}$ and finally set up our complete system.
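In matrix form (a sketch, writing $\mathbf{v}^{(i)}_{\perp}$ for the orthogonality row and $\mathbf{v}^{(i)}_{=}$ for the equal-norm row of image $i$ from the table above), the stacked system is again homogeneous:

$$\bigvee\mathbf{b} = \begin{bmatrix}\mathbf{v}^{(1)\top}_{\perp}\\ \mathbf{v}^{(1)\top}_{=}\\ \vdots\\ \mathbf{v}^{(M)\top}_{\perp}\\ \mathbf{v}^{(M)\top}_{=}\end{bmatrix}\mathbf{b} = \mathbf{0}$$

We build it image by image below: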
⋁ = Array{Float64}(undef, 2*num_images, 6)
H_cache = Array{Float64}(undef, num_images, 3, 3)
for i=1:num_images
# calculate H given the image
h = solve_H(board_structure, measured_points[i, :, :], board_size)
H_cache[i, :, :] = h
# set up the two equations encoding the orthogonality and equal-norm constraints relating b and H
v_ortho = [
h[1,2]*h[1,1]
h[2,1]*h[1,2]+h[2,2]*h[1,1]
h[1,1]*h[3,2]+h[1,2]*h[3,1]
h[2,1]*h[2,2]
h[2,1]*h[3,2]+h[2,2]*h[3,1]
h[3,1]*h[3,2]
]
v_norm = [
h[1,1]^2-h[1,2]^2
2*(h[2,1]*h[1,1]-h[2,2]*h[1,2])
2*(h[1,1]*h[3,1]-h[1,2]*h[3,2])
h[2,1]^2-h[2,2]^2
2*(h[2,1]*h[3,1]-h[2,2]*h[3,2])
h[3,1]^2-h[3,2]^2
]
⋁[2*i-1, :] = v_ortho
⋁[2*i, :] = v_norm
end
⋁
The solution $\mathbf{b}$ is only determined up to sign (and scale), so the recovered $B$ is not always positive definite (its eigenvalues may come out negative). Since $B=K^{-\top}K^{-1}$ must be positive definite, we can fix this by flipping the sign of $B$:
U, Σ, V = svd(⋁)
println("Singular values: $(Σ)")
b = V[:, 6]
println("Smallest singular vector: $(b)")
B = [
b[1] b[2] b[3];
b[2] b[4] b[5];
b[3] b[5] b[6]
]
if !isposdef(Symmetric(B))
println("Not positive definite!")
B = -B
end
println("Now is positive definite? $(isposdef(Symmetric(B)))")
Finally, apply the Cholesky decomposition of $B=K^{-\top}K^{-1}$ to extract the upper triangular $K^{-1}$, and from it $K$.
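Since $K$ is upper triangular, so is $K^{-1}$, which means the upper Cholesky factor of $B$ is exactly $K^{-1}$ up to scale; a quick sketch:

$$B = K^{-\top}K^{-1} = U^\top U \quad\Longrightarrow\quad U = K^{-1}\ \text{(up to scale)}$$

The scale and the earlier image-coordinate normalization are undone afterwards: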
C = cholesky(Symmetric(B))
C.U'*C.U  # sanity check: should reproduce B
K⁻¹ = C.U
K = inv(K⁻¹)
# undo the image-coordinate normalization and rescale so that K[3,3] == 1
K = inv(normalization_matrix)*K/K[3,3]
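To interpret the result (assuming the standard layout of the intrinsics matrix), the diagonal of $K$ holds the focal lengths in pixels and the last column the principal point; a quick sketch:

println("Focal lengths (pixels): fx = $(K[1,1]), fy = $(K[2,2])")
println("Skew: $(K[1,2])")
println("Principal point (pixels): (u0, v0) = ($(K[1,3]), $(K[2,3]))")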
Now that we have the camera intrinsic calibration, we can start estimating the motion of the camera (i.e. the extrinsics). Revisiting our old friend:

$$H=K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$$
Due to noise in the data, the recovered $\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{r}_3\end{bmatrix}$ (with $\mathbf{r}_3=\mathbf{r}_1\times\mathbf{r}_2$) will only approximate a true rotation matrix. We'll use the SVD to force its columns to be orthonormal.
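Concretely, following Zhang's method, the extrinsic columns are recovered from the columns of each cached homography as

$$\mathbf{r}_1=\lambda K^{-1}\mathbf{h}_1,\quad \mathbf{r}_2=\lambda K^{-1}\mathbf{h}_2,\quad \mathbf{r}_3=\mathbf{r}_1\times\mathbf{r}_2,\quad \mathbf{t}=\lambda K^{-1}\mathbf{h}_3,\quad \lambda=\frac{1}{\|K^{-1}\mathbf{h}_1\|},$$

where $\lambda$ makes the rotation columns unit length: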
camera_extrinsics = Array{Float64}(undef, num_images, 3, 4)
for i=1:num_images
h₁ = H_cache[i, :, 1]
h₂ = H_cache[i, :, 2]
h₃ = H_cache[i, :, 3]
# scale factor chosen so the recovered rotation columns have unit norm
λ = 1/norm(K⁻¹*h₁)
r₁ = λ*K⁻¹*h₁
r₂ = λ*K⁻¹*h₂
r₃ = r₁×r₂
t = λ*K⁻¹*h₃
# project the approximate rotation onto the closest true rotation matrix
R = [r₁ r₂ r₃]
U, _, V = svd(R)
R = U*V'
camera_extrinsics[i, :, :] = [R t]
end
scatter(camera_extrinsics[:, 1, 4], camera_extrinsics[:, 2, 4], camera_extrinsics[:, 3, 4], camera=(45, 45), legend=false, title="Calibrated Positions of Cameras")
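Note that the translation column $\mathbf{t}$ is the position of the board origin expressed in each camera's frame. To plot the camera centers in the board's world frame instead, we can use $C=-R^\top\mathbf{t}$; a small sketch using a new camera_centers array:

# camera center in the board's world frame: C = -Rᵀt
camera_centers = Array{Float64}(undef, num_images, 3)
for i=1:num_images
    R = camera_extrinsics[i, :, 1:3]
    t = camera_extrinsics[i, :, 4]
    camera_centers[i, :] = -R'*t
end
scatter(camera_centers[:, 1], camera_centers[:, 2], camera_centers[:, 3],
    camera=(45, 45), legend=false, title="Camera Centers in the Board Frame")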
Huge thanks to Professor Cyrill Stachniss' lectures on photogrammetry and computer vision, which this report was based on.
Here we established a method to estimate a camera's intrinsic and extrinsic parameters. However, this method is not statistically optimal (assuming Gaussian noise). The next step is to use iterative nonlinear least-squares algorithms to refine the camera parameters, which also opens up the opportunity to estimate nonlinear distortion parameters. Those topics will be covered in another report, but in the meantime this direct algebraic method serves as a good initialization point for further iterative refinement.
save("calibration/initial_parameters.jld",
"camera_extrinsics", camera_extrinsics,
"K", K,
"board_structure", board_structure)
An alternative to using the Cholesky decomposition for factoring $B$ into $K$ is the closed-form formula given in Zhang's report (I suspect this is just the Cholesky decomposition written out analytically):
v₀ = (B[1,2]*B[1,3]-B[1,1]*B[2,3])/(B[1,1]*B[2,2]-B[1,2]^2)
λ = B[3,3]-(B[1,3]^2+v₀*(B[1,2]*B[1,3]-B[1,1]*B[2,3]))/B[1,1]
α = sqrt(λ/B[1,1])
β = sqrt(λ*B[1,1]/(B[1,1]*B[2,2]-B[1,2]^2))
γ = -B[1,2]*α^2*β/λ
u₀ = γ*v₀/β-B[1,3]*α^2/λ
# denormalize back into pixel coordinates
inv(normalization_matrix)*[
α γ u₀;
0 β v₀;
0 0 1
]