Camera Calibration

The intrinsic and extrinsic properties of a camera can be estimated by comparing how a known, 3D structure gets projected into the image space. Assuming an ideal pinhole model, we can use a method called the Direct Linear Transform (DLT) coupled with Zhang's Method to estimate the parameters of the camera calibration/instrinsics matrix given 3D world-space points and their corresponding projected locations in the camera image space:

$\mathbf{x}=KR\begin{bmatrix}I_{3\times3}\vert\mathbf{-X_0}\end{bmatrix}\mathbf{X}$
where $\mathbf{x}$ is the projected image point in homogeneous coordinates while $\mathbf{X}$ is the corresponding 3D world point in homogeneous coordinates. $K$ is the intrinsic camera calibration matrix while $\begin{bmatrix}I_{3\times3}\vert\mathbf{-X_0}\end{bmatrix}$ is the extrinsic matrix the encodes the camera's rotation and translation in 3D world coordinate frame. Given the 3D points and its corresponding projections onto the image plane, we formulate the calibration problem as estimating the parameters of the projection matrix $P$:
$\mathbf{x}=P\mathbf{X}$
$\mathbf{x}=\begin{bmatrix}P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33}\end{bmatrix}\mathbf{X}$

The direct linear transform helps us solve for $P$; given we know the structure of $\mathbf{X}$ which is given by a checkerboard pattern.

In [1]:
import Pkg
Pkg.add(["Plots", "Statistics", "LinearAlgebra", "JLD"])
using Plots, Statistics, LinearAlgebra, JLD

Pkg.add(["SymEngine", "Latexify"])
using SymEngine, Latexify
   Updating registry at `~/.julia/registries/General`
######################################################################### 100.0%
  Resolving package versions...
No Changes to `~/.julia/environments/v1.5/Project.toml`
No Changes to `~/.julia/environments/v1.5/Manifest.toml`
  Resolving package versions...
No Changes to `~/.julia/environments/v1.5/Project.toml`
No Changes to `~/.julia/environments/v1.5/Manifest.toml`

Julia does not yet have the ability to detect a checkerboard pattern, so we'll load precomputed points from OpenCV for this example:

In [2]:
checkerboard_data = readlines("calibration/checkerboard.txt")
Out[2]:
498-element Array{String,1}:
 "9"
 "9 6"
 "0.022"
 "calib_01.jpeg"
 "255.31075 370.9449"
 "254.64536 336.94894"
 "253.92151 303.21326"
 "253.12537 270.03577"
 "252.29417 237.10402"
 "251.40678 204.32014"
 "250.37651 171.64476"
 "249.37842 138.82846"
 "248.37593 106.11092"
 ⋮
 "321.94498 173.08527"
 "303.33307 151.2742"
 "284.0356 128.64449"
 "445.5115 274.6236"
 "429.1655 255.55904"
 "412.40997 236.09181"
 "395.40164 216.30057"
 "378.17566 195.90892"
 "360.55884 175.2183"
 "342.5738 153.73903"
 "324.26495 131.67159"
 "305.386 109.04985"
In [3]:
const num_images = parse(Int, checkerboard_data[1])
const board_dims = map(d -> parse(Int, d), split(checkerboard_data[2], " "))
const board_size = board_dims[1]*board_dims[2]
const grid_resolution = parse(Float64, checkerboard_data[3])

const img_width = 640
const img_height = 480
normalization_matrix = [
    2/img_width 0 -1
    0 2/img_height -1
    0 0 1
]

measured_points = Array{Float64}(undef, num_images, board_size, 3)
img = 1
for i=4:board_size+1:size(checkerboard_data, 1)
    println("Loading observed (measured) points from $(checkerboard_data[i])")
    for j=1:board_size
        measured_points[img, j, :] = normalization_matrix*[map(p -> parse(Float64, p), split(checkerboard_data[i+j], " ")); 1]
    end
    img += 1
end
Loading observed (measured) points from calib_01.jpeg
Loading observed (measured) points from calib_02.jpeg
Loading observed (measured) points from calib_03.jpeg
Loading observed (measured) points from calib_04.jpeg
Loading observed (measured) points from calib_05.jpeg
Loading observed (measured) points from calib_06.jpeg
Loading observed (measured) points from calib_07.jpeg
Loading observed (measured) points from calib_08.jpeg
Loading observed (measured) points from calib_09.jpeg

Next, we'll setup the 3D world structure of the checkerboard: All the points lie on the $Z=0$ plane and the world origin is the center of the checkerboard. Notice that we omit the $Z$ coordinates from the board structure because all their entries will always be 0. This simplifies our problem later on.

In [4]:
board_structure = Array{Float64}(undef, board_dims[1]*board_dims[2], 3)
xoffset = 0.5*(board_dims[2]-1)*grid_resolution
yoffset = 0.5*(board_dims[1]-1)*grid_resolution
for c=1:board_dims[2]
    for r=1:board_dims[1]
        board_structure[(c-1)*board_dims[1]+r, :] = [(c-1)*grid_resolution-xoffset; -(r-1)*grid_resolution+yoffset; 1]
    end
end

scatter(board_structure[:, 1], board_structure[:, 2], legend=false, title="3D Checkerboard Structure", aspectratio=:equal)
Out[4]:

Setting Up the System of Equations

To recap, our projection equation looks like this:

$\begin{bmatrix}x \\ y \\ 1\end{bmatrix}=\begin{bmatrix}c & s & x_H \\ 0 & c(1+m) & y_H \\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}r_{11} & r_{12} & r_{13} & t_{14} \\ r_{21} & r_{22} & r_{23} & t_{24} \\ r_{31} & r_{32} & r_{33} & t_{34}\end{bmatrix}\begin{bmatrix}X \\ Y \\ Z \\ 1\end{bmatrix}$

We realize that the $Z$ coordinate is always zero. This means we can eliminate the third column of $r$:

$\begin{bmatrix}x \\ y \\ 1\end{bmatrix}=\begin{bmatrix}c & s & x_H \\ 0 & c(1+m) & y_H \\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}r_{11} & r_{12} & t_1 \\ r_{21} & r_{22} & t_2 \\ r_{31} & r_{32} & t_3\end{bmatrix}\begin{bmatrix}X \\ Y \\ 1\end{bmatrix}$

So grouping the $K$ and $r$ matrices together, we form $H$:

$H=\begin{bmatrix}\mathbf{h_1} & \mathbf{h_2} & \mathbf{h_3}\end{bmatrix}=K\begin{bmatrix}\mathbf{r_1} & \mathbf{r_2} & \mathbf{t}\end{bmatrix}$

Where each point generates the equation:

$\begin{bmatrix}x \\ y \\ 1\end{bmatrix}=K\begin{bmatrix}\mathbf{r_1} & \mathbf{r_2} & \mathbf{t}\end{bmatrix}\begin{bmatrix}X \\ Y \\ 1\end{bmatrix}$
$\begin{bmatrix}x \\ y \\ 1\end{bmatrix}=H\begin{bmatrix}X \\ Y \\ 1\end{bmatrix}$
In [5]:
@vars X Y x y
H = [symbols("H_$i$j") for i in 1:3, j in 1:3]
eqns = H*[X; Y; 1]
latexify(map(expand, eqns))
Out[5]:
\begin{equation} \left[ \begin{array}{c} H_{13} + X \cdot H_{11} + Y \cdot H_{12} \\ H_{23} + X \cdot H_{21} + Y \cdot H_{22} \\ H_{33} + X \cdot H_{31} + Y \cdot H_{32} \\ \end{array} \right] \end{equation}

Where $x$ equals:

In [6]:
latexify(eqns[1]/eqns[3])
Out[6]:
$\frac{H_{13} + X \cdot H_{11} + Y \cdot H_{12}}{H_{33} + X \cdot H_{31} + Y \cdot H_{32}}$

Multiplying both side of the equation $x=\frac{H_{13} + X \cdot H_{11} + Y \cdot H_{12}}{H_{33} + X \cdot H_{31} + Y \cdot H_{32}}$ by the denominator $x(H_{33} + X \cdot H_{31} + Y \cdot H_{32})=H_{13} + X \cdot H_{11} + Y \cdot H_{12}$ and moving the right side to the left, so the expression equals zero:

In [7]:
latexify(expand(x*eqns[3]-eqns[1]))
Out[7]:
$ - H_{13} - X \cdot H_{11} - Y \cdot H_{12} + x \cdot H_{33} + X \cdot x \cdot H_{31} + Y \cdot x \cdot H_{32}$

Applying similar logic to $y$:

In [8]:
latexify(expand(y*eqns[3]-eqns[2]))
Out[8]:
$ - H_{23} - X \cdot H_{21} - Y \cdot H_{22} + y \cdot H_{33} + X \cdot y \cdot H_{31} + Y \cdot y \cdot H_{32}$

We notice a pattern where we can group the terms of the equations:

$- H_{13} - X \cdot H_{11} - Y \cdot H_{12} + x \cdot H_{33} + X \cdot x \cdot H_{31} + Y \cdot x \cdot H_{32}=0$
$- H_{23} - X \cdot H_{21} - Y \cdot H_{22} + y \cdot H_{33} + X \cdot y \cdot H_{31} + Y \cdot y \cdot H_{32}=0$

I like to lay it out in a table:

$H_{11}$$H_{21}$$H_{31}$$H_{12}$$H_{22}$$H_{32}$$H_{13}$$H_{23}$$H_{33}$
$-X$ $0$ $Xx$ $-Y$ $0$ $Yx$ $-1$ $0$ $x$
$0$ $-X$ $Xy$ $0$ $-Y$ $Yy$ $0$ $-1$ $y$

So we can encode the system of equations as a vector $\mathbf{a}_x=\begin{pmatrix}-X & 0 & Xx & -Y & 0 & Yx & -1 & 0 & x\end{pmatrix}$ times the vectorized form of $\mathbf{h}=reshape(H, 9, 1)$ where $\mathbf{a}_x\mathbf{h}=0$:

In [9]:
latexify([-X 0 X*x -Y 0 Y*x -1 0 x]*reshape(H, 9, 1))
Out[9]:
\begin{equation} \left[ \begin{array}{c} - H_{13} - X \cdot H_{11} - Y \cdot H_{12} + x \cdot H_{33} + X \cdot x \cdot H_{31} + Y \cdot x \cdot H_{32} \\ \end{array} \right] \end{equation}

Same goes for $y$:

In [10]:
latexify([0 -X X*y 0 -Y Y*y 0 -1 y]*reshape(H, 9, 1))
Out[10]:
\begin{equation} \left[ \begin{array}{c} - H_{23} - X \cdot H_{21} - Y \cdot H_{22} + y \cdot H_{33} + X \cdot y \cdot H_{31} + Y \cdot y \cdot H_{32} \\ \end{array} \right] \end{equation}

So then what we do is stack the vectors into one large $2N\times9$ matrix $A$ where $N$ is the number of checkerboard points:

$\begin{bmatrix}\mathbf{a}_{x_1} \\ \mathbf{a}_{y_1} \\ \mathbf{a}_{x_2} \\ \mathbf{a}_{y_2} \\ ... \\ \mathbf{a}_{x_N} \\ \mathbf{a}_{y_N} \end{bmatrix}\mathbf{h}=A\mathbf{h}=0$

And solve the system for $\mathbf{h}$ by finding the singular vector of $A$ corresponding to the smallest singular value:

In [11]:
function solve_H(board_structure, measured_points, board_size)
    A = Array{Float64}(undef, 2*board_size, 9)
    for i=1:board_size
        bstruct = board_structure[i, :]
        proj_pt = measured_points[i, :]
        A[2*(i-1)+1, :] = [-bstruct[1] 0 bstruct[1]*proj_pt[1] -bstruct[2] 0 bstruct[2]*proj_pt[1] -1 0 proj_pt[1]]
        A[2*(i-1)+2, :] = [0 -bstruct[1] bstruct[1]*proj_pt[2] 0 -bstruct[2] bstruct[2]*proj_pt[2] 0 -1 proj_pt[2]]
    end

    U, Σ, V = svd(A)
    return reshape(V[:, 9], 3, 3)
end

P = solve_H(board_structure, measured_points[1, :, :], board_size)
Out[11]:
3×3 Array{Float64,2}:
 -0.592906   -0.019074   -0.00564588
  0.0219893  -0.793971    0.00272874
 -0.0224649   0.0150492  -0.128276

And we can check how well our projection matrix works by measuring the reprojection error which ideally should show a group of points clustered close to zero:

In [12]:
reprojection_error = Array{Float64}(undef, board_dims[1]*board_dims[2], 2)
for i=1:board_dims[1]*board_dims[2]
    reprojected_pt = P*board_structure[i, :]
    reprojected_pt /= reprojected_pt[3]
    reprojection_error[i, :] = reprojected_pt[1:2] - measured_points[1, i, 1:2]
end

scatter(reprojection_error[:, 1], reprojection_error[:, 2], legend=false, title="Reprojection Error from Estimated P")
Out[12]:

Decomposing $H$

Returning to the original problem, now that we know $H$ we want to extract the calibration matrix $K$ from H: $H=K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$ How do we obtain the matrix decomposition?

We know constraints about the form of $K$, $\mathbf{r}_1$, and $\mathbf{r}_2$ that we can exploit for our custom decomposition:

  1. Define a matrix $B=K^{-\top}K^{-1}$
  2. Compute $B$ by solving homogeneous system of equations.
  3. Decompose matrix $B$.

Exploit the fact that $K$ is invertible and left-multiply on both sides of the equation: $K^{-1}\begin{bmatrix}\mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3\end{bmatrix}=\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$ which leads to two equations (multiplying out the $K$s and $r$s):

$\mathbf{r}_1=K^{-1}\mathbf{h}_1$ and $\mathbf{r}_2=K^{-1}\mathbf{h}_2$

$\mathbf{r}_1$, $\mathbf{r}_2$, $\mathbf{r}_3$ form an orthonormal basis as they come from a rotation matrix ($\mathbf{r}_3$ is not relevant) so $\mathbf{r}_1\cdot\mathbf{r}_2=0$ and $\|\mathbf{r}_1\|=\|\mathbf{r}_2\|=1$. Substituting,

$\mathbf{r}_1^{\top}\mathbf{r}_2=\mathbf{h}_1^{\top}K^{-\top}K^{-1}\mathbf{h}_2=0$
$\|\mathbf{r}_1\|=\|\mathbf{r}_2\|=1$ substituting: $\mathbf{h}_1^{\top}K^{-\top}K^{-1}\mathbf{h}_1=\mathbf{h}_2^{\top}K^{-\top}K^{-1}\mathbf{h}_2$
rearranging the equation:
$\mathbf{h}_1^{\top}K^{-\top}K^{-1}\mathbf{h}_1-\mathbf{h}_2^{\top}K^{-\top}K^{-1}\mathbf{h}_2=0$

Now we can substitute our previously-defined $B$ matrix into the expressions:

$\mathbf{h}_1^{\top}B\mathbf{h}_2=0$

$\mathbf{h}_1^{\top}B\mathbf{h}_1-\mathbf{h}_2^{\top}B\mathbf{h}_2=0$
We know $\mathbf{h}_1$ and $\mathbf{h}_2$ and now we can estimate $B$, which can then be decomposed into $K$. $B$ is symmetric positive definite, so we can use the Cholesky decomposition to identify upper and lower triangular matrices that correspond to $K$ and $K^{\top}$ i.e. $chol(B)=AA^{\top}$ where $A=K^{-\top}$.

Estimating $B$

Exploit the system of equations

$\mathbf{h}_1^{\top}B\mathbf{h}_2=0$

$\mathbf{h}_1^{\top}B\mathbf{h}_1-\mathbf{h}_2^{\top}B\mathbf{h}_2=0$
to estimate $B$ using similar techniques we've seen above: Vecotrize $B$ and arrange the terms of the equations.

In [13]:
@vars b_11 b_12 b_13 b_22 b_23 b_33
B = [
    b_11 b_12 b_13
    b_12 b_22 b_23
    b_13 b_23 b_33
]
h_1 = [symbols("h_$(i)1") for i in 1:3]
h_2 = [symbols("h_$(i)2") for i in 1:3]
latexify(map(expand, transpose(h_1)*B*h_2))
Out[13]:
$h_{11} \cdot h_{32} \cdot b_{13} + h_{12} \cdot h_{11} \cdot b_{11} + h_{12} \cdot h_{31} \cdot b_{13} + h_{21} \cdot h_{12} \cdot b_{12} + h_{21} \cdot h_{22} \cdot b_{22} + h_{21} \cdot h_{32} \cdot b_{23} + h_{22} \cdot h_{11} \cdot b_{12} + h_{22} \cdot h_{31} \cdot b_{23} + h_{31} \cdot h_{32} \cdot b_{33}$
In [14]:
latexify(map(expand, transpose(h_1)*B*h_1-transpose(h_2)*B*h_2))
Out[14]:
$h_{11}^{2} \cdot b_{11} - h_{12}^{2} \cdot b_{11} + h_{21}^{2} \cdot b_{22} - h_{22}^{2} \cdot b_{22} + h_{31}^{2} \cdot b_{33} - h_{32}^{2} \cdot b_{33} + 2 \cdot h_{11} \cdot h_{31} \cdot b_{13} - 2 \cdot h_{12} \cdot h_{32} \cdot b_{13} + 2 \cdot h_{21} \cdot h_{11} \cdot b_{12} + 2 \cdot h_{21} \cdot h_{31} \cdot b_{23} - 2 \cdot h_{22} \cdot h_{12} \cdot b_{12} - 2 \cdot h_{22} \cdot h_{32} \cdot b_{23}$
$b_{11}$ $b_{12}$ $b_{13}$ $b_{22}$ $b_{23}$ $b_{33}$
$h_{12}h_{11}$ $h_{21}h_{12}+h_{22}h_{11}$ $h_{11}h_{32}+h_{12}h_{31}$ $h_{21}h_{22}$ $h_{21}h_{32}+h_{22}h_{31}$ $h_{31}h_{32}$
$h_{11}^2-h_{12}^2$ $2(h_{21}h_{11}-h_{22}h_{12})$ $2(h_{11}h_{31}-h_{12}h_{32})$ $h_{21}^2-h_{22}^2$ $2(h_{21}h_{31}-h_{22}h_{32})$ $h_{31}^2-h_{32}^2$

Since one $H$ is estimated per image, each calibration image yields one pair of these equations. $K$ is an upper triangular matrix, which give $B$ 6 degrees of freedom so we'll need a minimum of $\frac{6}{2}=3$ different camera views of the checkerboard to estimate $K$. We apply the same algorithm here; we stack the vectors as laid out in the table and form a matrix $\bigvee$ with dimensions $2N\times6$:

$\mathbf{v}^{ortho}_i=\begin{pmatrix}h_{12}h_{11} & h_{21}h_{12}+h_{22}h_{11} & h_{11}h_{32}+h_{12}h_{31} & h_{21}h_{22} & h_{21}h_{32}+h_{22}h_{31} & h_{31}h_{32}\end{pmatrix}$

$\mathbf{v}^{norm}_i=\begin{pmatrix}h_{11}^2-h_{12}^2 & 2(h_{21}h_{11}-h_{22}h_{12}) & 2(h_{11}h_{31}-h_{12}h_{32}) & h_{21}^2-h_{22}^2 & 2(h_{21}h_{31}-h_{22}h_{32}) & h_{31}^2-h_{32}^2\end{pmatrix}$

$\bigvee=\begin{bmatrix}\mathbf{v}^{ortho}_1 \\ \mathbf{v}^{norm}_1 \\ \mathbf{v}^{ortho}_2 \\ \mathbf{v}^{norm}_2 \\ ... \\ \mathbf{v}^{ortho}_N \\ \mathbf{v}^{norm}_N\end{bmatrix}$
for each image $i=1:N$.

Since $B$ is symmetric, we parameterize its upper triangular into a vector $\mathbf{b}=\begin{pmatrix}b_{11} & b_{12} & b_{13} & b_{22} & b_{23} & b_{33}\end{pmatrix}$ and finally setup our complete system:

$\bigvee\mathbf{b}=0$
We know the drill, apply SVD and extract the singular vector corresponding to the smallest singular value which is our solution for $\mathbf{b}$.

In [15]:
 = Array{Float64}(undef, 2*num_images, 6)
H_cache = Array{Float64}(undef, num_images, 3, 3)
for i=1:num_images
    # calculate H given the image
    h = solve_H(board_structure, measured_points[i, :, :], board_size)
    H_cache[i, :, :] = h
    
    # setup the 2 equations specifying the orthogonal and normalized constraints for b and H
    v_ortho = [
        h[1,2]*h[1,1]
        h[2,1]*h[1,2]+h[2,2]*h[1,1]
        h[1,1]*h[3,2]+h[1,2]*h[3,1]
        h[2,1]*h[2,2]
        h[2,1]*h[3,2]+h[2,2]*h[3,1]
        h[3,1]*h[3,2]
    ]
    
    v_norm = [
        h[1,1]^2-h[1,2]^2
        2*(h[2,1]*h[1,1]-h[2,2]*h[1,2])
        2*(h[1,1]*h[3,1]-h[1,2]*h[3,2])
        h[2,1]^2-h[2,2]^2
        2*(h[2,1]*h[3,1]-h[2,2]*h[3,2])
        h[3,1]^2-h[3,2]^2
    ]
    
    [2*i-1, :] = v_ortho
    [2*i, :]   = v_norm
end

Out[15]:
18×6 Array{Float64,2}:
  0.0113091    0.470331    -0.00849428  -0.0174588    0.0181674  -0.000338079
  0.351174    -0.0563636    0.0272132   -0.629906     0.0229093   0.000278192
 -0.0177294    0.467131     0.0428096    0.00774105   0.0484705   0.00455643
  0.352593     0.0585967    0.0766684   -0.617998    -0.11681    -0.00194639
 -0.00414189   0.469935     0.0166796   -0.00232409   0.052026    0.00190061
  0.346176     0.00781846   0.0771989   -0.637783    -0.0469033   0.00340931
  0.00710635   0.468337     0.0199959   -0.00130733  -0.0600913  -0.00267978
  0.347262    -0.0211003   -0.0899153   -0.631411    -0.0561221   0.00444961
  0.0186353    0.462824     0.0657441   -0.0683729    0.0325496   0.00577375
  0.357503    -0.153696     0.0579529   -0.596911    -0.175896   -0.00853773
  0.0315302    0.463474    -0.0276208   -0.0318142    0.0635873  -0.0041917
  0.340704    -0.132409     0.0964729   -0.629395     0.0799396   0.00303837
  0.0221467    0.462155    -0.044367    -0.0596177   -0.0478888   0.00479053
  0.350313    -0.148747    -0.0752882   -0.60899      0.120937   -0.000362447
 -0.118196     0.354056     0.0300903    0.202126     0.02421     0.000652692
  0.269052     0.61827      0.0370823   -0.465908    -0.0791296  -0.00323445
  0.16969      0.0388653    0.0419483   -0.314209    -0.0291114   0.0016837
  0.0361583   -0.923859    -0.0410146   -0.038846    -0.112685   -0.00542828

Sometimes, the solution $b$ is not always positive definite because the eigenvalues may be negative. We can fix this by flipping the sign of the eigenvalues/vectors to make $B$ PSD:

In [16]:
U, Σ, V = svd()
println("Singular values: $(Σ)")
b = V[:, 6]
println("Smallest singular vector: $(b)")
B = [
    b[1] b[2] b[3];
    b[2] b[4] b[5];
    b[3] b[5] b[6]
]
Singular values: [2.015257939547654, 1.7211360552264312, 0.31310180633621654, 0.2094404431525653, 0.0369800436974061, 0.0016371186799513879]
Smallest singular vector: [-0.3219449627815607, 0.00032981145073682216, -0.016122823914785308, -0.18162409368073099, 0.001851482259880053, -0.9290321024168616]
Out[16]:
3×3 Array{Float64,2}:
 -0.321945      0.000329811  -0.0161228
  0.000329811  -0.181624      0.00185148
 -0.0161228     0.00185148   -0.929032
In [17]:
if !isposdef(Symmetric(B))
    println("Not positive definite!")
    B = -B
end
println("Now is positive definite? $(isposdef(Symmetric(B)))")
Not positive definite!
Now is positive definite? true

Finally, apply the Cholesky decomposition to extract the upper triangular $K$: $B=K^{-\top}K^{-1}$

In [18]:
C = cholesky(Symmetric(B))
C.U'*C.U
Out[18]:
3×3 Array{Float64,2}:
  0.321945     -0.000329811   0.0161228
 -0.000329811   0.181624     -0.00185148
  0.0161228    -0.00185148    0.929032
In [19]:
K⁻¹ = C.U
K = inv(K⁻¹)
K = inv(normalization_matrix)*K/K[3,3]
Out[19]:
3×3 Array{Float64,2}:
 543.352    0.741088  303.978
   0.0    542.559     242.425
   0.0      0.0         1.0

Extracing Extrinsics

Now that we have the the camera instrinsic calibration, we can start estimating the motion of the camera (i.e. extrinsics). Revisting our old friend:

$\begin{bmatrix}\mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3\end{bmatrix} = K\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$
We can expand that expression to solve for $\mathbf{r}_1$ and $\mathbf{r}_2$:
$\mathbf{r}_1=\lambda K^{-1}\mathbf{h}_1$ and $\mathbf{r}_2=\lambda K^{-1}\mathbf{h}_2$ and $\mathbf{r}_3=\mathbf{r}_1 \times \mathbf{r}_2$ where $\lambda=\frac{1}{\|K^{-1}\mathbf{h}_1\|}=\frac{1}{\|K^{-1}\mathbf{h}_2\|}$ serves as a normalization constant.
The translation component is $\mathbf{t}=\lambda K^{-1}\mathbf{h}_3$

Protip

Due to noise in the data, $\begin{bmatrix}\mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t}\end{bmatrix}$ likely approximates a true rotation matrix. We'll use SVD to force its columns to be orthonormal.

In [20]:
camera_extrinsics = Array{Float64}(undef, num_images, 3, 4)
for i=1:num_images
    h₁ = H_cache[i, :, 1]
    h₂ = H_cache[i, :, 2]
    h₃ = H_cache[i, :, 3]
   
    λ = 1/norm(K⁻¹*h₃)
    r₁ = λ*K⁻¹*h₁
    r₂ = λ*K⁻¹*h₂
    r₃ = r₁×r₂
    t = λ*K⁻¹*h₃
    
    R = [r₁ r₂ r₃]
    
    U, _, V = svd(R)
    R = U*V'
    
    camera_extrinsics[i, :, :] = [R t]
end

scatter(camera_extrinsics[:, 1, 4], camera_extrinsics[:, 2, 4], camera_extrinsics[:, 3, 4], camera=(45, 45), legend=false, title="Calibrated Positions of Cameras")
Out[20]:

Huge thanks to Professor Cyrill Stachniss' lectures on photogrammetry and computer vision, which this report was based on.

Epilogue

Here we established a method to estimate the parameters of a camera's intrinsic parameters. However, this method is not statistically optimal (assuming Gaussian noise). The next step is to use iterative nonlinear least-squares algorithms to refine the camera parameters, which also opens up the opportunity to estimate nonlinear distortion parameters too. Those topics will be covered in another report but in the meantime, this direct algebraic method serves as a good initialization point upon which further iterative refinement can be performed.

In [21]:
save("calibration/initial_parameters.jld",
    "camera_extrinsics", camera_extrinsics,
    "K", K,
    "board_structure", board_structure)

An alternative to using Cholesky decomposition for factoring $B$ into $K$ is through a direct formula listed in Zhang's report (I suspect this is just the analytical form for Cholesky decomposition):

In [22]:
v₀ = (B[1,2]*B[1,3]-B[1,1]B[2,3])/(B[1,1]B[2,2]-B[1,2]^2)
λ = B[3,3]-(B[1,3]^2+v₀*(B[1,2]*B[1,3]-B[1,1]*B[2,3]))/B[1,1]
α = sqrt(λ/B[1,1])
β = sqrt(λ*B[1,1]/(B[1,1]*B[2,2]-B[1,2]^2))
γ = -B[1,2]*α^2*β/λ
u₀ = γ*v₀/β-B[1,3]*α^2/λ
inv(normalization_matrix)*[
    α γ u₀;
    0 β v₀;
    0 0 1
]
Out[22]:
3×3 Array{Float64,2}:
 543.352    0.741088  303.978
   0.0    542.559     242.425
   0.0      0.0         1.0
In [ ]: