SciPy 中的大规模捆绑调整¶

日期	2016-10-22（最后修改）

捆绑调整问题出现在 3-D 重建中，它可以表述如下（摘自 https://en.wikipedia.org/wiki/Bundle_adjustment）

给定一组从不同视角描绘多个 3D 点的图像，捆绑调整可以定义为同时优化描述场景几何形状的 3D 坐标以及用于获取图像的相机(s)的相对运动参数和光学特性的问题，根据涉及所有点对应图像投影的优化标准。

更准确地说。我们有一组在现实世界中定义的点，它们在某个先验选择的“世界坐标系”中由其坐标 (X, Y, Z) 定义。我们用不同的相机拍摄这些点，这些相机以其相对于世界坐标系的方位和平移以及焦距和两个径向畸变参数（总共 9 个参数）为特征。然后，我们精确地测量相机在图像上投影的点的 2-D 坐标 (x, y)。我们的任务是通过最小化重投影误差的平方和来优化原始点的 3-D 坐标以及相机参数。

令 $\pmb{P} = (X, Y, Z)^T$ - 点的半径向量，$\pmb{R}$ - 相机的旋转矩阵，$\pmb{t}$ - 相机的平移向量，$f$ - 它的焦距，$k_1, k_2$ - 它的畸变参数。然后，重投影如下进行

\begin{align} \pmb{Q} = \pmb{R} \pmb{P} + \pmb{t} \\ \pmb{q} = -\begin{pmatrix} Q_x / Q_z \\ Q_y / Q_z \end{pmatrix} \\ \pmb{p} = f (1 + k_1 \lVert \pmb{q} \rVert^2 + k_2 \lVert \pmb{q} \rVert^4) \pmb{q} \end{align}

得到的向量 $\pmb{p}=(x, y)^T$ 包含原始点的图像坐标。这种模型被称为“针孔相机模型”，我在此找到了一些关于此主题的非常好的笔记 http://www.comp.nus.edu.sg/~cs4243/lecture/camera.pdf

现在让我们开始解决一些真实的束调整问题。我们将从 http://grail.cs.washington.edu/projects/bal/ 中获取一个问题。

在 [1]

from __future__ import print_function

在 [2]

import urllib
import bz2
import os
import numpy as np

首先下载数据文件

在 [3]

BASE_URL = "http://grail.cs.washington.edu/projects/bal/data/ladybug/"
FILE_NAME = "problem-49-7776-pre.txt.bz2"
URL = BASE_URL + FILE_NAME

在 [4]

if not os.path.isfile(FILE_NAME):
    urllib.request.urlretrieve(URL, FILE_NAME)

现在从文件中读取数据

在 [5]

def read_bal_data(file_name):
    with bz2.open(file_name, "rt") as file:
        n_cameras, n_points, n_observations = map(
            int, file.readline().split())

        camera_indices = np.empty(n_observations, dtype=int)
        point_indices = np.empty(n_observations, dtype=int)
        points_2d = np.empty((n_observations, 2))

        for i in range(n_observations):
            camera_index, point_index, x, y = file.readline().split()
            camera_indices[i] = int(camera_index)
            point_indices[i] = int(point_index)
            points_2d[i] = [float(x), float(y)]

        camera_params = np.empty(n_cameras * 9)
        for i in range(n_cameras * 9):
            camera_params[i] = float(file.readline())
        camera_params = camera_params.reshape((n_cameras, -1))

        points_3d = np.empty(n_points * 3)
        for i in range(n_points * 3):
            points_3d[i] = float(file.readline())
        points_3d = points_3d.reshape((n_points, -1))

    return camera_params, points_3d, camera_indices, point_indices, points_2d

在 [6]

camera_params, points_3d, camera_indices, point_indices, points_2d = read_bal_data(FILE_NAME)

这里我们有 numpy 数组

camera_params 形状为 (n_cameras, 9)，包含所有相机的参数初始估计。每行的前 3 个分量构成旋转向量 (https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula)，接下来的 3 个分量构成平移向量，然后是焦距和两个畸变参数。
points_3d 形状为 (n_points, 3)，包含世界坐标系中点坐标的初始估计。
camera_ind 形状为 (n_observations,)，包含每个观测中涉及的相机的索引（从 0 到 n_cameras - 1）。
point_ind 形状为 (n_observations,)，包含每个观测中涉及的点的索引（从 0 到 n_points - 1）。
points_2d 形状为 (n_observations, 2)，包含每个观测中投影到图像上的点的测量二维坐标。

数字如下

在 [7]

n_cameras = camera_params.shape[0]
n_points = points_3d.shape[0]

n = 9 * n_cameras + 3 * n_points
m = 2 * points_2d.shape[0]

print("n_cameras: {}".format(n_cameras))
print("n_points: {}".format(n_points))
print("Total number of parameters: {}".format(n))
print("Total number of residuals: {}".format(m))

n_cameras: 49
n_points: 7776
Total number of parameters: 23769
Total number of residuals: 63686

我们选择了一个相对较小的问题来减少计算时间，但 scipy 的算法能够解决更大的问题，尽管所需时间会按比例增长。

现在定义一个返回残差向量的函数。我们使用 numpy 向量化计算

在 [8]

def rotate(points, rot_vecs):
    """Rotate points by given rotation vectors.
    
    Rodrigues' rotation formula is used.
    """
    theta = np.linalg.norm(rot_vecs, axis=1)[:, np.newaxis]
    with np.errstate(invalid='ignore'):
        v = rot_vecs / theta
        v = np.nan_to_num(v)
    dot = np.sum(points * v, axis=1)[:, np.newaxis]
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)

    return cos_theta * points + sin_theta * np.cross(v, points) + dot * (1 - cos_theta) * v

在 [9]

def project(points, camera_params):
    """Convert 3-D points to 2-D by projecting onto images."""
    points_proj = rotate(points, camera_params[:, :3])
    points_proj += camera_params[:, 3:6]
    points_proj = -points_proj[:, :2] / points_proj[:, 2, np.newaxis]
    f = camera_params[:, 6]
    k1 = camera_params[:, 7]
    k2 = camera_params[:, 8]
    n = np.sum(points_proj**2, axis=1)
    r = 1 + k1 * n + k2 * n**2
    points_proj *= (r * f)[:, np.newaxis]
    return points_proj

在 [10]

def fun(params, n_cameras, n_points, camera_indices, point_indices, points_2d):
    """Compute residuals.
    
    `params` contains camera parameters and 3-D coordinates.
    """
    camera_params = params[:n_cameras * 9].reshape((n_cameras, 9))
    points_3d = params[n_cameras * 9:].reshape((n_points, 3))
    points_proj = project(points_3d[point_indices], camera_params[camera_indices])
    return (points_proj - points_2d).ravel()

您可以看到计算 fun 的雅可比矩阵很麻烦，因此我们将依赖于有限差分近似。为了使此过程在时间上可行，我们提供了雅可比矩阵稀疏结构（即标记已知非零的元素）

在 [11]

from scipy.sparse import lil_matrix

在 [12]

def bundle_adjustment_sparsity(n_cameras, n_points, camera_indices, point_indices):
    m = camera_indices.size * 2
    n = n_cameras * 9 + n_points * 3
    A = lil_matrix((m, n), dtype=int)

    i = np.arange(camera_indices.size)
    for s in range(9):
        A[2 * i, camera_indices * 9 + s] = 1
        A[2 * i + 1, camera_indices * 9 + s] = 1

    for s in range(3):
        A[2 * i, n_cameras * 9 + point_indices * 3 + s] = 1
        A[2 * i + 1, n_cameras * 9 + point_indices * 3 + s] = 1

    return A

现在我们准备运行优化。让我们可视化使用初始参数评估的残差。

在 [13]

%matplotlib inline
import matplotlib.pyplot as plt

在 [14]

x0 = np.hstack((camera_params.ravel(), points_3d.ravel()))

在 [15]

f0 = fun(x0, n_cameras, n_points, camera_indices, point_indices, points_2d)

在 [16]

plt.plot(f0)

输出[16]

[<matplotlib.lines.Line2D at 0x1067696d8>]

在 [17]

A = bundle_adjustment_sparsity(n_cameras, n_points, camera_indices, point_indices)

在 [18]

import time
from scipy.optimize import least_squares

在 [19]

t0 = time.time()
res = least_squares(fun, x0, jac_sparsity=A, verbose=2, x_scale='jac', ftol=1e-4, method='trf',
                    args=(n_cameras, n_points, camera_indices, point_indices, points_2d))
t1 = time.time()

   Iteration     Total nfev        Cost      Cost reduction    Step norm     Optimality
       0              1         8.5091e+05                                    8.57e+06
       1              3         5.0985e+04      8.00e+05       1.46e+02       1.15e+06
       2              4         1.6077e+04      3.49e+04       2.59e+01       2.43e+05
       3              5         1.4163e+04      1.91e+03       2.86e+02       1.21e+05
       4              7         1.3695e+04      4.67e+02       1.32e+02       2.51e+04
       5              8         1.3481e+04      2.14e+02       2.24e+02       1.54e+04
       6              9         1.3436e+04      4.55e+01       3.18e+02       2.73e+04
       7             10         1.3422e+04      1.37e+01       6.84e+01       2.20e+03
       8             11         1.3418e+04      3.70e+00       1.28e+02       7.88e+03
       9             12         1.3414e+04      4.19e+00       2.64e+01       6.24e+02
      10             13         1.3412e+04      1.88e+00       7.55e+01       2.61e+03
      11             14         1.3410e+04      2.09e+00       1.77e+01       4.97e+02
      12             15         1.3409e+04      1.04e+00       4.02e+01       1.32e+03
`ftol` termination condition is satisfied.
Function evaluations 15, initial cost 8.5091e+05, final cost 1.3409e+04, first-order optimality 1.32e+03.

在 [20]

print("Optimization took {0:.0f} seconds".format(t1 - t0))

Optimization took 33 seconds

设置 scaling='jac' 的目的是自动缩放变量并使它们对成本函数的影响相等（显然，相机参数和点的坐标是截然不同的实体）。这个选项被证明对成功进行捆绑调整至关重要。

现在让我们绘制找到的解的残差

在 [21]

plt.plot(res.fun)

输出[21]

[<matplotlib.lines.Line2D at 0x10de73438>]

现在我们看到了残差的更清晰的图像，平均值非常接近零。还有一些尖峰。这可以通过数据中的异常值来解释，或者可能是算法找到了一个局部最小值（尽管是一个非常好的最小值）或者没有收敛到足够程度。请注意，该算法使用雅可比有限差分近似，这可能会在最小值附近阻碍进度，因为精度不足（但同样，为这个问题计算精确的雅可比矩阵非常困难）。

章节作者：Nikolay Mayorov