Minimally Parameterized Projective Geometry of Multiple Views

Raquel Romano

Problem: The automatic extraction of the unknown geometric relations among a set of cameras capturing images of a common scene is a fundamental problem in computer vision. This work addresses the problem of estimating the configuration of arbitrarily many cameras using image points matched across multiple frames, without requiring a full reconstruction of the viewed scene points.
Motivation: Fast and inexpensive processors, cameras, and storage are making feasible a variety of applications involving image sequences and collections, ranging from visually intelligent automobiles to enhanced sports viewing to home video manipulation. A prerequisite for many of these applications is knowledge of the physical positions of the cameras that captured the original images, whether it be a collection of stationary cameras placed around a common scene, or a single video camera capturing frames as it moves through an environment. Accurate estimation of the geometry of multiple views is the bedrock of such applications.
Previous Work: Projective geometric models of multiple views give rise to algebraic relations among the locations of points in uncalibrated images that are 2D projections of rigidly moving points in the scene. These algebraic relations are multilinear in the image coordinates, and their forms depend on the unknown projective parameters relating the cameras. The projective parameters in turn depend on the external and internal camera parameters, and may be recovered directly without estimating the 3D structure of the scene points.
There exist 2-, 3-, and 4-view projective multilinear constraints, expressed via the fundamental matrix, the trifocal tensor, and the quadrilinear tensor, respectively, that have been well-studied and used in many practical situations, while their N-view analogs have been explored theoretically. Since the projective relationships among more than 4 cameras in general position are completely described by the collection of all 2-view relationships, a single N-view model is not necessarily required. Moreover, the projective multilinearities that simultaneously describe how N matching points must be related have exponentially many linear parameters that are highly coupled through their dependence on the camera properties and positions. Enforcing these parameter dependencies is a difficult and active area of research and an important consideration for modeling the geometry of a large collection of views, such as all the frames in a video sequence.
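As a concrete illustration of the 2-view case, the sketch below evaluates the bifocal (epipolar) constraint x2ᵀ F x1 = 0 for a set of matched points. The function and variable names are illustrative only and are not taken from the thesis.

```python
# Minimal sketch (illustrative, not the thesis's code): algebraic residuals of
# the bifocal constraint x2^T F x1 = 0 for matched homogeneous image points.
import numpy as np

def bifocal_residuals(F, x1, x2):
    """Algebraic epipolar residuals x2^T F x1.

    F  : 3x3 fundamental matrix relating view 1 to view 2.
    x1 : Nx3 homogeneous points in view 1.
    x2 : Nx3 homogeneous points in view 2, matched row-by-row to x1.
    """
    return np.einsum('ni,ij,nj->n', x2, F, x1)
```

For noise-free matches these residuals vanish exactly; with real data they are only driven toward zero by an estimation procedure such as the one described below.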
Approach: We present an N-focal projective model of camera geometry that is minimally parameterized and, by construction, inherently geometrically consistent. Using the fact that all multifocal constraints of a collection of cameras in general position may be reduced to a collection of bifocal constraints, we build an N-view model from a sparse set of bifocal parameters that exploits the dependencies among the pairwise geometric relationships.
For each new view added to a camera collection with known configuration, its relation to the entire collection is completely defined by its bifocal relation to two other views, via the positions of the epipoles in each view and the epipolar collineation that maps epipolar lines from one view to the other. For an isolated view pair, these quantities have a total of 7 degrees of freedom, but once the new view's relation to a single existing view is fixed, its relation to a second view in the collection is constrained to only 4 degrees of freedom. We explicitly model these dependent bifocal relations and show that, for the dependent view pair, each epipole is constrained to lie on a line (leaving 1 DOF each) and the epipolar collineation is constrained by one line match (leaving 2 DOF). We compute this minimal parameter set from point matches using a Levenberg-Marquardt minimization, initialized from a linear estimate of the fundamental matrix.
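The sketch below illustrates the refinement step under simplifying assumptions: it refines a linearly estimated fundamental matrix by Levenberg-Marquardt on the Sampson error, using a plain 9-entry parameterization with a rank-2 projection in place of the thesis's minimal 7- and 4-parameter bifocal models.

```python
# Hedged sketch of the nonlinear refinement step, assuming SciPy is available.
# A linear estimate F0 (e.g. from the normalized 8-point algorithm) is refined
# by Levenberg-Marquardt on the first-order geometric (Sampson) error.
import numpy as np
from scipy.optimize import least_squares

def sampson_residuals(f, x1, x2):
    """Sampson residuals of the epipolar constraint for Nx3 homogeneous points."""
    F = f.reshape(3, 3)
    Fx1 = x1 @ F.T                                  # rows are F x1_n
    Ftx2 = x2 @ F                                   # rows are F^T x2_n
    num = np.einsum('ni,ni->n', x2, Fx1)            # x2^T F x1
    den = np.sqrt(Fx1[:, 0]**2 + Fx1[:, 1]**2 +
                  Ftx2[:, 0]**2 + Ftx2[:, 1]**2)
    return num / den

def refine_fundamental(F0, x1, x2):
    """Refine a linear fundamental-matrix estimate F0 over point matches."""
    f0 = F0.ravel() / np.linalg.norm(F0)
    sol = least_squares(sampson_residuals, f0, args=(x1, x2), method='lm')
    F = sol.x.reshape(3, 3)
    # Project back to rank 2 so F is a valid fundamental matrix.
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```

In the thesis's minimal formulation, the optimization variables would instead be the 7 bifocal parameters (or the 4 parameters of a dependent pair), so that geometric consistency is enforced by construction rather than by a post-hoc rank projection.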
We test the minimally parameterized model on a simulated camera configuration viewing a block of points at the scene's origin. Two algorithms are compared: the nonlinear 7-parameter optimization of independent fundamental matrices, and the nonlinear optimization of dependent 4-parameter bifocal models. Since the projective models are defined only up to an arbitrary projective transformation, we use the intrinsic parameters of the simulated cameras to recover the Euclidean positions from the bifocal parameters and display the solutions. Below are the recovered camera positions overlaid on the ground truth camera positions, and Table 1 shows the improved accuracy in camera position gained by enforcing bifocal dependencies.
Figure: (a) Ground truth camera positions and viewed points. (b) Camera positions recovered with independent 7-parameter fundamental matrices. (c) Camera positions recovered with dependent 4-parameter bifocal parameters.
Camera                       1        2        3        4
(a) Nonlinear Independent    2.2068   2.2068   2.2068   2.2068
(b) Nonlinear Dependent      0.1593   0.1593   0.1593   0.1593

Table 1: Translation errors for recovery of camera positions from bifocal parameters. (a) Independent 7-parameter estimates. (b) Dependent 4-parameter estimates.
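For reference, the sketch below shows the standard way known intrinsics convert a fundamental matrix into an essential matrix whose decomposition yields a relative rotation and a translation direction (up to scale). It is an illustration of this display step with hypothetical variable names, not the thesis's exact procedure, which works directly from the bifocal parameters.

```python
# Hedged sketch: recovering relative Euclidean pose from a fundamental matrix
# when the intrinsic matrices K1, K2 are known (as for the simulated cameras).
import numpy as np

def relative_pose_from_F(F, K1, K2):
    """Return the four (R, t) candidates from E = K2^T F K1; t has unit norm."""
    E = K2.T @ F @ K1
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U *= -1.0
    if np.linalg.det(Vt) < 0: Vt *= -1.0
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]
```

The physically valid candidate among the four is the one that places triangulated points in front of both cameras; the translation is recovered only up to scale, as expected for a projective model.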
Impact: While camera motion estimation is a mature problem in the field of computer vision, it is far from solved. A geometrically consistent model of arbitrarily many views that can be automatically recovered from matching points is essential to any application whose performance depends on first establishing the relative positions of cameras.
Future Work: A difficult and pervasive issue is the degeneracy of camera configurations, particularly in video sequences taken with approximately linear camera motions. A model with too many parameters will overfit such a sequence, while one with too few parameters cannot capture the complexity of richer sequences. It would be useful to apply information criteria to determine which subsets of views are degenerate and thus automatically choose appropriate view pairs to model.
Projective Minimal Analysis of Camera Geometry. Raquel A. Romano. Ph.D. Thesis. May 2002. (ps.gz)


romano@ai.mit.edu