cv08 part17 reconstruction3

(1)

Computer Vision – Lecture 17

Structure-from-Motion

21.01.2009

Bastian Leibe

RWTH Aachen

http://www.umic.rwth-aachen.de/multimedia


(2)

Course Outline

• Image Processing Basics

• Segmentation & Grouping

• Object Recognition

• Local Features & Matching

• Object Categorization

• 3D Reconstruction

    - Epipolar Geometry and Stereo Basics

    - Camera calibration & Uncalibrated Reconstruction

    - Structure-from-Motion

• Motion and Tracking

(3)

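Recap: A General Point

• Equations of the form $A x = 0$

• How do we solve them? (always!)

    - Apply SVD:

$$A = U D V^T = U \begin{bmatrix} d_{11} & & \\ & \ddots & \\ & & d_{NN} \end{bmatrix} \begin{bmatrix} v_{11} & \cdots & v_{1N} \\ \vdots & \ddots & \vdots \\ v_{N1} & \cdots & v_{NN} \end{bmatrix}^T$$

      (the $d_{ii}$ are the singular values, the columns of $V$ are the singular vectors)

    - Singular values of A = square roots of the eigenvalues of $A^T A$.

    - The solution of Ax = 0 is the nullspace vector of A.

    - This corresponds to the singular vector belonging to the smallest singular value of A.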
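The recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides; the helper name `solve_homogeneous` and the example matrix are made up for the demonstration.

```python
import numpy as np

def solve_homogeneous(A):
    """Return the unit vector x that minimizes ||A x|| (least-squares solution of A x = 0)."""
    # np.linalg.svd returns V^T; its rows are the right singular vectors,
    # sorted by decreasing singular value, so the last row is the one we want.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Hypothetical example: a rank-2 matrix whose nullspace is spanned by (1, 1, 1).
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, -1.0]])
x = solve_homogeneous(A)
```

The result is only defined up to sign and has unit length, which is the usual normalization for homogeneous unknowns.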

(4)

Properties of SVD

• Frobenius norm

    - Generalization of the Euclidean norm to matrices

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2} = \sqrt{\sum_{i=1}^{\min(m,n)} \sigma_i^2}$$

• Partial reconstruction property of SVD

    - Let $\sigma_i$, $i = 1, \dots, N$, be the singular values of $A$.

    - Let $A_p = U_p D_p V_p^T$ be the reconstruction of $A$ when we set $\sigma_{p+1}, \dots, \sigma_N$ to zero.

    - Then $A_p = U_p D_p V_p^T$ is the best rank-p approximation of $A$ in the sense of the Frobenius norm (i.e. the best least-squares approximation).
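Both properties can be checked numerically. The sketch below is an illustration under assumed inputs (a random 6 × 4 matrix and p = 2); `best_rank_p` is a hypothetical helper name.

```python
import numpy as np

def best_rank_p(A, p):
    """Rank-p reconstruction of A obtained by zeroing all but the p largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_p = s.copy()
    s_p[p:] = 0.0
    # (U * s_p) scales the columns of U, which equals U @ diag(s_p).
    return (U * s_p) @ Vt, s

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
A_p, s = best_rank_p(A, p=2)

frob_A = np.linalg.norm(A, "fro")          # equals sqrt(sum of all sigma_i^2)
err = np.linalg.norm(A - A_p, "fro")       # equals sqrt(sum of the discarded sigma_i^2)
```

The approximation error is exactly the energy of the singular values that were set to zero, which is why truncating the SVD is the natural way to enforce a rank constraint.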

(5)

Recap: Camera Parameters

• Intrinsic parameters

    - Principal point coordinates

    - Focal length

    - Pixel magnification factors

    - Skew (non-rectangular pixels)

    - Radial distortion

$$K = \begin{bmatrix} m_x & & \\ & m_y & \\ & & 1 \end{bmatrix} \begin{bmatrix} f & & p_x \\ & f & p_y \\ & & 1 \end{bmatrix} = \begin{bmatrix} \alpha_x & s & x_0 \\ & \alpha_y & y_0 \\ & & 1 \end{bmatrix}$$

• Extrinsic parameters

    - Rotation R

    - Translation t

      (both relative to world coordinate system)

• Camera projection matrix

    - General pinhole camera: 9 DoF

    - CCD Camera with square pixels: 10 DoF

    - General camera: 11 DoF





(6)

Recap: Calibrating a Camera

Goal

• Compute intrinsic and extrinsic parameters using observed camera data.

Main idea

• Place “calibration object” with known geometry in the scene

• Get correspondences

• Solve for mapping from scene to image: estimate $P = P_{int} P_{ext}$

[Figure: known 3D points $X_i$ on the calibration object, their image points $x_i$, and the unknown camera P]

Slide credit: Kristen Grauman

(7)

Perceptual and Sensory Augmented Computing, Computer Vision WS 08/09

Recap: Camera Calibration (DLT Algorithm)

$$\begin{bmatrix} 0^T & X_1^T & -y_1 X_1^T \\ X_1^T & 0^T & -x_1 X_1^T \\ \vdots & \vdots & \vdots \\ 0^T & X_n^T & -y_n X_n^T \\ X_n^T & 0^T & -x_n X_n^T \end{bmatrix} \begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix} = 0, \qquad A p = 0$$

• P has 11 degrees of freedom.

• Two linearly independent equations per independent 2D/3D correspondence.

• Solve with SVD (similar to homography estimation)

    - Solution corresponds to the singular vector belonging to the smallest singular value.

• 5 ½ correspondences needed for a minimal solution.

Slide adapted from Svetlana Lazebnik
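A minimal sketch of the DLT estimate, assuming noise-free synthetic data. The function names (`project`, `dlt_camera`), the camera values, and the point set are hypothetical. The example uses normalized image coordinates; with raw pixel coordinates a data normalization step (as in the normalized eight-point algorithm) would be advisable.

```python
import numpy as np

def project(P, X):
    """Project (n, 3) points with a 3x4 camera matrix; returns (n, 2) image points."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    xh = Xh @ P.T
    return xh[:, :2] / xh[:, 2:3]

def dlt_camera(X, x):
    """Estimate P (up to scale) from n >= 6 correspondences X (n, 3) <-> x (n, 2)."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    zeros = np.zeros(4)
    rows = []
    for Xi, (u, v) in zip(Xh, x):
        rows.append(np.concatenate([zeros, Xi, -v * Xi]))   # [0^T, X^T, -y X^T]
        rows.append(np.concatenate([Xi, zeros, -u * Xi]))   # [X^T, 0^T, -x X^T]
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)                             # rows P_1, P_2, P_3

# Hypothetical ground-truth camera in normalized image coordinates.
K = np.array([[1.2, 0.0, 0.1], [0.0, 1.2, -0.05], [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 3))
x = project(P_true, X)
P_est = dlt_camera(X, x)
```

Because P is only recovered up to scale (and sign), the estimate is best compared through its reprojections rather than entry by entry.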



(8)

Recap: Triangulation – Linear Algebraic Approach

$$\lambda_1 x_1 = P_1 X, \qquad \lambda_2 x_2 = P_2 X$$

$$x_1 \times P_1 X = 0, \qquad x_2 \times P_2 X = 0$$

$$[x_{1\times}] P_1 X = 0, \qquad [x_{2\times}] P_2 X = 0$$

• Two independent equations each in terms of three unknown entries of X.

• Stack equations and solve with SVD.

• This approach nicely generalizes to multiple cameras.

[Figure: camera centers O₁ and O₂, viewing rays R₁ and R₂ through the image points x₁ and x₂, and the unknown 3D point X]

Slide credit: Svetlana Lazebnik
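The stacking step can be sketched as follows. This is an illustrative implementation, not the lecture's own code: `triangulate` is a hypothetical name, and it keeps the two independent rows of each cross-product constraint, written out explicitly.

```python
import numpy as np

def triangulate(Ps, xs):
    """Linear triangulation of one 3D point from cameras Ps (list of 3x4) and image points xs (list of (u, v))."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        # Two independent rows of x × (P X) = 0 for x = (u, v, 1).
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(rows))
    Xh = Vt[-1]                      # homogeneous solution, defined up to scale
    return Xh[:3] / Xh[3]

# Hypothetical two-view setup: identical orientation, second camera shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])

def image_point(P, X):
    xh = P @ np.append(X, 1.0)
    return xh[:2] / xh[2]

X_est = triangulate([P1, P2], [image_point(P1, X_true), image_point(P2, X_true)])
```

Adding a third or fourth camera only appends two more rows per view, which is the sense in which the approach generalizes to multiple cameras.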

(9)

Recap: Epipolar Geometry – Calibrated Case

[Figure: image points x and x’ of the same 3D point X in two views related by rotation R and translation t]

• Camera matrix: [I|0]

      $X = (u, v, w, 1)^T$, $x = (u, v, w)^T$

• Camera matrix: $[R^T \mid -R^T t]$

      Vector x’ in second coord. system has coordinates Rx’ in the first one.

• The vectors x, t, and Rx’ are coplanar

(10)

Recap: Epipolar Geometry – Calibrated Case

$$x \cdot [t \times (R x')] = 0 \quad\Rightarrow\quad x^T E x' = 0 \quad \text{with} \quad E = [t_\times] R$$

Essential Matrix (Longuet-Higgins, 1981)

Slide credit: Svetlana Lazebnik

(11)

Recap: Epipolar Geometry – Uncalibrated Case

• The calibration matrices K and K’ of the two cameras are unknown

• We can write the epipolar constraint in terms of unknown normalized coordinates:

$$\hat{x}^T E \hat{x}' = 0$$

Slide credit: Svetlana Lazebnik





(12)

Recap: Epipolar Geometry – Uncalibrated Case

$$\hat{x}^T E \hat{x}' = 0 \quad\Rightarrow\quad x^T F x' = 0 \quad \text{with} \quad F = K^{-T} E K'^{-1}$$

$$x = K \hat{x}, \qquad x' = K' \hat{x}'$$

Fundamental Matrix (Faugeras and Luong, 1992)

Slide credit: Svetlana Lazebnik
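The relation between E and F can be verified on a synthetic point. The sketch assumes the convention of the calibrated-case recap: the first camera is [I|0] and a vector x’ in the second frame has coordinates Rx’ in the first, so a point X in the first frame becomes Rᵀ(X − t) in the second. All numeric values (R, t, K, K’) are made up for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t_x], so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

theta = 0.2                                     # hypothetical rotation about the y-axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.1, 0.0])
E = skew(t) @ R                                 # essential matrix

X = np.array([0.3, -0.2, 5.0])                  # 3D point in the first camera frame
X_prime = R.T @ (X - t)                         # same point in the second camera frame
x_hat = X / X[2]                                # normalized image coordinates
x_hat_prime = X_prime / X_prime[2]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K_prime = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K_prime)   # F = K^{-T} E K'^{-1}

x = K @ x_hat                                   # pixel coordinates
x_prime = K_prime @ x_hat_prime
```

Both `x_hat @ E @ x_hat_prime` and `x @ F @ x_prime` vanish up to rounding, which is the epipolar constraint in normalized and in pixel coordinates respectively.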

(13)

Recap: The Eight-Point Algorithm

$x = (u, v, 1)^T$, $x' = (u', v', 1)^T$

Minimize:

$$\sum_{i=1}^{N} (x_i^T F x_i')^2$$

under the constraint $|F|^2 = 1$

• Problem: poor numerical conditioning







(14)

Recap: Normalized Eight-Point Algorithm

1. Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels.

2. Use the eight-point algorithm to compute F from the normalized points.

3. Enforce the rank-2 constraint using SVD:

$$F = U D V^T = U \begin{bmatrix} d_{11} & & \\ & d_{22} & \\ & & d_{33} \end{bmatrix} \begin{bmatrix} v_{11} & \cdots & v_{13} \\ \vdots & \ddots & \vdots \\ v_{31} & \cdots & v_{33} \end{bmatrix}^T$$

   Set $d_{33}$ to zero and reconstruct F.

4. Transform fundamental matrix back to original units: if T and T’ are the normalizing transformations in the two images, then the fundamental matrix in original coordinates is $T^T F T'$.

[Hartley, 1995]

Slide credit: Svetlana Lazebnik
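The four steps can be sketched as below, using the convention xᵀ F x’ = 0 with x from the first image and x’ from the second. This is an illustrative implementation on noise-free synthetic data; the function names and the camera setup are assumptions, not taken from the slides.

```python
import numpy as np

def normalizing_transform(pts):
    """Similarity T that centers (n, 2) points and scales their mean squared distance to 2."""
    c = pts.mean(axis=0)
    msd = np.mean(np.sum((pts - c) ** 2, axis=1))
    s = np.sqrt(2.0 / msd)
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])

def normalized_eight_point(x1, x2):
    """Estimate F with x1_i^T F x2_i = 0 from (n, 2) point arrays, n >= 8."""
    n = x1.shape[0]
    h1 = np.hstack([x1, np.ones((n, 1))])
    h2 = np.hstack([x2, np.ones((n, 1))])
    T1, T2 = normalizing_transform(x1), normalizing_transform(x2)   # step 1
    q1, q2 = h1 @ T1.T, h2 @ T2.T
    # Step 2: each row holds the coefficients of the row-major entries of F in q1^T F q2.
    A = np.stack([np.kron(a, b) for a, b in zip(q1, q2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, d, Vt = np.linalg.svd(F)                                     # step 3: rank 2
    d[2] = 0.0
    F = U @ np.diag(d) @ Vt
    F = T1.T @ F @ T2                                               # step 4: original units
    return F / np.linalg.norm(F)

def project(P, X):
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])
    xh = Xh @ P.T
    return xh[:, :2] / xh[:, 2:3]

def epipolar_distances(F, x1, x2):
    """Distance in pixels of each x1_i from the epipolar line F x2_i."""
    n = x1.shape[0]
    h1 = np.hstack([x1, np.ones((n, 1))])
    lines = np.hstack([x2, np.ones((n, 1))]) @ F.T
    return np.abs(np.sum(h1 * lines, axis=1)) / np.linalg.norm(lines[:, :2], axis=1)

# Hypothetical synthetic scene: two cameras with the same K and a small relative motion.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, np.array([[-1.0], [0.1], [0.2]])])
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 12), rng.uniform(-1, 1, 12), rng.uniform(4, 8, 12)])
x1, x2 = project(P1, X), project(P2, X)
F = normalized_eight_point(x1, x2)
dists = epipolar_distances(F, x1, x2)
```

With real, noisy correspondences the distances would not vanish; the comparison table of estimation algorithms in this recap reports average distances of this kind.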

(15)

Recap: Comparison of Estimation Algorithms

                 8-point        Normalized 8-point    Nonlinear least squares

Av. Dist. 1      2.33 pixels    0.92 pixel            0.86 pixel

Av. Dist. 2      2.18 pixels    0.85 pixel            0.80 pixel

(16)

Recap: Epipolar Transfer

• Assume the epipolar geometry is known

• Given projections of the same point in two images, how can we compute the projection of that point in a third image?

$$l_{31} = F_{13}^T x_1, \qquad l_{32} = F_{23}^T x_2$$

• The transferred point $x_3$ lies at the intersection of the two epipolar lines $l_{31}$ and $l_{32}$.

(17)

Recap: Active Stereo with Structured Light

• Optical triangulation

    - Project a single stripe of laser light

    - Scan it across the surface of the object

    - This is a very precise version of structured light scanning

Digital Michelangelo Project

http://graphics.stanford.edu/projects/mich/

(18)

Topics of This Lecture

• Structure from Motion (SfM)

    - Motivation

    - Ambiguity

• Affine SfM

    - Affine cameras

    - Affine factorization

    - Euclidean upgrade

    - Dealing with missing data

• Projective SfM

    - Two-camera case

    - Projective factorization

    - Bundle adjustment

    - Practical considerations

• Applications

(19)

Structure from Motion

• Given: m images of n fixed 3D points

      $x_{ij} = P_i X_j$, $i = 1, \dots, m$, $j = 1, \dots, n$

• Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn correspondences $x_{ij}$

[Figure: a 3D point $X_j$ observed as $x_{1j}$, $x_{2j}$, $x_{3j}$ by the cameras $P_1$, $P_2$, $P_3$]

(20)

What Can We Use This For?

• E.g. movie special effects

Video

(21)

Structure from Motion Ambiguity

• If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same:

$$x = P X = \left(\frac{1}{k} P\right)(k X)$$

    - It is impossible to recover the absolute scale of the scene!

(22)

Structure from Motion Ambiguity

• More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change:

$$x = P X = (P Q^{-1})(Q X)$$

Slide credit: Svetlana Lazebnik
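The ambiguity is easy to confirm numerically. In the sketch below the camera, the point, the scale factor k, and the transformation Q are arbitrary made-up values; Q has a non-trivial last row, so it is a general projective transformation.

```python
import numpy as np

# Hypothetical camera and homogeneous scene point.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
X = np.array([0.5, -0.2, 4.0, 1.0])

# Scale ambiguity: x = P X = (1/k P)(k X).
k = 3.0
x = P @ X
x_scaled = (P / k) @ (k * X)

# General ambiguity: x = P X = (P Q^-1)(Q X) for any invertible 4x4 Q.
Q = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.3, 1.5, 0.0],
              [0.1, 0.2, 0.3, 1.0]])   # lower triangular, det = 3, so invertible
x_transformed = (P @ np.linalg.inv(Q)) @ (Q @ X)
```

The images alone therefore cannot distinguish the pair (P, X) from (P Q⁻¹, Q X); which class of Q remains possible depends on the extra constraints available.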



(23)

Reconstruction Ambiguity: Similarity

$$x = P X = (P Q_S^{-1})(Q_S X)$$

(24)

Reconstruction Ambiguity: Affine

$$x = P X = (P Q_A^{-1})(Q_A X)$$

Slide credit: Svetlana Lazebnik

(25)

Reconstruction Ambiguity: Projective

$$x = P X = (P Q_P^{-1})(Q_P X)$$

(26)

Projective Ambiguity

(27)

From Projective to Affine

(28)

From Affine to Similarity

(29)

Hierarchy of 3D Transformations

    - Projective (15 dof): $\begin{bmatrix} A & t \\ v^T & v \end{bmatrix}$. Preserves intersection and tangency.

    - Affine (12 dof): $\begin{bmatrix} A & t \\ 0^T & 1 \end{bmatrix}$. Preserves parallelism, volume ratios.

    - Similarity (7 dof): $\begin{bmatrix} sR & t \\ 0^T & 1 \end{bmatrix}$. Preserves angles, ratios of length.

    - Euclidean (6 dof): $\begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$. Preserves angles, lengths.

• With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction.

• Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean.

(31)

Structure from Motion

• Let’s start with affine cameras (the math is easier)

[Figure: affine camera, center of projection at infinity]

(32)

Orthographic Projection

• Special case of perspective projection



Distance from center of projection to image plane is

infinite



Projection matrix:

Slide credit: Steve Seitz

(33)

Affine Cameras

Orthographic Projection

Parallel Projection

(34)

Affine Cameras

• A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image:

$$P = [3 \times 3\ \text{affine}] \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} [4 \times 4\ \text{affine}] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix}$$

• Affine projection is a linear mapping + translation in inhomogeneous coordinates

$$x = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = A X + b$$

      (b is the projection of the world origin; $a_1$ and $a_2$ denote the rows of A)

(35)

Affine Structure from Motion

• Given: m images of n fixed 3D points:

      $x_{ij} = A_i X_j + b_i$, $i = 1, \dots, m$, $j = 1, \dots, n$

• Problem: use the mn correspondences $x_{ij}$ to estimate m projection matrices $A_i$ and translation vectors $b_i$, and n points $X_j$

• The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom):

$$\begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix} Q^{-1}, \qquad \begin{pmatrix} X \\ 1 \end{pmatrix} \rightarrow Q \begin{pmatrix} X \\ 1 \end{pmatrix}$$

• We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity).

    - Thus, we must have 2mn >= 8m + 3n – 12.

    - For two views, we need four point correspondences.
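The counting argument can be rearranged to show where the number four comes from. The steps below are a worked derivation that is not on the original slide; they assume $m \ge 2$, so that $2m - 3 > 0$.

```latex
\begin{align*}
2mn &\ge 8m + 3n - 12 \\
n\,(2m - 3) &\ge 4\,(2m - 3) \\
n &\ge 4 \qquad (m \ge 2)
\end{align*}
```

For $m = 2$ this reads $4n \ge 3n + 4$, i.e. $n \ge 4$, matching the four correspondences stated above. By this count alone the bound does not grow with the number of views; note that the count does not account for degenerate point configurations.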

(36)

Affine Structure from Motion

• Centering: subtract the centroid of the image points

$$\hat{x}_{ij} = x_{ij} - \frac{1}{n} \sum_{k=1}^{n} x_{ik} = A_i X_j + b_i - \frac{1}{n} \sum_{k=1}^{n} (A_i X_k + b_i) = A_i \left( X_j - \frac{1}{n} \sum_{k=1}^{n} X_k \right) = A_i \hat{X}_j$$

• For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points.

• After centering, each normalized point $\hat{x}_{ij}$ is related to the 3D point $X_j$ by

$$\hat{x}_{ij} = A_i X_j$$



(37)

Affine Structure from Motion

• Let’s create a 2m × n data (measurement) matrix:

$$D = \begin{bmatrix} \hat{x}_{11} & \hat{x}_{12} & \cdots & \hat{x}_{1n} \\ \hat{x}_{21} & \hat{x}_{22} & \cdots & \hat{x}_{2n} \\ \vdots & & \ddots & \vdots \\ \hat{x}_{m1} & \hat{x}_{m2} & \cdots & \hat{x}_{mn} \end{bmatrix}$$

      (rows: cameras, 2m; columns: points, n)

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.

(38)

Affine Structure from Motion

$$D = \begin{bmatrix} \hat{x}_{11} & \hat{x}_{12} & \cdots & \hat{x}_{1n} \\ \hat{x}_{21} & \hat{x}_{22} & \cdots & \hat{x}_{2n} \\ \vdots & & \ddots & \vdots \\ \hat{x}_{m1} & \hat{x}_{m2} & \cdots & \hat{x}_{mn} \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix}$$

      (cameras: 2m × 3; points: 3 × n)

• The measurement matrix D = MS must have rank 3!

Slide credit: Svetlana Lazebnik









(39)

Factorizing the Measurement Matrix

• Singular value decomposition of D: $D = U W V^T$. Since D must have rank 3, only the three largest singular values are kept, giving $U_3$ (first 3 columns of U), $W_3$ (upper left 3 × 3 block of W), and $V_3$ (first 3 columns of V).

• Obtaining a factorization from SVD: $M = U_3 W_3^{1/2}$, $S = W_3^{1/2} V_3^T$

This decomposition minimizes $\|D - MS\|^2$.

Slide credit: Martial Hebert

(44)

Affine Ambiguity

• The decomposition is not unique. We get the same D by using any invertible 3×3 matrix C and applying the transformations $M \rightarrow MC$, $S \rightarrow C^{-1} S$.

• That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axes to be perpendicular, for example). We need a Euclidean upgrade.

(45)

Estimating the Euclidean Upgrade

• Orthographic assumption: image axes are perpendicular and scale is 1.

$$a_1 \cdot a_2 = 0, \qquad |a_1| = |a_2| = 1$$

• This can be converted into a system of 3m equations:

$$\begin{cases} \hat{a}_{i1} \cdot \hat{a}_{i2} = 0 \\ |\hat{a}_{i1}| = 1 \\ |\hat{a}_{i2}| = 1 \end{cases} \quad\Leftrightarrow\quad \begin{cases} a_{i1}^T C C^T a_{i2} = 0 \\ a_{i1}^T C C^T a_{i1} = 1 \\ a_{i2}^T C C^T a_{i2} = 1 \end{cases}, \qquad i = 1, \dots, m$$

(46)

Estimating the Euclidean Upgrade

• Let $L = C C^T$ and $A_i = \begin{bmatrix} a_{i1}^T \\ a_{i2}^T \end{bmatrix}$, $i = 1, \dots, m$

• Then this translates to 3m equations in L:

$$A_i L A_i^T = I, \qquad i = 1, \dots, m$$

    - Solve for L

    - Recover C from L by Cholesky decomposition: $L = C C^T$

    - Update M and S: $M = MC$, $S = C^{-1} S$

Slide adapted from S. Lazebnik, M. Hebert
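A minimal sketch of the upgrade on synthetic, noise-free data. The helper names (`sym_coeffs`, `euclidean_upgrade`) and the test setup are hypothetical; the sketch solves the 3m equations for the six distinct entries of the symmetric matrix L by linear least squares and then applies NumPy's Cholesky factorization. With noisy data L may fail to be positive definite, in which case `np.linalg.cholesky` raises an error; that case is not handled here.

```python
import numpy as np

def sym_coeffs(a, b):
    """Coefficients of (L11, L12, L13, L22, L23, L33) in a^T L b for symmetric L."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def euclidean_upgrade(M):
    """Return C such that the row pairs of M @ C are orthonormal. M is (2m, 3)."""
    rows, rhs = [], []
    for a1, a2 in zip(M[0::2], M[1::2]):
        rows += [sym_coeffs(a1, a2), sym_coeffs(a1, a1), sym_coeffs(a2, a2)]
        rhs += [0.0, 1.0, 1.0]
    l, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)      # lower-triangular C with L = C C^T

# Hypothetical test data: orthographic cameras distorted by a known affine matrix Q.
rng = np.random.default_rng(0)
M_true = np.vstack([np.linalg.qr(rng.normal(size=(3, 3)))[0][:2] for _ in range(5)])
Q = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.5]])
M_affine = M_true @ Q

C = euclidean_upgrade(M_affine)
M_euclid = M_affine @ C               # the shape would be updated as S = inv(C) @ S
```

The recovered C differs from Q⁻¹ by a rotation in general, which is expected: the upgraded reconstruction is still only defined up to a global rotation (and the choice of world origin).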

(47)

Algorithm Summary

• Given: m images and n features $x_{ij}$

• For each image i, center the feature coordinates.

• Construct a 2m × n measurement matrix D:

    - Column j contains the projection of point j in all views

    - Row i contains one coordinate of the projections of all the n points in image i

• Factorize D:

    - Compute SVD: $D = U W V^T$

    - Create $U_3$ by taking the first 3 columns of U

    - Create $V_3$ by taking the first 3 columns of V

    - Create $W_3$ by taking the upper left 3 × 3 block of W

• Create the motion and shape matrices:

    - $M = U_3 W_3^{1/2}$ and $S = W_3^{1/2} V_3^T$ (or $M = U_3$ and $S = W_3 V_3^T$)

• Eliminate affine ambiguity

Slide credit: Martial Hebert
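The centering and factorization steps of the summary can be sketched as follows, assuming complete, noise-free tracks. The function name `affine_factorization`, the array layout `(m, n, 2)`, and the random test cameras are assumptions made for this illustration; the affine ambiguity is left unresolved.

```python
import numpy as np

def affine_factorization(x):
    """Factorize tracked points x of shape (m, n, 2) into motion M (2m, 3) and shape S (3, n)."""
    m, n, _ = x.shape
    x_hat = x - x.mean(axis=1, keepdims=True)          # center the features in each image
    # Rows 2i and 2i+1 of D hold the x- and y-coordinates of all points in image i.
    D = x_hat.transpose(0, 2, 1).reshape(2 * m, n)
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_w3 = np.sqrt(w[:3])
    M = U[:, :3] * sqrt_w3                             # M = U_3 W_3^(1/2)
    S = sqrt_w3[:, None] * Vt[:3]                      # S = W_3^(1/2) V_3^T
    return M, S, D

# Hypothetical synthetic data: 4 random affine cameras observing 10 points.
rng = np.random.default_rng(0)
m, n = 4, 10
X = rng.normal(size=(3, n))
x = np.stack([(rng.normal(size=(2, 3)) @ X).T + rng.normal(size=2) for _ in range(m)])

M, S, D = affine_factorization(x)
```

On exact data the product M S reproduces D; with noise it is the closest rank-3 matrix to D in the Frobenius norm.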

(48)

Reconstruction Results

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

(49)

Dealing with Missing Data

• So far, we have assumed that all points are visible in all views

• In reality, the measurement matrix typically looks something like this:

[Figure: sparsely filled measurement matrix; rows: cameras, columns: points]

(51)

Dealing with Missing Data

• Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results

    - Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph)

• Incremental bilinear refinement

    (1) Perform factorization on a dense sub-block

    (2) Solve for a new 3D point visible by at least two known cameras (linear least squares)

F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.

(2)

Commercial Software Packages

• boujou (http://www.2d3.com/)

• PFTrack (http://www.thepixelfarm.co.uk/)

• MatchMover (http://www.realviz.com/)

• SynthEyes (http://www.ssontech.com/)

• Icarus (http://aig.cs.man.ac.uk/research/reveal/icarus/)

• Voodoo Camera Tracker (http://www.digilab.uni-hannover.de/)

(3)

boujou demo

(We have a license available, so if you want

to try it for interesting projects, contact us.)

(4)

Applications: Matchmoving

• Putting virtual objects into real-world videos

[Figure panels: Original sequence, Tracked features, SfM results, Final video]

(5)

Another Example: The Campanile Movie

Video from SIGGRAPH’97 Animation Theatre: http://www.debevec.org/Campanile/#movie

(6)

References and Further Reading

• A (relatively short) treatment of affine and projective SfM and the basic ideas and algorithms can be found in Chapters 12 and 13 of

      D. Forsyth, J. Ponce, Computer Vision – A Modern Approach. Prentice Hall, 2003

• More detailed information (if you really want to implement this) and better explanations can be found in Chapters 10, 18 (factorization) and 19 (self-calibration) of

      R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Ed., Cambridge Univ. Press, 2004