I'm trying to reconstruct 3D points from 2D image correspondences. My camera is calibrated. The test images are of a checkered cube and correspondences are hand picked. Radial distortion is removed. After triangulation the construction seems to be wrong however. The X and Y values seem to be correct, but the Z values are about the same and do not differentiate along the cube. The 3D points look like as if the points were flattened along the Zaxis.
What is going wrong in the Z values? Do the points need to be normalized or changed from image coordinates at any point, say before the fundamental matrix is computed? (If this is too vague I can explain my general process or elaborate on parts)
Update
Given:
x1 = P1 * X
and x2 = P2 * X
x1
, x2
being the first and second image points and X
being the 3d point.
However, I have found that x1
is not close to the actual hand picked value but x2
is in fact close.
How I compute projection matrices:
P1 = [eye(3), zeros(3,1)];
P2 = K * [R, t];
Update II
Calibration results after optimization (with uncertainties)
% Focal Length: fc = [ 699.13458 701.11196 ] ± [ 1.05092 1.08272 ]
% Principal point: cc = [ 393.51797 304.05914 ] ± [ 1.61832 1.27604 ]
% Skew: alpha_c = [ 0.00180 ] ± [ 0.00042 ] => angle of pixel axes = 89.89661 ± 0.02379 degrees
% Distortion: kc = [ 0.05867 0.28214 0.00131 0.00244 0.35651 ] ± [ 0.01228 0.09805 0.00060 0.00083 0.22340 ]
% Pixel error: err = [ 0.19975 0.23023 ]
%
% Note: The numerical errors are approximately three times the standard
% deviations (for reference).

K =
699.1346 1.2584 393.5180
0 701.1120 304.0591
0 0 1.0000
E =
0.3692 0.8351 4.0017
0.3881 1.6743 6.5774
4.5508 6.3663 0.2764
R =
0.9852 0.0712 0.1561
0.0967 0.9820 0.1624
0.1417 0.1751 0.9743
t =
0.7942
0.5761
0.1935
P1 =
1 0 0 0
0 1 0 0
0 0 1 0
P2 =
633.1409 20.3941 492.3047 630.6410
24.6964 741.7198 182.3506 345.0670
0.1417 0.1751 0.9743 0.1935
C1 =
0
0
0
1
C2 =
0.6993
0.5883
0.4060
1.0000
% new points using cpselect
%x1
input_points =
422.7500 260.2500
384.2500 238.7500
339.7500 211.7500
298.7500 186.7500
452.7500 236.2500
412.2500 214.2500
368.7500 191.2500
329.7500 165.2500
482.7500 210.2500
443.2500 189.2500
402.2500 166.2500
362.7500 143.2500
510.7500 186.7500
466.7500 165.7500
425.7500 144.2500
392.2500 125.7500
403.2500 369.7500
367.7500 345.2500
330.2500 319.7500
296.2500 297.7500
406.7500 341.2500
365.7500 316.2500
331.2500 293.2500
295.2500 270.2500
414.2500 306.7500
370.2500 281.2500
333.2500 257.7500
296.7500 232.7500
434.7500 341.2500
441.7500 312.7500
446.2500 282.2500
462.7500 311.2500
466.7500 286.2500
475.2500 252.2500
481.7500 292.7500
490.2500 262.7500
498.2500 232.7500
%x2
base_points =
393.2500 311.7500
358.7500 282.7500
319.7500 249.2500
284.2500 216.2500
431.7500 285.2500
395.7500 256.2500
356.7500 223.7500
320.2500 194.2500
474.7500 254.7500
437.7500 226.2500
398.7500 197.2500
362.7500 168.7500
511.2500 227.7500
471.2500 196.7500
432.7500 169.7500
400.2500 145.7500
388.2500 404.2500
357.2500 373.2500
326.7500 343.2500
297.2500 318.7500
387.7500 381.7500
356.2500 351.7500
323.2500 321.7500
291.7500 292.7500
390.7500 352.7500
357.2500 323.2500
320.2500 291.2500
287.2500 258.7500
427.7500 376.7500
429.7500 351.7500
431.7500 324.2500
462.7500 345.7500
463.7500 325.2500
470.7500 295.2500
491.7500 325.2500
497.7500 298.2500
504.7500 270.2500
Update III
See answer for corrections. Answers computed above were using the wrong variables/values.
** Note all reference are to Multiple View Geometry in Computer Vision by Hartley and Zisserman.
OK, so there were a couple bugs:
When computing the essential matrix (p. 257259) the author mentions the correct R,t pair from the set of four R,t (Result 9.19) is the one where the 3D points lay in front of both cameras (Fig. 9.12, a) but doesn't mention how one computes this. By chance I was rereading chapter 6 and discovered that 6.2.3 (p.162) discusses depth of points and Result 6.1 is the equation needed to be applied to get the correct R and t.
In my implementation of the optimal triangulation method (Algorithm 12.1 (p.318)) in step 2 I had
T2^1' * F * T1^1
where I needed to have(T2^1)' * F * T1^1
. The former translates the 1.I wanted, and in the latter, to translate the inverted the T2 matrix (foiled again by MATLAB!).Finally, I wasn't computing P1 correctly, it should have been
P1 = K * [eye(3),zeros(3,1)];
. I forgot to multiple by the calibration matrix K.
Hope this helps future passerby's !
It may be that your points are in a degenerate configuration. Try to add a couple of points from the scene that don't belong to the cube and see how it goes.
More information required:
 What is t? The baseline might be too small for parallax.
 What is the disparity between
x1
andx2
?  Are you confident about the accuracy of the calibration (I'm assuming you used the Stereo part of the Bouguet Toolbox)?
 When you say the correspondences are handpicked, do you mean you selected the corresponding points on the image or did you use an interest point detector on the two images are then set the correspondences?
I'm sure we can resolve this problem :)