proportions of a perspective-deformed rectangle

萌比男神i 2020-11-29 17:31

Given a 2d picture of a rectangle distorted by perspective:

[image]

I know tha

10 answers
  •  渐次进展
    2020-11-29 17:59

    There seems to still be some confusion on this interesting problem. I want to give an easy-to-follow explanation for when the problem can and cannot be solved.

    Constraints and Degrees of Freedom

    Typically, when we are faced with a problem like this, the first thing to do is to assess the number of unknown Degrees of Freedom (DoFs) N and the number of independent equations M that we have for constraining the unknown DoFs. It is impossible to solve the problem if N exceeds M (meaning there are fewer constraints than unknowns). We can rule out all problems where this is the case as being unsolvable. If N does not exceed M then it may be possible to solve the problem with a unique solution, but this is not guaranteed (see the section on the scale ambiguity for an example).

    Let's use p1, p2, p3 and p4 to denote the positions of the 4 corners of the planar surface in world coordinates. Let R and t be the 3D rotation and translation that transform these to camera coordinates, and let K denote the 3x3 camera intrinsic matrix. We will ignore lens distortion for now. The 2D position of the ith corner in the camera's image is given by qi=f(K(Rpi+t)), where f is the projection function f(x,y,z)=(x/z,y/z). From this equation, each corner in the image gives us two equations (i.e. two constraints) on our unknowns: one from the x component of qi and one from the y component. So we have a total of 8 constraints to work with. These are known as the reprojection constraints.
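
    As a minimal sketch of the reprojection equation qi=f(K(Rpi+t)), the NumPy snippet below projects a single point. The function name project and all numeric values are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def project(K, R, t, p):
    """Reprojection q = f(K (R p + t)) with f(x, y, z) = (x/z, y/z)."""
    x, y, z = K @ (R @ p + t)
    return np.array([x / z, y / z])

# Illustrative (assumed) intrinsics: fx = fy = 100, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 2.0])      # plane origin 2 units in front of the camera

# One world point yields one image point, i.e. two scalar constraints.
q = project(K, R, t, np.array([1.0, 0.0, 0.0]))
print(q)
```

    Each call returns an (x, y) pair, which is where the count of two constraints per corner comes from.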

    So what are our unknown DoFs? Certainly R and t are unknown, because we do not know the camera's pose in world coordinates. Therefore we already have 6 unknown DoFs: 3 for R (e.g. yaw, pitch and roll) and 3 for t. With 8 constraints available, there can be a maximum of two unknowns in the remaining terms (K, p1, p2, p3, p4).

    Different problems

    We can construct different problems depending on which two terms in (K, p1, p2, p3, p4) we shall consider as unknown. At this point let's write out K in the usual form: K=(fx, 0, cx; 0, fy, cy; 0,0,1) where fx and fy are the focal length terms (fx/fy is normally called the image aspect ratio) and (cx,cy) is the principal point (the centre of projection in the image).

    We could obtain one problem by taking fx and fy as our two unknowns and assuming (cx, cy, p1, p2, p3, p4) are all known. Indeed this very problem is used and solved within OpenCV's camera calibration method, using images of a checkerboard planar target. This is used to get an initial estimate for fx and fy, by assuming that the principal point is at the image centre (which is a very reasonable assumption for most cameras).

    Alternatively we can create a different problem by assuming fx=fy, which again is quite reasonable for many cameras, and assuming this focal length (denoted as f) is the only unknown in K. We therefore still have one unknown left to play with (recall we can have a maximum of two unknowns). So let's use this by supposing we know the shape of the plane: a rectangle (which was the original assumption in the question). We can then define the corners as follows: p1=(0,0,0), p2=(0,w,0), p3=(h,0,0) and p4=(h,w,0), where h and w denote the height and width of the rectangle. Now, because we only have 1 unknown left, let us set this as the plane's aspect ratio: x=w/h. The question is now whether we can simultaneously recover x, f, R and t from the 8 reprojection constraints. The answer, it turns out, is yes! And the solution is given in Zhang's paper cited in the question.
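
    To make this concrete, here is a hedged sketch of one way to recover f and x=w/h from the four image corners. It is a homography-based derivation resting on the same assumptions (fx=fy=f, known principal point, no skew), not a transcription of the formulas in Zhang's paper. If H maps the unit square to the image corners (after subtracting the principal point), then H is proportional to K[h r1, w r2, t], so the orthogonality of r1 and r2 gives f, and the ratio of the column norms of K^-1 H gives w/h. All function names and test values are illustrative.

```python
import numpy as np

def homography_from_unit_square(q):
    """DLT homography mapping (0,0),(0,1),(1,0),(1,1) to q1..q4 (rows of q)."""
    src = [(0, 0), (0, 1), (1, 0), (1, 1)]   # same order as p1, p2, p3, p4
    rows = []
    for (u, v), (x, y) in zip(src, q):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)              # null vector of the 8x9 system

def focal_and_aspect(q, principal_point):
    """Estimate f and x = w/h, assuming fx = fy = f and a known principal point."""
    q = np.asarray(q, dtype=float) - np.asarray(principal_point, dtype=float)
    H = homography_from_unit_square(q)
    h1, h2 = H[:, 0], H[:, 1]
    denom = h1[2] * h2[2]
    if abs(denom) < 1e-12 * np.linalg.norm(h1) * np.linalg.norm(h2):
        raise ValueError("degenerate view: a rectangle side is parallel to the image plane")
    # r1 . r2 = 0  =>  (h1x*h2x + h1y*h2y) / f^2 + h1z*h2z = 0
    f2 = -(h1[0] * h2[0] + h1[1] * h2[1]) / denom
    if f2 <= 0:
        raise ValueError("corners are not consistent with a rectangle under this model")
    f = np.sqrt(f2)
    k_inv = np.diag([1.0 / f, 1.0 / f, 1.0])
    # Column 1 is proportional to h*r1 and column 2 to w*r2, so the ratio is w/h.
    return f, np.linalg.norm(k_inv @ h2) / np.linalg.norm(k_inv @ h1)

def rot(axis, a):
    """Rotation by angle a (radians) about coordinate axis 0, 1 or 2."""
    c, s = np.cos(a), np.sin(a)
    i, j = [(1, 2), (2, 0), (0, 1)][axis]
    R = np.eye(3)
    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
    return R

# Synthetic check with assumed ground truth: f = 800, h = 1, w = 2.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
R = rot(2, 0.3) @ rot(1, -0.4) @ rot(0, 0.5)
t = np.array([-0.5, -0.3, 5.0])
h, w = 1.0, 2.0
corners = [np.array(p, dtype=float) for p in [(0, 0, 0), (0, w, 0), (h, 0, 0), (h, w, 0)]]
q = [(K @ (R @ p + t))[:2] / (K @ (R @ p + t))[2] for p in corners]

f_est, x_est = focal_and_aspect(q, principal_point=(320, 240))
print(f_est, x_est)
```

    The sketch breaks down for views where a side of the rectangle is parallel to the image plane, since the denominator in the focal-length formula then vanishes; the guard above only detects the exact degenerate case, and near-degenerate views will be numerically unstable.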

    The scale ambiguity

    One might wonder if another problem can be solved: suppose K is known and the 2 unknowns are h and w. Can they be solved from the reprojection equations? The answer is no, because there is an ambiguity between the size of the plane and the plane's depth from the camera. Specifically, if we scale the corners pi by s and scale t by s, then s cancels in the reprojection equations. Therefore the absolute scale of the plane is not recoverable.
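
    The cancellation is easy to check numerically. The sketch below, with arbitrary illustrative values for K, R, t and the scale s, projects a rectangle and a copy that is s times larger and s times farther away, and compares the two sets of image points.

```python
import numpy as np

def project(K, R, t, p):
    """Reprojection q = f(K (R p + t)) with f(x, y, z) = (x/z, y/z)."""
    x, y, z = K @ (R @ p + t)
    return np.array([x / z, y / z])

# Illustrative (assumed) camera and pose.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
c, s_ = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0, s_], [0, 1, 0], [-s_, 0, c]])   # rotation about the y axis
t = np.array([-0.5, -0.3, 5.0])

h, w = 1.0, 2.0
corners = [np.array(p, dtype=float) for p in [(0, 0, 0), (0, w, 0), (h, 0, 0), (h, w, 0)]]

s = 3.7  # arbitrary scale factor
original = np.array([project(K, R, t, p) for p in corners])
scaled = np.array([project(K, R, s * t, s * p) for p in corners])

# K(R(s p) + s t) = s K(R p + t), and the division by z removes s.
print(np.allclose(original, scaled))
```

    Both sets of image points coincide, so no amount of processing of the image alone can tell the two rectangles apart.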

    There may be other problems with different combinations of unknown DoFs, for example having R, t, one of the principal point components and the plane's width as unknowns. However, one needs to think about which cases are of practical use. I haven't yet seen a systematic set of solutions for all useful combinations!

    More points

    We might think that if we were to add extra point correspondences between the plane and the image, or exploit the edges of the plane, we could recover more than 8 unknown DoFs. Sadly the answer is no, because they don't add any extra independent constraints. The 4 corners completely describe the transform from the plane to the image. This can be seen by fitting a homography matrix using the four corners, which then determines the image positions of all other points on the plane.
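
    A small sketch of this argument, under assumed values for K, R, t and the rectangle size: fit a homography from the four corners only, then check that it already predicts where another point of the plane lands in the image. The helper names are illustrative, and fixing the last homography entry to 1 assumes that entry is non-zero (true here because the plane origin is in front of the camera).

```python
import numpy as np

def project(K, R, t, p):
    """Reprojection q = f(K (R p + t)) with f(x, y, z) = (x/z, y/z)."""
    x, y, z = K @ (R @ p + t)
    return np.array([x / z, y / z])

def fit_homography(src, dst):
    """Solve the 8x8 linear system for H from 4 correspondences, with H[2, 2] = 1."""
    A, b = [], []
    for (u, v), (x, y) in zip(src, dst):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        b.append(x)
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        b.append(y)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, uv):
    """Map a plane point (u, v) to the image through H."""
    x, y, z = H @ np.array([uv[0], uv[1], 1.0])
    return np.array([x / z, y / z])

# Illustrative (assumed) camera, pose and rectangle.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
t = np.array([-0.5, -0.3, 5.0])
h, w = 1.0, 2.0

plane_corners = [(0.0, 0.0), (0.0, w), (h, 0.0), (h, w)]
image_corners = [project(K, R, t, np.array([X, Y, 0.0])) for X, Y in plane_corners]
H = fit_homography(plane_corners, image_corners)

# A fifth point on the plane: its image position is already fixed by H.
extra = (0.25 * h, 0.6 * w)
predicted = apply_homography(H, extra)
observed = project(K, R, t, np.array([extra[0], extra[1], 0.0]))
print(np.allclose(predicted, observed))
```

    Since the fifth correspondence can be predicted from the first four, observing it gives a consistency check (useful against noise) but no new independent equation.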
