I'm working on a project that uses the Kinect and OpenCV to export fingertip coordinates to Flash for use in games and other programs. Currently, our setup works based on
The depth stream is the right approach. Take the depth value for a pixel, and from the Kinect sensor's geometry you can locate the corresponding point in the real world relative to the Kinect. This is done with simple trigonometry, but keep in mind that the depth value is the distance from the Kinect "eye" to the measured point, so it is the diagonal of a cuboid rather than the perpendicular distance to the camera plane.
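Here is a minimal sketch of that trigonometry, assuming the raw reading is the straight-line distance from the sensor to the point (the "diagonal" mentioned above). The intrinsics (fx, fy, cx, cy) are illustrative Kinect v1 values, not calibrated ones; calibrate your own device or query them from your driver. If your driver already reports depth as the Z distance to the camera plane, skip the ray rescaling and use z = depth directly.

```cpp
#include <cmath>

struct Point3f { float x, y, z; };

// Convert a depth-image pixel (u, v) plus its measured distance into
// real-world coordinates relative to the Kinect, using a pinhole model.
Point3f depthPixelToWorld(int u, int v, float distanceMeters)
{
    const float fx = 594.2f;  // assumed focal length in pixels (x)
    const float fy = 591.0f;  // assumed focal length in pixels (y)
    const float cx = 339.3f;  // assumed principal point (x)
    const float cy = 242.7f;  // assumed principal point (y)

    // Direction of the viewing ray through pixel (u, v) in camera coordinates.
    float rx = (u - cx) / fx;
    float ry = (v - cy) / fy;
    float rz = 1.0f;
    float norm = std::sqrt(rx * rx + ry * ry + rz * rz);

    // Scale the unit ray by the measured diagonal distance to reach the 3-D point.
    Point3f p;
    p.x = distanceMeters * rx / norm;
    p.y = distanceMeters * ry / norm;
    p.z = distanceMeters * rz / norm;
    return p;
}
```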
Actually, follow this link: How to get real world coordinates (x, y, z) from a distinct object using a Kinect. There's no point rewriting it here; that answer already covers it.