To get a mapping between the real world and the camera image without any prior information about the camera, you need to calibrate the camera. Here you can find some theory.
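As a rough illustration of what calibration gives you, here is a minimal sketch of the standard pinhole model, which maps a 3D world point to a pixel once the intrinsics (focal lengths and principal point) and extrinsics (rotation and translation) are known. The function name `project_point` and all numeric values are hypothetical, chosen only for illustration, and lens distortion is ignored.

```python
def project_point(point_world, fx, fy, cx, cy, rotation, translation):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    rotation: 3x3 nested list (world -> camera); translation: 3 values.
    Lens distortion is not modelled in this sketch.
    """
    # World -> camera coordinates: Xc = R * Xw + t
    xc = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then scale and shift into pixel coordinates
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return u, v


# Assumed example values: camera at the world origin, looking along +Z
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
u, v = project_point((0.1, -0.05, 2.0), 800, 800, 320, 240, identity, (0, 0, 0))
print(u, v)
```

Calibration is the inverse problem: given many known world points and where they appear in the image, estimate `fx`, `fy`, `cx`, `cy`, the rotation and the translation.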
To calculate the depth, i.e. the distance between the camera and the object, you need at least two images of the same object taken from two different viewpoints, typically by two different cameras. This is popularly called the stereo vision technique.
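To give an idea of how depth comes out of two images, here is a small sketch of the usual triangulation formula for the simplest case: two identical, parallel (rectified) cameras separated by a known baseline. Under those assumptions depth is Z = f * B / d, where d is the disparity, i.e. the horizontal shift of the same point between the left and right image. The function name and the numbers are hypothetical, not taken from any particular setup.

```python
def depth_from_disparity(focal_length_px, baseline_m, x_left_px, x_right_px):
    """Depth of a point seen in a rectified stereo pair.

    Assumes parallel cameras with the same focal length (in pixels)
    and a known baseline (in metres) between their optical centres.
    """
    # Disparity: how far the point shifts horizontally between the two views
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Similar triangles give Z = f * B / d
    return focal_length_px * baseline_m / disparity


# Assumed example: f = 800 px, baseline = 10 cm, point at x = 360 px (left) and 320 px (right)
print(depth_from_disparity(800, 0.1, 360, 320))
```

Nearby objects produce a large disparity and far objects a small one, which is why depth accuracy drops with distance.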