I have been trying to work with the Projector and Ray classes to build some collision detection demos. I started by just trying to use the mouse to select objects or …
Projector.unprojectVector() treats the vec3 as a position. During the process the vector gets translated, which is why we use .sub(camera.position) on it. We also need to normalize it after this operation.
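A minimal sketch of that recipe, written as plain JavaScript without three.js so the arithmetic is visible. It assumes an OpenGL-style perspective camera (vertical field of view `fovDeg`, `aspect`, `near`, `far`) whose world transform is a row-major 3x3 rotation `rot` plus a `position`; all of these names are illustrative, not the Projector API. `unprojectToView` inverts the projection analytically instead of multiplying by an inverse matrix.

```javascript
// Sketch only: plain-JS version of "unproject as a position, subtract the
// camera position, normalize". Camera fields are illustrative assumptions.

// Invert an OpenGL-style perspective projection for one NDC point.
function unprojectToView(ndc, cam) {
  const A = (cam.far + cam.near) / (cam.near - cam.far);
  const B = (2 * cam.far * cam.near) / (cam.near - cam.far);
  const zView = -B / (ndc[2] + A); // view-space depth, negative in front of the camera
  const t = Math.tan((cam.fovDeg * Math.PI) / 360); // tan(fov / 2)
  return [ndc[0] * -zView * t * cam.aspect, ndc[1] * -zView * t, zView];
}

// Treat p as a POSITION: rotate it, then translate it by the camera position.
function viewToWorld(p, cam) {
  const r = cam.rot;
  return [0, 1, 2].map(
    (i) => r[i][0] * p[0] + r[i][1] * p[1] + r[i][2] * p[2] + cam.position[i]
  );
}

function pickingDirection(mouseX, mouseY, cam) {
  const world = viewToWorld(unprojectToView([mouseX, mouseY, 1], cam), cam);
  // The point was translated along with the camera, so undo that...
  const d = world.map((v, i) => v - cam.position[i]);
  // ...and normalize to get a unit direction.
  const len = Math.hypot(d[0], d[1], d[2]);
  return d.map((v) => v / len);
}

const camera = {
  fovDeg: 90, aspect: 2, near: 0.1, far: 100,
  rot: [[1, 0, 0], [0, 1, 0], [0, 0, 1]], // no rotation: looking down -z
  position: [10, 0, 0],
};
console.log(pickingDirection(0, 0, camera)); // screen center: straight down -z
```

Without the subtraction, the result would point from the world origin towards the unprojected point, which is only correct when the camera sits at 0,0,0.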
I will add some graphics to this post but for now I can describe the geometry of the operation.
We can think of the camera as a pyramid in terms of geometry. We in fact define it with 6 planes: left, right, top, bottom, near and far (near being the plane closest to the tip).
If we were standing in some 3d scene observing these operations, we would see this pyramid in an arbitrary position with an arbitrary rotation in space. Let's say that this pyramid's origin is at its tip, and its negative z axis runs towards the bottom.
Whatever ends up contained within those 6 planes will be rendered on our screen if we apply the correct sequence of matrix transformations, which in OpenGL goes something like this:
NDC_or_homogenous_coordinates = projectionMatrix * viewMatrix * modelMatrix * position.xyzw;
This takes our mesh from its object space into world space, then into camera space, and finally applies the perspective projection matrix, which (after the divide by w) essentially puts everything into a small cube (NDC, with ranges from -1 to 1).
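The chain can be sketched in plain JavaScript. This assumes an OpenGL-style perspective matrix and row-major nested arrays (three.js itself stores `Matrix4.elements` column-major), with translation-only model and view matrices chosen just for illustration. The final divide by w is what takes the homogeneous clip coordinates into the NDC cube; points on the near and far planes land on z = -1 and z = 1.

```javascript
// Sketch: projectionMatrix * viewMatrix * modelMatrix * position, then divide by w.

// OpenGL-style perspective matrix (vertical fov in degrees), row-major.
function perspective(fovDeg, aspect, near, far) {
  const f = 1 / Math.tan((fovDeg * Math.PI) / 360);
  return [
    [f / aspect, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
    [0, 0, -1, 0],
  ];
}

function translation(x, y, z) {
  return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]];
}

// 4x4 matrix product a * b.
function mul(a, b) {
  return a.map((row) => b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0)));
}

// Matrix times a 4-component column vector.
function apply(m, v) {
  return m.map((row) => row.reduce((s, x, k) => s + x * v[k], 0));
}

function toNDC(projection, view, model, position) {
  const clip = apply(mul(projection, mul(view, model)), [...position, 1]);
  return clip.slice(0, 3).map((c) => c / clip[3]); // perspective divide
}

const projection = perspective(60, 16 / 9, 1, 50);
const model = translation(0, 0, -5); // object placed at z = -5 in world space
const view = translation(0, 0, -5);  // camera at z = +5 (view = inverse of its transform)
console.log(toNDC(projection, view, model, [0, 0, 0]));
```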
Object space can be a neat set of xyz coordinates in which you generate something procedurally, or, say, a 3d model that an artist built using symmetry and that therefore sits neatly aligned with the coordinate axes, as opposed to an architectural model obtained from something like Revit or AutoCAD.
An objectMatrix could sit between the model matrix and the vertex position (modelMatrix * objectMatrix * position.xyzw), but this is usually taken care of ahead of time: say, flipping y and z, bringing a model that's far away from the origin into bounds, converting units, etc.
If we think of our flat 2d screen as if it had depth, it could be described the same way as the NDC cube, albeit slightly distorted. This is why we supply the aspect ratio to the camera: if we imagine a square whose side is our screen height, the screen width divided by that side is the aspect ratio, and that is the factor by which we need to scale our x coordinates.
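A small sketch of that relationship, assuming a vertical field of view as three.js's PerspectiveCamera uses; the helper name `frustumHalfExtents` is hypothetical.

```javascript
// Sketch: the aspect ratio stretches the x extent of the view volume.
function frustumHalfExtents(fovDeg, width, height, distance) {
  const aspect = width / height; // how many "screen-height squares" fit across
  const halfHeight = distance * Math.tan((fovDeg * Math.PI) / 360);
  return { aspect, halfWidth: halfHeight * aspect, halfHeight };
}

console.log(frustumHalfExtents(90, 1920, 1080, 1)); // halfWidth is 16/9 times halfHeight
```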
Now back to 3d space.
We're standing in a 3d scene and we see the pyramid. If we cut away everything around the pyramid, then take the pyramid along with the part of the scene contained in it, put its tip at 0,0,0 and point the bottom towards the -z axis, we end up here:
viewMatrix * modelMatrix * position.xyzw
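Moving the tip to 0,0,0 and pointing it down -z is exactly what the view matrix does: it is the inverse of the camera's world matrix. The sketch below assumes the camera transform is a pure rotation `R` plus a translation `t` (no scale), in which case the inverse is the transposed rotation with translation `-R^T t`. The helper names are hypothetical.

```javascript
// Sketch: view matrix as the inverse of a rigid camera transform [R | t].
function rotationY(deg) {
  const a = (deg * Math.PI) / 180, c = Math.cos(a), s = Math.sin(a);
  return [[c, 0, s], [0, 1, 0], [-s, 0, c]];
}

function viewFromCamera(R, t) {
  const Rt = [0, 1, 2].map((i) => [0, 1, 2].map((j) => R[j][i])); // transpose
  const tInv = Rt.map((row) => -(row[0] * t[0] + row[1] * t[1] + row[2] * t[2])); // -R^T t
  return { R: Rt, t: tInv };
}

// Apply a rotation + translation to a point.
function applyRigid(m, p) {
  return m.R.map((row, i) => row[0] * p[0] + row[1] * p[1] + row[2] * p[2] + m.t[i]);
}

const R = rotationY(90);
const camPos = [3, 2, 1];
const view = viewFromCamera(R, camPos);
// A world-space point one unit in front of the camera: camPos + R * (0, 0, -1).
const inFront = camPos.map((v, i) => v - R[i][2]);
console.log(applyRigid(view, camPos));  // the tip moves to the origin
console.log(applyRigid(view, inFront)); // lands on the -z axis
```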
Multiplying this by the projection matrix is the same as taking the tip and pulling it apart along the x and y axes, creating a square out of that one point and turning the pyramid into a box.
In this process the box gets scaled to the range -1 to 1, we get our perspective projection, and we end up here:
projectionMatrix * viewMatrix * modelMatrix * position.xyzw;
In this space we have a two-dimensional mouse event to work with. Since it's on our screen, we know that it's two-dimensional and that it's somewhere within the NDC cube. Being two-dimensional, it gives us X and Y but not Z, hence the need for ray casting.
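For reference, this is the pixel-to-NDC conversion commonly used in three.js examples to get that X and Y; here the viewport size is passed in explicitly (an assumption of this sketch) rather than read from `window`. Z is left out on purpose, since the mouse doesn't provide it.

```javascript
// Sketch: mouse position in pixels (origin top-left) to NDC x/y.
function mouseToNDC(clientX, clientY, width, height) {
  return {
    x: (clientX / width) * 2 - 1,   // left edge -> -1, right edge -> +1
    y: -(clientY / height) * 2 + 1, // screen y grows downward, NDC y grows upward
  };
}

console.log(mouseToNDC(960, 540, 1920, 1080)); // center of a 1920x1080 viewport
```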
So when we cast a ray, we are basically sending a line through the cube, perpendicular to one of its sides.
Now we need to figure out if that ray hits something in the scene, and in order to do that we need to transform the ray from this cube, into some space suitable for computation. We want the ray in world space.
A ray is a line in space that starts at a point and runs infinitely in one direction. What distinguishes it from a plain direction vector is that it must pass through (start at) a specific point in space. And indeed this is how the Raycaster takes its arguments: an origin and a direction.
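To make "origin plus direction" concrete, here is a hypothetical minimal `Ray` class (not THREE.Ray, though the idea is the same) together with a ray-sphere test of the kind a picker runs against the objects in the scene. Points on the ray are `origin + t * direction` for `t >= 0`.

```javascript
// Sketch: a ray as a point plus a unit direction, with a sphere hit test.
class Ray {
  constructor(origin, direction) {
    const len = Math.hypot(...direction);
    this.origin = origin;
    this.direction = direction.map((v) => v / len); // unit length
  }

  // Point at distance t along the ray.
  at(t) {
    return this.origin.map((o, i) => o + t * this.direction[i]);
  }

  // Distance to the first hit on a sphere, or null if the ray misses it.
  intersectSphere(center, radius) {
    const oc = this.origin.map((o, i) => o - center[i]);
    const b = oc.reduce((s, v, i) => s + v * this.direction[i], 0);
    const c = oc.reduce((s, v) => s + v * v, 0) - radius * radius;
    const disc = b * b - c; // quadratic t^2 + 2bt + c = 0 (direction is unit length)
    if (disc < 0) return null;
    const sq = Math.sqrt(disc);
    if (-b - sq >= 0) return -b - sq; // enters the sphere in front of the origin
    if (-b + sq >= 0) return -b + sq; // origin is inside the sphere
    return null;                      // sphere is entirely behind the ray
  }
}

const pickRay = new Ray([0, 0, 0], [0, 0, -1]);
console.log(pickRay.intersectSphere([0, 0, -10], 2)); // 8: hits the near side of the sphere
```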
So if we squeeze the top of the box, along with the line, back into the pyramid, the line will originate from the tip, run down, and intersect the bottom of the pyramid at roughly (mouse.x * farRange, mouse.y * farRange).
(mouse.x and mouse.y range from -1 to 1 at first, but view space is in world scale, just rotated and moved.)
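A sketch of where the line meets the bottom of the pyramid, in view space. Reading the post's loose `farRange` as `far * tan(fov / 2)` (the half-height of the far plane) is an assumption; x is also multiplied by the aspect ratio because the pyramid is wider than it is tall. The function name is hypothetical.

```javascript
// Sketch: view-space point where the picking line crosses the far plane.
function farPlanePoint(mouseX, mouseY, fovDeg, aspect, far) {
  const farRange = far * Math.tan((fovDeg * Math.PI) / 360); // half-height of the far plane
  return [mouseX * farRange * aspect, mouseY * farRange, -far];
}

console.log(farPlanePoint(0.5, 0.25, 60, 16 / 9, 1000));
```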
Since this is the default location of the camera, so to speak (its object space), if we apply the camera's own world matrix to the ray, we transform the ray along with the camera.
Since the ray passes through 0,0,0, we only need its direction, and THREE.Vector3 has a method for transforming a direction:
THREE.Vector3.transformDirection()
It also normalizes the vector in the process.
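What a direction transform amounts to can be sketched without three.js: three.js documents `Vector3.transformDirection` as using only the upper-left 3x3 part of the matrix and then normalizing. The standalone functions below are illustrations using row-major nested arrays, and the example matrix (a 90° turn about y, then a move to x = 10) is made up.

```javascript
// Sketch: direction (w = 0) vs. point (w = 1) transforms with a 4x4 matrix.
function transformDirection(m, v) {
  // Upper-left 3x3 only, so translation is ignored; then normalize.
  const r = [0, 1, 2].map((i) => m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2]);
  const len = Math.hypot(r[0], r[1], r[2]);
  return r.map((x) => x / len);
}

function transformPoint(m, v) {
  // Full affine transform: rotation plus the translation column.
  return [0, 1, 2].map((i) => m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2] + m[i][3]);
}

// Camera world matrix: 90 degree turn about y, then moved to (10, 0, 0).
const worldMatrix = [
  [0, 0, 1, 10],
  [0, 1, 0, 0],
  [-1, 0, 0, 0],
  [0, 0, 0, 1],
];
console.log(transformDirection(worldMatrix, [0, 0, -2])); // rotated and normalized, not moved
console.log(transformPoint(worldMatrix, [0, 0, -2]));     // rotated AND translated
```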
As for the Z coordinate in the method above:
it essentially works with any value and behaves the same, because of the way the NDC cube works. The near plane and far plane are projected onto -1 and 1.
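A quick numerical check of the "any Z" claim, assuming an OpenGL-style projection where NDC z goes from -1 at the near plane to 1 at the far plane: unprojecting the same mouse x/y at different depths lands on different points, but all on the same line through the camera, so the normalized direction is identical. The camera parameters are arbitrary.

```javascript
// Sketch: same NDC x/y, different NDC z -> different points, same direction.
function unprojectToView(ndc, fovDeg, aspect, near, far) {
  const A = (far + near) / (near - far);
  const B = (2 * far * near) / (near - far);
  const zView = -B / (ndc[2] + A); // view-space depth for this NDC z
  const t = Math.tan((fovDeg * Math.PI) / 360);
  return [ndc[0] * -zView * t * aspect, ndc[1] * -zView * t, zView];
}

function normalize(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return v.map((x) => x / len);
}

for (const z of [-1, 0, 0.5, 1]) {
  const p = unprojectToView([0.3, -0.6, z], 75, 16 / 9, 0.1, 1000);
  console.log(z, p[2].toFixed(3), normalize(p)); // depth changes, direction does not
}
```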
So when you, say, shoot a ray at:
[ mouse.x | mouse.y | someZpositive ]
you send a line through a point (mouse.x, mouse.y, 1) in the direction of (0, 0, someZpositive).
If you relate this to the box/pyramid example, this point is at the bottom, and since the line originates from the camera it goes through that point as well.
BUT in NDC space, the camera's point (the tip) is stretched out to infinity, so this line ends up parallel to the left, top, right and bottom planes.
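The same effect seen from the other side: projecting several points along one line from the camera tip gives the same NDC x and y, and only NDC z changes. This is a sketch assuming an OpenGL-style perspective projection with made-up camera parameters.

```javascript
// Sketch: points on one line through the camera share NDC x/y.
function projectToNDC(p, fovDeg, aspect, near, far) {
  const f = 1 / Math.tan((fovDeg * Math.PI) / 360);
  const A = (far + near) / (near - far);
  const B = (2 * far * near) / (near - far);
  const w = -p[2]; // clip-space w equals -z_view
  return [(f / aspect) * p[0] / w, f * p[1] / w, (A * p[2] + B) / w];
}

const dir = [0.2, 0.1, -1]; // a direction from the camera tip
for (const s of [1, 10, 100]) {
  const p = dir.map((v) => v * s); // points along the same line
  console.log(projectToNDC(p, 60, 1.5, 0.5, 200)); // x, y constant; z moves toward 1
}
```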
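Unprojecting with the above method essentially turns this into a position (a point). The far plane just gets mapped back out of the cube, so in view space our point sits on the far plane (z = -far), somewhere between -aspect and +aspect on X and between -1 and 1 on Y, with both ranges scaled by far * tan(fov / 2); the camera's world matrix then carries it into world space.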
Since it's a point, applying the camera's world matrix will not only rotate it but translate it as well. Hence the need to bring it back to the origin by subtracting the camera's position.