gaze estimation from an image of an eye

问题

I've been able to detect pupil and the eye corners accurately so far. You can see a few snaps i uploaded in my answer to my own question here:

Performing stable eye corner detection

Here's what i've done so far. I calibrated the gaze of the user by looking at TLCP, TRCP and BLCP where

CP = calibration point; a screen point used for calibration
B = bottom
T = top
L= left
R = right
gaze_width = TRCP.x - TLCP.x

gaze_height = BLCP.y- TLCP.y

And the corresponding gaze points i get by looking at those CPs are called GPs

calculation of a gaze point GP:

I subtract the TLGP's ordinates' values from the current pupil center's location, because the gaze point has to fall in the hypothetical rectangle whose i hope you understand it, its really very simple.

I've linearly mapped the gaze points calculated from pupil center's location to screen points using a basic scaling system where the scales are calculated as follows:

scaleX = screen_width/gaze_width
scaleY = screen_height/gaze_height

And for any gaze point P(x,y) i calculate the corresponding screen point Q(m,n) as:

m = scaleX*x
n = scaleY*y

But the problem is, after even almost perfect pupil detection (almost because in poor lighting it gives false positives. But i intend to put that under limitations because i can't work on it, i don't have enough time), i'm still getting poor gaze width and gaze height.

Here's a test run log:

DO_CAL= True

Gaze Parameters:

TLGP = (38, 26) | TRGP = (20, 22) | BLGP = (39, 33)
screen height = 768 screen width = 1366

gaze height = 7 gaze width = 18

scales: X = 75.8888888889 | Y = 109.714285714
Thing on = True

Gaze point = (5, 3)
Screen point: (987, 329)

Gaze point = (5, 3)
Screen point: (987, 329)

Gaze point = (7, 5)
Screen point: (835, 549)

Thing on = False

TLGP = (37, 24) | TRGP = (22, 22) | BLGP = (35, 29)
screen height = 768 screen width = 1366

gaze height = 5 gaze width = 15
scales: X = 91.0666666667 | Y = 153.6
Thing on = True

Gaze point = (12, 3)
Screen point: (1093, 461)

Gaze point = (12, 3)
Screen point: (1093, 461)

ESC pressed

Just look at the gaze points and their corresponding gaze-detected screen points (under them). The vast differences in x,y ordinates' values is bugging me nuts. Monday is the final presentation.

After this approach, i theorized another one where in:

Calibration is done as in the first method. I would detect the motion of the gaze, and its direction. Say, given any two points of pupil center’s location, P and Q, where P is the first gaze point, Q is the second, then we calculate the direction and length of the line PQ.

Let’s assume that the length of this line segment is L. We then scale L to screen proportions, say L is D in screen scale, and given the direction of gaze movement, we move the cursor on the screen from its last point of rest, say R, D distance, to a new point S which will be calculated as the end point of the line segment whose length is D, and starting point S. The figurative representation is given in the figure. Thus basically, i don't map any gaze data to screen point, i basically track the gaze, and convert it into a "push" to be applied to the cursor on the screen. But i haven't implemented it yet. Because it actually doesn't map the gaze to screen co-ordinates, and thus might be erroneous. Motivations for this theory were derived from the eViacam project on sourceforge - they basically track your face, and move the mouse accordingly. In calibration they just calculate how much your face moves along the axes.

Bottom line: So if any of you have any ideas of how to detect a user's gaze from a perfectly processed eye image - one with a detected pupil center and eye corners, please do tell! I've got just about a day, and i know its late, but i just need any magical idea that can help me.

回答1:

This is not an answer, but it is impossible to post as a comment. I'll delete it after your answer.

Are you sure you have all the necessary parameters?

Consider the following diagram:

If your camera detects the corners and pupil at {K, J, Q} , how can you distinguish from another triple {F, E, O}? Note that the measures are the same, but the gaze directions, represnted by the black arrows are completely different.

Note: the two black and the red lines were drawn from a single camera point, placed outside the visible region.

来源：https://stackoverflow.com/questions/10365087/gaze-estimation-from-an-image-of-an-eye

标签

image-processing

OpenCV

computer-vision

human-computer-interface

eye-tracking