Finger/Hand Gesture Recognition using Kinect

刺人心 2020-12-13 16:49

Let me explain my need before I explain the problem. I am looking to build a hand-controlled application: navigation using the palm and clicks using a grab/fist gesture.

Curren

8 Answers
  • 2020-12-13 17:17

    The short answer is: yes, you can train your own gesture detector using depth data. It is fairly easy, but the right approach depends on the type of gesture.

    Suppose you want to detect a hand movement:

    1. Detect the hand position (x,y,z). With OpenNI this is straightforward, since it exposes a dedicated node for the hand.
    2. Perform the gesture and collect ALL the hand positions recorded during it (see the sketch after this list).
    3. Train an HMM with that list of positions. You can do this in Matlab, C, or Python, for example.
    4. You can then run the model on new recordings to detect your own gestures.

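    Steps 1 and 2 boil down to logging (x,y,z) samples while the gesture is performed. Here is a minimal Python sketch of that collection loop; get_hand_position is a hypothetical placeholder for whatever call your tracker (e.g. the OpenNI hand node) exposes, and the duration/rate values are just assumptions:

    import time

    def record_gesture(get_hand_position, duration_s=2.0, rate_hz=30):
        """Collect (x, y, z) hand positions for one execution of a gesture."""
        samples = []
        period = 1.0 / rate_hz
        end = time.time() + duration_s
        while time.time() < end:
            samples.append(get_hand_position())  # expected to return an (x, y, z) tuple
            time.sleep(period)
        return samples

    Each recorded gesture then becomes one training sequence for the HMM.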
    Here you can find a nice tutorial and code (in Matlab). The code (test.m) is pretty easy to follow. Here is a snippet:

    % (Snippet from test.m; N, D, M, LR and the gesture lists are defined earlier in that script.)

    % Load the collected (x,y,z) hand positions for training and testing
    training = get_xyz_data('data/train',train_gesture);
    testing = get_xyz_data('data/test',test_gesture);

    % Quantize the 3D positions into N clusters (D = number of dimensions)
    [centroids N] = get_point_centroids(training,N,D);
    ATrainBinned = get_point_clusters(training,centroids,D);
    ATestBinned = get_point_clusters(testing,centroids,D);

    % Set priors: left-to-right transition matrix with M hidden states
    pP = prior_transition_matrix(M,LR);

    % Train the discrete HMM for cyc iterations
    cyc = 50;
    [E,P,Pi,LL] = dhmm_numeric(ATrainBinned,pP,[1:N]',M,cyc,.00001);
    

    Dealing with fingers is pretty much the same, but instead of detecting the hand you need to detect the fingers. Since the Kinect does not provide finger joints, you need dedicated code to find them (using segmentation or contour tracking). Some examples using OpenCV can be found here and here, but the most promising one is the ROS library that has a finger node (see example here).
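    As an illustration of the contour-based route, here is a minimal Python/OpenCV sketch that counts extended fingers from a binary hand mask via convexity defects. The mask itself (e.g. obtained from depth thresholding) and the depth/angle thresholds are assumptions you would tune for your own setup:

    import cv2
    import numpy as np

    def count_fingers(hand_mask):
        """Rough estimate of extended fingers from a binary hand mask (OpenCV 4.x)."""
        contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return 0
        cnt = max(contours, key=cv2.contourArea)        # assume the hand is the largest blob
        hull = cv2.convexHull(cnt, returnPoints=False)
        defects = cv2.convexityDefects(cnt, hull)
        if defects is None:
            return 0
        gaps = 0
        for start, end, far, depth in defects[:, 0]:
            s, e, f = cnt[start][0], cnt[end][0], cnt[far][0]
            a = np.linalg.norm(e - s)
            b = np.linalg.norm(f - s)
            c = np.linalg.norm(f - e)
            # deep, narrow defects correspond to the valleys between fingers
            angle = np.arccos((b**2 + c**2 - a**2) / (2 * b * c + 1e-6))
            if depth > 10000 and angle < np.pi / 2:     # thresholds are rough guesses
                gaps += 1
        return gaps + 1 if gaps else 0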

  • 2020-12-13 17:22

    It seems that you are unaware of the Point Cloud Library (PCL). It is an open-source library dedicated to processing point clouds and RGB-D data; it builds on OpenNI for the low-level operations and provides many high-level algorithms, for instance for registration, segmentation and recognition.

    A very interesting algorithm for shape/object recognition in general is the implicit shape model. To detect a global object (such as a car, or an open hand), the idea is first to detect its possible parts (e.g. wheels and trunk, or fingers, palm and wrist) using a local feature detector, and then to infer the position of the global object from the density and the relative positions of those parts. For instance, if I can detect five fingers, a palm and a wrist in a given neighborhood, there is a good chance that I am in fact looking at a hand; however, if I only detect one finger and a wrist somewhere, it could be a pair of false detections. The academic research article on this implicit shape model algorithm can be found here.

    In PCL, there are a couple of tutorials dedicated to shape recognition, and luckily one of them covers the implicit shape model, which has been implemented in PCL. I never tested this implementation, but from what I could read in the tutorial, you can supply your own point clouds to train the classifier.

    That being said, you did not mention it explicitly in your question, but since your goal is to program a hand-controlled application, you might in fact be interested in a real-time shape detection algorithm. You would have to test the speed of the implicit shape model provided in PCL, but I think this approach is better suited to offline shape recognition.

    If you do need real-time shape recognition, I think you should first use a hand/arm tracking algorithm (which is usually faster than full detection) to know where to look in the images, instead of trying to perform a full shape detection on every frame of your RGB-D stream. You could, for instance, track the hand location by segmenting the depth map (e.g. with an appropriate threshold on the depth) and then detecting its extremities.
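    As a rough illustration of that tracking-by-segmentation idea, here is a minimal Python sketch; it assumes the depth map arrives as a NumPy array in millimetres, and the near/far band and the "largest blob is the hand" heuristic are assumptions to adapt to your setup:

    import cv2
    import numpy as np

    def track_hand(depth_mm, near_mm=500, far_mm=900):
        """Return the hand centroid (x, y) in pixels, or None, by thresholding a depth band."""
        mask = np.logical_and(depth_mm > near_mm, depth_mm < far_mm).astype(np.uint8) * 255
        mask = cv2.medianBlur(mask, 5)                   # suppress speckle noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        hand = max(contours, key=cv2.contourArea)        # assume the hand is the largest blob
        m = cv2.moments(hand)
        if m["m00"] == 0:
            return None
        return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))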

    Then, once you roughly know where the hand is, it should be easier to decide whether it is making a gesture relevant to your application. I am not sure what exactly you mean by fist/grab gestures, but I suggest defining and using app-controlling gestures that are easy and quick to distinguish from one another.
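    For example, a crude but fast open-palm vs. fist test compares the hand contour's area with the area of its convex hull; the 0.8 solidity threshold below is just an assumed starting point:

    import cv2

    def is_fist(hand_contour, solidity_threshold=0.8):
        """Heuristic: a closed fist fills most of its convex hull; an open palm does not."""
        hull = cv2.convexHull(hand_contour)
        hull_area = cv2.contourArea(hull)
        if hull_area == 0:
            return False
        solidity = cv2.contourArea(hand_contour) / hull_area
        return solidity > solidity_threshold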

    Hope this helps.
