Monday, March 31, 2008

Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes (Wobbrock 2007)

Summary:
The $1 recognizer provides a solution to interactive gesture recognition for human-computer interfaces that requires neither in-depth algorithmic knowledge nor heavy mathematics. The algorithm takes as input a sequence of points, outputs a ranked list of likely template matches, and can be broken down into four steps. First, the input point path is resampled so that it is represented by N equally spaced points. Second, the points are rotated so that the vector from the centroid of the gesture to its first point lies at an angle of zero degrees (the "indicative angle"). Third, the gesture points are scaled to a reference square and translated so that the centroid lies at the origin. Finally, the candidate points are compared against each template to determine the most likely match: the average distance between corresponding pairs of candidate and template points is computed, after a Golden Section Search finds the small rotational adjustment, beyond the indicative-angle rotation, that best aligns the candidate with each template.
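The four steps can be sketched in Python. This is a minimal, illustrative implementation following the paper's pseudocode, not the authors' code; the constants (N = 64 resampled points, a 250-unit square, a ±45° search range) are the paper's reported defaults, but the function names are my own:

```python
import math

N = 64              # number of resampled points (the paper's default)
SQUARE_SIZE = 250.0 # side of the reference square

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=N):
    # Step 1: resample the path into n equally spaced points.
    interval = path_length(pts) / (n - 1)
    pts, new_pts, accum, i = list(pts), [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if accum + d >= interval and d > 0:
            t = (interval - accum) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_pts.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            accum = 0.0
        else:
            accum += d
        i += 1
    while len(new_pts) < n:  # rounding can leave the list one point short
        new_pts.append(pts[-1])
    return new_pts[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, theta):
    cx, cy = centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in pts]

def rotate_to_zero(pts):
    # Step 2: rotate so the centroid-to-first-point vector is at 0 degrees.
    cx, cy = centroid(pts)
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    return rotate_by(pts, -theta)

def scale_and_translate(pts, size=SQUARE_SIZE):
    # Step 3: scale to a reference square, then translate centroid to origin.
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = max(max(xs) - min(xs), 1e-9)  # guard against degenerate 1-D gestures
    h = max(max(ys) - min(ys), 1e-9)
    pts = [(x * size / w, y * size / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def normalize(pts):
    return scale_and_translate(rotate_to_zero(resample(pts)))

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(cand, tmpl,
                           a=-math.radians(45), b=math.radians(45),
                           tol=math.radians(2)):
    # Step 4: Golden Section Search over rotation for the minimum distance.
    phi = 0.5 * (math.sqrt(5) - 1)
    x1 = phi * a + (1 - phi) * b
    f1 = path_distance(rotate_by(cand, x1), tmpl)
    x2 = (1 - phi) * a + phi * b
    f2 = path_distance(rotate_by(cand, x2), tmpl)
    while abs(b - a) > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = phi * a + (1 - phi) * b
            f1 = path_distance(rotate_by(cand, x1), tmpl)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = (1 - phi) * a + phi * b
            f2 = path_distance(rotate_by(cand, x2), tmpl)
    return min(f1, f2)

def recognize(points, templates):
    # templates: dict mapping name -> already-normalized point list
    candidate = normalize(points)
    return min(templates,
               key=lambda name: distance_at_best_angle(candidate, templates[name]))
```

A caller would normalize each stored template once with `normalize`, then pass raw candidate points to `recognize`.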
Ten users performed a total of 4800 gestures to compare the $1 recognizer to Rubine and Dynamic Time Warping (DTW) methods. $1 and DTW were significantly more accurate than Rubine, with recognition rates above 99%, compared to Rubine's 92%. The number of training examples and speed of gesture articulation did not produce significant effects across the three recognizers. DTW took considerably longer to run than either $1 or Rubine. Interestingly, $1 performed only 0.23% worse when the Golden Section Search portion of the algorithm was removed.

Discussion:
I think the recognition rates achieved by $1 are commendable, especially considering its ease of implementation. The simplicity of the concepts involved in its implementation is another plus for the $1 system. The algorithm presented in the appendix is the clearest and most complete description of how to implement a gesture recognition system that I have encountered this semester. I can see how position data from an instrumented glove could be projected into two dimensions and used as input points for this system.
As presented in the paper, the $1 recognizer cannot distinguish gestures that depend on orientation. By recording the amount of rotation during the "indicative angle" rotation and the rotational optimization adjustment, the original orientation could be determined and used in the recognition process. For example, if the total rotation for each template gesture was recorded, and the total rotation for each candidate gesture was calculated, the values could be compared, and templates whose orientations were not within some tolerance level of the candidate could be eliminated.
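This orientation-filtering idea could be sketched as follows. This is my own hypothetical extension, not part of the paper: the candidate's original indicative angle is compared against each template's recorded angle, and templates outside the tolerance are eliminated before the distance comparison:

```python
import math

def indicative_angle(pts):
    # Angle of the vector from the gesture's centroid to its first point,
    # i.e., the rotation that the recognizer would normally discard.
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return math.atan2(pts[0][1] - cy, pts[0][0] - cx)

def angular_difference(a, b):
    # Smallest signed difference between two angles, in radians.
    return math.atan2(math.sin(a - b), math.cos(a - b))

def orientation_compatible(candidate_pts, template_angle,
                           tolerance=math.radians(30)):
    # Keep a template only if the candidate's original orientation is
    # within the tolerance of the template's recorded orientation.
    diff = angular_difference(indicative_angle(candidate_pts), template_angle)
    return abs(diff) <= tolerance
```

The 30-degree tolerance is an arbitrary placeholder; in practice it would be tuned per gesture set.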
The paper raises the question of whether the first point in a gesture is the best to use for finding the indicative angle. A possible alternative to the first point could be the centroid of the first n points. Experimentation could be done to find a suitable number for n, which would probably be related to N (possibly n = floor(N/10) or something similar?).
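The variant suggested above might look like this. This is purely speculative on my part; the fraction N/10 and the function name are placeholders for experimentation:

```python
import math

def robust_indicative_angle(pts, frac=10):
    # Hypothetical variant: measure the indicative angle from the centroid
    # of the first n = floor(N / frac) points instead of the first point
    # alone, to reduce sensitivity to noise at the start of the stroke.
    n = max(1, len(pts) // frac)
    cx = sum(p[0] for p in pts) / len(pts)   # centroid of the whole gesture
    cy = sum(p[1] for p in pts) / len(pts)
    sx = sum(p[0] for p in pts[:n]) / n      # centroid of the first n points
    sy = sum(p[1] for p in pts[:n]) / n
    return math.atan2(sy - cy, sx - cx)
```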

1 comment:

Paul Taele said...

I know that some of the other people in the class don't like $1 as much as, say, another algorithm like Rubine. While it has some limitations, and I wouldn't exactly call it superior to Rubine or other template-matching approaches, I do like $1 for its sheer simplicity and its potential for being more accessible. Concerning your comment on the orientation issue, someone posted a comment on Aaron's blog about that:

One thing you might like to know is that for rotation invariance, you don't have to either keep it on or turn it off. You could simply flag those template gestures that should be tested as rotation-specific, and leave the flag off for those that should be rotation invariant. It doesn't have to be all or nothing, and this addition is trivial to make. You can also easily bound the range of rotation invariance you want (e.g., "mostly upright" from 80 degrees to 100 degrees, or whatever). So there's a lot of flexibility there.

It's another alternative solution that's worth investigating.