Wednesday, January 30, 2008

Flexible Gesture Recognition for Immersive Virtual Environments (Deller 2006)

Summary:
The paper gives an overview of an inexpensive gesture recognition system that can be extended to recognize additional gestures and work with multiple users. The author argues that gestures are a natural form of interaction: when a person sees a human hand perform an action, the person immediately knows how to perform it. There is much less mental translation involved with hand gestures than when a person sees a mouse pointer perform some action and must map that action onto physical movement. The paper mentions some downsides of current approaches -- fixed installations, expensive hardware, and high computational requirements -- all of which combine to exclude these solutions from ordinary working environments.
The solution described by the paper uses a P5 glove with an infrared-based position and orientation tracker. The virtual environment is displayed in 2D on a high-resolution monitor and in stereoscopic 3D on a SeeReal C-I. Since the system learns postures by having each user perform them, it adapts easily to multiple users. To determine which gesture is being performed, the system primarily analyzes the position of each finger and the orientation of the hand, and the relevance of hand orientation in identifying a gesture can be adjusted per gesture.
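To make the matching idea concrete, below is a minimal sketch in Python of how a posture matcher with a per-gesture orientation weight might look. The paper does not publish its code; the class and function names, the distance metric, the threshold, and the averaging-based training step are all my own illustrative assumptions.

# Sketch of weighted posture matching, assuming a five-sensor glove that
# reports finger bends in [0, 1] and hand orientation as (yaw, pitch, roll)
# in degrees. Hypothetical names and metric, not taken from the paper.

class PostureTemplate:
    def __init__(self, name, finger_bends, orientation, orientation_weight):
        self.name = name
        self.finger_bends = finger_bends          # five bend values, 0.0..1.0
        self.orientation = orientation            # (yaw, pitch, roll) in degrees
        self.orientation_weight = orientation_weight  # 0.0 means orientation is ignored

def angular_diff(a, b):
    # Smallest difference between two angles, in degrees.
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def posture_distance(bends, orientation, template):
    # Finger term: mean absolute difference across the five bend sensors.
    finger_term = sum(abs(s - t) for s, t in
                      zip(bends, template.finger_bends)) / 5.0
    # Orientation term: mean angular difference normalized to [0, 1], then
    # scaled by the per-gesture weight, mirroring the paper's adjustable
    # relevance of hand orientation.
    angle_term = sum(angular_diff(s, t) for s, t in
                     zip(orientation, template.orientation)) / (3.0 * 180.0)
    return finger_term + template.orientation_weight * angle_term

def classify(bends, orientation, templates, threshold=0.25):
    # Return the closest stored posture, or None if nothing matches well.
    best = min(templates, key=lambda t: posture_distance(bends, orientation, t))
    if posture_distance(bends, orientation, best) <= threshold:
        return best.name
    return None

def train_template(name, samples, orientation_weight):
    # Per-user adaptation: average a few recorded (bends, orientation)
    # samples into a fresh template. (Naive angle averaging; fine for a
    # sketch, wrong near the 0/360 wraparound.)
    bends = [sum(s[i] for s, _ in samples) / len(samples) for i in range(5)]
    orient = tuple(sum(o[i] for _, o in samples) / len(samples) for i in range(3))
    return PostureTemplate(name, bends, orient, orientation_weight)

Under these assumptions, the per-user adaptation the paper describes reduces to recording a few samples of each posture from a new user and averaging them into a template, while setting a gesture's orientation weight to zero makes it orientation-insensitive.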

Discussion:
The author mentions interacting "in a natural way by just utilizing [one's] hands in ways [one] is already used to". This seems a bit vague. When interacting with physical objects, some people may prefer to slide them while others may prefer to pick them up and move them. These different movements achieve the same goal, so a choice must still be made as to which gesture will be associated with an intended action in the program.
The gestures recognized by the example application seemed fairly limited, and the associated actions seemed to lack innovation -- they mostly mimic a mouse-based interface. For example, the "tapping" and "moving" gestures provide no more utility than the 2D clicking and dragging movements of a mouse. I doubt the user of such a gesture-based system would gain any real productivity over a traditional interface.
The paper presents no quantitative results, serving as more of a proof of concept than a scientific study. The author claims that the engine provides fast and reliable gesture recognition on standard consumer computers, but gives no specific data defining "reliable" or "standard consumer".
