Thursday, April 24, 2008

Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration (Poddar 1998)

Summary:
This paper examined gestures in the natural domain of weather newscasting. The authors describe a natural human-computer interface as including a gesture recognition module, a speech recognition module, and a display for providing audio and visual feedback.
An HMM-based gesture recognition system was used to analyze video of five weather persons. The features used in the gesture recognition were extracted from the video using Kalman filtering and color segmentation. Specifically, the distance of each hand from the head, together with the hand's radial and angular velocities with respect to the head, was used to describe hand motion. Gestures were classified into three main categories: pointing, area, and contour. Gestures were also separated into phases of preparation, retraction, and actual stroke. This led to the choice of using left-to-right causal models with 3 states for the preparation, retraction, and point HMMs and 4 states for the contour and rest HMMs.
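To make the feature description concrete, here is a minimal sketch of how head-relative polar features and a left-to-right transition matrix might be computed. It assumes the hand and head positions have already been tracked as per-frame (x, y) coordinates. The function names, the `dt` frame interval, and the `p_stay` placeholder probability are my own illustrative choices, not taken from the paper, and in a real system the transition probabilities would be learned from training data.

```python
import math

def hand_features(hand_xy, head_xy, dt=1.0):
    """Per-frame (distance, radial velocity, angular velocity) of a hand
    relative to the head. Hypothetical helper, not the paper's code."""
    polar = []
    for (hx, hy), (cx, cy) in zip(hand_xy, head_xy):
        dx, dy = hx - cx, hy - cy
        polar.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    features = []
    for (r0, a0), (r1, a1) in zip(polar, polar[1:]):
        # Wrap the angle change into [-pi, pi) so crossing the +/-pi
        # boundary does not show up as a large spurious velocity.
        dtheta = (a1 - a0 + math.pi) % (2 * math.pi) - math.pi
        features.append((r1, (r1 - r0) / dt, dtheta / dt))
    return features

def left_to_right_transitions(n_states, p_stay=0.5):
    """Transition matrix in which each state either repeats or advances
    to the next state; the last state only repeats.
    p_stay is an assumed placeholder value."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0
    return A
```

Calling `left_to_right_transitions(3)` gives the topology for the 3-state models and `left_to_right_transitions(4)` for the 4-state ones. The zeros below the diagonal are what make the model causal: a gesture can never move back to an earlier state.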
A study was done to determine the co-occurrence of spoken words with specific gestures. When the results of the co-occurrence analysis were applied to the data, recognition rates were higher for three of the four video sequences examined. However, recognition rates remained somewhat low overall: the highest accuracy was 75%.
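One way to picture how co-occurrence statistics could help is sketched below. This is my own guess at a simple approach, not necessarily what the authors did: estimate P(gesture | word) from counted word/gesture pairs, then add the log of that probability to each gesture's HMM log-likelihood before picking the best one. The function names, the add-one smoothing, and the example words are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_probs(pairs, gestures, smoothing=1.0):
    """Estimate P(gesture | word) from (word, gesture) pairs,
    with additive smoothing so unseen combinations are not zero."""
    counts = defaultdict(Counter)
    for word, gesture in pairs:
        counts[word][gesture] += 1
    probs = {}
    for word, c in counts.items():
        total = sum(c.values()) + smoothing * len(gestures)
        probs[word] = {g: (c[g] + smoothing) / total for g in gestures}
    return probs

def rescore(hmm_loglik, word, probs):
    """Pick the gesture maximizing HMM log-likelihood + log P(gesture | word)."""
    prior = probs.get(word)
    if prior is None:
        # Unseen word: fall back to the HMM scores alone.
        return max(hmm_loglik, key=hmm_loglik.get)
    return max(hmm_loglik, key=lambda g: hmm_loglik[g] + math.log(prior[g]))

# Made-up example data, purely for illustration.
gestures = ["point", "area", "contour"]
pairs = [("here", "point"), ("here", "point"), ("along", "contour")]
probs = cooccurrence_probs(pairs, gestures)
scores = {"point": -10.0, "area": -9.8, "contour": -12.0}
best = rescore(scores, "here", probs)
```

In this toy example the HMM scores alone slightly favor "area", but hearing the word "here" tips the decision toward "point", which is the kind of correction the co-occurrence analysis is meant to provide.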

Discussion:
Since the data comes from a weather newscasting environment, the background filtering has the potential to be much simpler. Instead of using the video input of the composited image with the weather map displayed in the background, the raw video feed of the newscaster in front of the blue or green screen could be used. The color-based filtering algorithm would then have a much easier job, since a static, single-colored background is easy to filter out.
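To illustrate why a solid key-colored background would be easy to remove, here is a rough sketch of a chroma-key style mask. It assumes the frame is a nested list of (R, G, B) tuples with values from 0 to 255. The function name and the `margin` threshold are hypothetical and would need tuning for real studio lighting.

```python
def foreground_mask(image, key="green", margin=40):
    """Return a mask that is True where a pixel is NOT dominated by the
    key color, i.e. where it probably belongs to the newscaster.
    margin is an assumed threshold, not a value from the paper."""
    k = {"red": 0, "green": 1, "blue": 2}[key]
    mask = []
    for row in image:
        mask_row = []
        for pixel in row:
            others = [pixel[c] for c in range(3) if c != k]
            # Background if the key channel exceeds both others by margin.
            mask_row.append(pixel[k] - max(others) < margin)
        mask.append(mask_row)
    return mask
```

The hand and head segmentation would then only need to run on the pixels where the mask is True, rather than competing with the colors of a moving weather map.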
The paper mentions a probability that could be interpreted as the weather person's handedness. I don't think handedness affects which hand is used for a gesture as much as two other factors do: where the weather person happens to be standing in relation to the portion of the map being discussed at the time of the gesture, and which hand holds the clicker that advances the background image to the next video feed.
