Integrating gesture recognition and speech recognition in a touch-less human computer interaction system
We envision a command-and-control scenario in which a speaker makes hand gestures while referring to objects on a computer display (monitor or projection screen). The objective is automated recognition of these gestures to manipulate the virtual objects on the display. Our approach advances the state of the art in human-computer interaction in the following ways: (i) the user is expected to stand at a distance from the display, so the sensing is "touchless" and relies entirely on one or more cameras placed unobtrusively in the environment; (ii) the gestures considered are predominantly two-handed, which makes them natural and intuitive, as if the user were speaking with the screen serving as a prop; and (iii) speech and gestures are integrated in a coherent multimodal fashion. We have trained HMM and HCRF models using features derived from PCA and optical flow. A pose estimation algorithm identifies the object of interest; it locates the hands using per-user skin-region modeling in real time. The speech and gesture recognition modules produce independent outputs, which are integrated to execute the user's commands. Experimental results on video sequences obtained from 11 users performing five gesture classes are discussed.
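The abstract states that the speech and gesture modules produce independent outputs that are then integrated. One common way to realize such integration is late (decision-level) fusion: each recognizer emits a posterior over the command vocabulary, and the joint score is the product of the unimodal posteriors under a conditional-independence assumption. The sketch below illustrates that idea only; the command names, probabilities, and the specific fusion rule are hypothetical and not taken from the paper.

```python
# Minimal sketch of decision-level fusion of independent speech and gesture
# recognizer outputs. Command vocabulary and probabilities are hypothetical;
# the paper's actual integration scheme may differ.

def fuse(speech_probs, gesture_probs):
    """Combine per-command posteriors from the two recognizers.

    Assuming the two modalities are conditionally independent given the
    command, the joint score is the product of the unimodal posteriors,
    renormalized over the commands both recognizers can produce.
    """
    commands = speech_probs.keys() & gesture_probs.keys()
    scores = {c: speech_probs[c] * gesture_probs[c] for c in commands}
    total = sum(scores.values())
    if total == 0:
        return None, scores  # modalities fully disagree; no fused decision
    posterior = {c: s / total for c, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical outputs for a three-command vocabulary.
speech = {"move": 0.6, "rotate": 0.3, "delete": 0.1}
gesture = {"move": 0.5, "rotate": 0.4, "delete": 0.1}
command, posterior = fuse(speech, gesture)
print(command)  # "move"
```

In practice the two streams would also be aligned in time before fusing, and a weighted product (raising each modality's posterior to a reliability exponent) is often used when one recognizer is known to be more accurate than the other.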