Wearable Smart Camera Can Detect Silent Voice Commands


Two researchers from Cornell University have created a wearable infrared smart camera that can detect voice commands even when the speaker makes no sound, by imaging the movements of the neck and face from beneath the chin.

The Usefulness of Silent Speech Detection

Cheng Zhang, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and Cornell University doctoral student Ruidong Zhang have developed the wearable camera and dubbed it “SpeeChin.” It is the first necklace-based silent-speech recognition device that can detect 54 English and 44 Chinese silent speech commands.

“Imagine when your hands are occupied or you simply don’t want to reach out to your smart devices to interact with them, you might want to use voice control,” Zhang says. “However, if you are in a noisy place or in a meeting, voice control is not efficient or socially appropriate. This is where silent speech comes into place.”

IR Smart Camera for silent speech detection

SpeeChin uses a neck-mounted infrared camera that captures the movement of the chin from below, which allows it to determine which words are being mouthed even when no sound is audible. Because the camera points up at the wearer’s chin rather than out at the surroundings, its placement largely avoids privacy concerns and makes the device far less conspicuous than one with a forward-facing camera.

High Reliability in Limited Environments

Gizmodo reports that the two researchers tested SpeeChin with 20 participants: 10 spoke 54 simple phrases in English, including numbers and common voice assistant commands, and 10 spoke 44 simple words and phrases in Mandarin Chinese. Once the system was trained, it recognized commands in English with 90.5% accuracy and commands in Chinese with 91.6% accuracy.
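For readers unfamiliar with the metric, recognition accuracy of this kind is typically the share of commands that the system labels correctly. The short Python sketch below illustrates that arithmetic only. The function name `command_accuracy` and the toy command lists are hypothetical, chosen for illustration, and are not taken from the study or from SpeeChin's software.

```python
from typing import Sequence


def command_accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Return the fraction of commands whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one command is required")
    # Count the positions where the recognized command equals the spoken one.
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical toy data, not results from the study.
spoken = ["play", "pause", "next", "volume up"]
recognized = ["play", "pause", "next", "volume down"]
print(f"{command_accuracy(spoken, recognized):.1%}")  # 3 of 4 correct, prints 75.0%
```

Under this definition, a figure such as 90.5% means that roughly nine out of every ten silently spoken commands were matched to the right label.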

These high marks were attainable only when participants were sitting still, however. Once they were asked to move, recognition became less reliable because of variations in walking gait and head movement. Unfortunately, this significantly limits the situations in which the SpeeChin device is usable.

The need for a stationary wearer is a limitation of this iteration of SpeeChin, but it could in theory be addressed with more extensive training of the recognition system or with an improved infrared camera. The device is clearly still at an early stage, so seeing such changes in later iterations doesn’t seem out of the question.