Optical 3D gesture recognition is a paradigm shift in human-computer interaction (HCI). Simple gesture recognition is nothing new: keyboards, mice, and remote controls use switches, location sensors, and accelerometers to recognize human gestures and turn them into computer commands. These devices feed multiple types of data from different types of hardware to a computer.
However, optical 3D gesture recognition uses only light to understand what a human wants. In this case, less is considerably more. Soon, gesture recognition will be a ubiquitous tool in our everyday lives, in ways we can barely imagine, largely because of its simplicity. Here, we introduce how optical 3D gesture recognition works and then explore how new applications may develop and proliferate.
How Gesture Recognition Works
Not surprisingly, the first generation of gesture-recognition systems works much like human 3D vision. A light source such as the sun bathes an object in a full spectrum of light. The eye senses the reflected light, but only a limited portion of the spectrum. The brain compares a series of these reflections and computes movement and relative location.
Imagine a computer game player swinging a golf club in front of a gesture-recognition device. A light source in the device illuminates the player and surrounding area with invisible, near-infrared light. The light bounces off the player and reflects back to the device. Optical filters screen out spurious and ambient light, letting only the near-infrared spectrum through to the light sensor. Interpreting differences in the light bouncing back from different parts of the player, firmware creates an electronic 3D map of the player and sends it to the computer game. The result is a very realistic gaming experience—without wires to trip over or controllers to send flying.
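The depth-mapping step described above can be made concrete with one common ranging approach: continuous-wave time-of-flight, in which the distance to each point is recovered from the phase lag the round trip imposes on modulated illumination. The article does not specify which ranging method a given product uses, and the modulation frequency and phase values below are purely illustrative:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def depth_from_phase(phase_rad: float, mod_freq_hz: float) -> float:
    """Continuous-wave time-of-flight: the round trip delays the
    modulation envelope by phase_rad, so distance = c * phase /
    (4 * pi * f). Range is unambiguous out to c / (2 * f)."""
    return C * phase_rad / (4 * math.pi * mod_freq_hz)

# Example: 30 MHz modulation, reflection lagging by pi/2 radians
d = depth_from_phase(math.pi / 2, 30e6)
print(f"{d:.3f} m")  # prints "1.249 m"
```

Repeating this calculation per pixel, frame after frame, is what turns raw reflected light into the 3D map the firmware hands to the game.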
Basic Gesture-Recognition System Components
Despite the number of different technologies that support gesture-recognition systems, they all share a basic component list, and JDSU is a major supplier to two of the categories:
- Light Source — an LED or laser diode typically generating infrared or near-infrared light. This light isn’t normally noticeable to users and is often optically modulated to improve the resolution performance of the system. JDSU is a major supplier of light sources.
- Controlling Optics — optical lenses help optimally illuminate the environment and focus reflected light onto the detector surface. A bandpass filter lets only reflected light that matches the illuminating light frequency reach the light sensor, eliminating ambient and other stray light that would degrade performance. JDSU is a major supplier of controlling optics.
- Image sensor — a high performance optical receiver detects the reflected, filtered light and turns it into an electrical signal for processing by the firmware.
- Firmware — very-high-speed ASIC or DSP chips process the received information and turn it into a format which can be understood by the end-user application such as video game software.
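The four components above can be viewed, very roughly, as stages in a processing pipeline. Everything in the sketch below (function names, gain and ambient-floor constants, the intensity-to-depth rule) is illustrative pseudologic under our own assumptions, not any vendor's actual firmware:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Raw intensity values from the image sensor (row-major)."""
    width: int
    height: int
    pixels: list

def bandpass_filter(raw, band_gain=1.0, ambient_floor=0.1):
    """Controlling optics: pass in-band light, reject stray light
    (modeled here as subtracting a constant ambient floor)."""
    return [max(0.0, p * band_gain - ambient_floor) for p in raw]

def sense(filtered, width, height):
    """Image sensor: turn filtered light into an electrical frame."""
    return Frame(width, height, filtered)

def firmware_depth_map(frame):
    """Firmware: convert per-pixel intensity to a coarse depth
    estimate (brighter return -> nearer surface, illustratively)."""
    return [1.0 / (p + 1e-6) for p in frame.pixels]

# Light source -> optics -> sensor -> firmware -> application
raw = [0.5, 0.2, 0.9, 0.1]
depth = firmware_depth_map(sense(bandpass_filter(raw), 2, 2))
```

The point of the sketch is the division of labor: the optics discard irrelevant light before it ever becomes data, leaving the firmware a much smaller processing job.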
The light source has been an important gating item in the proliferation of gesture-recognition devices. Due to their inherent spectral precision and efficiency, diode lasers are a preferred option, particularly for high-volume consumer electronics applications. These applications are characterized by a limited supply of electrical power and a high density of components, factors that drive a need to minimize dissipated thermal power.
The lasers often work with other, wavelength-sensitive optical components such as filters and detectors that require tight wavelength control over a wide temperature range. And, for high data-rate systems such as gesture recognition, these lasers must operate with very low failure rates and with minimal degradation over time.
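The need for tight wavelength control is easy to quantify. Diode laser emission drifts with temperature, and the drift must stay inside the companion filter's passband across the whole operating range. The coefficient below (~0.07 nm/°C, typical of VCSEL-class diode lasers) is our assumption for illustration, not a JDSU specification:

```python
def wavelength_shift_nm(delta_temp_c: float,
                        drift_nm_per_c: float = 0.07) -> float:
    """Approximate thermal wavelength drift of a VCSEL-class diode
    laser. The ~0.07 nm/degC coefficient is an assumed typical
    figure; edge-emitting lasers drift several times faster."""
    return delta_temp_c * drift_nm_per_c

# Over a 0-70 degC consumer operating range the emission wavelength
# moves about 4.9 nm, which must fit inside the filter's passband.
shift = wavelength_shift_nm(70.0)
```

A narrower filter rejects more ambient light but leaves less margin for this drift, which is why the laser and the filter must be engineered as a matched pair.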
The gating issue with these light sources has been manufacturability. With any high-volume, low-margin consumer electronic component, the component cost profile can make the difference between an inventor’s fantasy and a revenue-generating dream come true. Job shops and boutique contract manufacturers can effectively produce high-performance proof-of-concept or prototype parts in very low volumes. However, producing the parts in the millions to very tight performance standards is a completely different matter, and the margins on consumer electronics are extremely thin. Globally, the number of qualified manufacturers for high-volume gesture-recognition components is surprisingly low.
With decades of experience manufacturing telecom-grade optical components, JDSU is one of the world’s premier suppliers of high-volume, high-quality light sources such as those used in optical 3D gesture-recognition systems.
Optical filters are sophisticated components in the controlling optics for gesture recognition. Typically these are narrow bandpass near-infrared filters with very high transmission in the desired band and thorough blocking elsewhere. Limiting the light that reaches the sensor eliminates unnecessary data unrelated to the gesture-recognition task at hand, dramatically reducing the processing load on the firmware. Additional noise suppression is typically coded into the application software as well.
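The effect of a narrow bandpass filter can be illustrated with an idealized transmission curve: near-infrared signal light at the center wavelength passes almost unattenuated, while visible ambient light is blocked by many orders of magnitude. The 850 nm center and 20 nm width below are illustrative assumptions, not a real part specification:

```python
import math

def gaussian_transmission(wavelength_nm: float,
                          center_nm: float = 850.0,
                          fwhm_nm: float = 20.0) -> float:
    """Idealized narrow bandpass filter modeled as a Gaussian
    transmission profile (center and width are illustrative)."""
    sigma = fwhm_nm / 2.355  # convert FWHM to standard deviation
    return math.exp(-0.5 * ((wavelength_nm - center_nm) / sigma) ** 2)

# Signal at 850 nm passes essentially unattenuated...
in_band = gaussian_transmission(850.0)
# ...while ambient room light at 600 nm is effectively eliminated.
out_of_band = gaussian_transmission(600.0)
```

Real thin-film filters have steeper, flatter-topped passbands than a Gaussian, but the principle is the same: only light matching the illuminator reaches the sensor.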
As with the light sources, high-volume manufacturability is a complicating factor. Most filter producers are niche providers, and the ability to produce precision at scale is rare. JDSU optical coatings enabled the 3D effects used in the movie Avatar, and JDSU optical filters and coatings play equally important roles in enabling 3D optical gesture-recognition systems.
The image sensors used for gesture recognition are typically CMOS or CCD chips similar to those used in cell phones. There are clear differences in how these two types of technologies capture images. However, as the technologies mature, their capabilities and applicability are merging. Each can be used for high-quality imaging and each can be manufactured in high volumes with good quality at relatively low price points. Unlike the situation with light sources and optical filters, a number of manufacturers have the ability to effectively meet demand for these components.
In current applications, processing gesture-recognition data is a joint task of embedded firmware in the gesture-recognition device and software in the gaming console or computer. There is little to say about this aspect of gesture recognition because, quite simply, it’s a secret. It is a highly proprietary element of gesture recognition, and developers maintain a considerable portion of their equity in the gesture-recognition algorithms. The important takeaway here is a paradox: gesture-recognition devices are extraordinarily sophisticated in many respects, particularly in the manufacturing techniques used to build them. At the same time, gesture-recognition devices are comparatively simple and low cost because of the sophistication of their design along with the availability of singular manufacturing expertise. There are no moving parts in a gesture-recognition device, and the components are all technologically mature. Gesture-recognition devices will rapidly get smaller, less expensive, and even more capable than they are today.
The Role of Gesture Recognition in Natural User Interfaces
The essence of a natural user interface is invisibility: a user interacts with technology in a way that hides the number of steps between intent and result. For example, to turn on a light, a complex interface means walking to a wall, finding a switch, and flipping a toggle. A natural interface means merely saying “light” or flicking one’s hand from anywhere in a room. To play an interactive video sports game, a complex interface means using a joystick with lots of buttons. A natural interface means having the game understand body and hand gestures directly.
Letting the Machine Use Initiative
We tend to think of a user interface in one direction: we give commands for a machine to perform. The advantage of gesture recognition as part of a natural user interface is that it is a powerful way to let machines respond to needs before they are expressed and, indeed, without their ever needing to be converted into commands. The HCI works in two directions and the machine can initiate an action independently, without prompting from a human.
A physical-fitness video game enabled with gesture recognition could recognize that a user was using their left arm less than their right arm. The game could then cue the user to address the imbalance, or it could provide a specific set of game conditions that would naturally balance the arm usage. All this would take place without the user’s knowing: the machine is proactively interacting with the human in an invisible way.
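The arm-imbalance scenario reduces to a simple comparison over tracked motion data. The function below is a hypothetical sketch: the motion values, the 25% threshold, and the cue strings are all our own illustrative choices, not part of any actual game:

```python
def arm_usage_imbalance(left_moves, right_moves, threshold=0.25):
    """Compare cumulative tracked motion for each arm (e.g. summed
    joint displacement per frame) and return a cue string if one arm
    falls more than `threshold` below an even 50/50 split."""
    left, right = sum(left_moves), sum(right_moves)
    total = left + right
    if total == 0:
        return None  # no motion observed yet
    if left / total < 0.5 - threshold / 2:
        return "left arm underused"
    if right / total < 0.5 - threshold / 2:
        return "right arm underused"
    return None
```

A game could invoke this silently each round and, on a non-None result, steer play toward the neglected arm—the two-way, unprompted interaction described above.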
Gesture recognition will most likely add functionality to existing interfaces rather than eliminate them. An effective user interface provides many different ways to accomplish the same task. For consumer applications, this is sometimes problematic; some users would prefer a smaller feature set with a simpler UI to a richer feature set with a more complex UI. Most mission-critical applications, however, need to operate under diverse working conditions and this argues for robust command systems.
For example, a home alarm system typically uses a keypad as an authentication device to screen entry. If you know the code, you’re in. A gesture-recognition system can recognize your face, your gait, and even your clothes and let you in automatically. But rather than eliminate the keypad and use only gesture recognition, it makes more sense to combine the two devices. With gesture recognition as an add-on to the keypad, the door will unlock automatically as you approach; and, if you’ve given the door code to a friend, it will unlock after he or she enters the code on the keypad.
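The combined policy is a simple OR of the two authentication paths. The sketch below is hypothetical (the function name, inputs, and policy are ours, not a real alarm system's API):

```python
def door_should_unlock(gesture_match: bool,
                       entered_code,
                       valid_codes: set) -> bool:
    """Gesture recognition as an add-on to the keypad: unlock if the
    resident is visually recognized OR a valid code is keyed in."""
    if gesture_match:
        return True  # recognized resident approaching the door
    return entered_code is not None and entered_code in valid_codes

# Resident recognized by gesture/appearance: no code needed.
door_should_unlock(True, None, {"1234"})
# Friend with the code, not recognized: keypad path still works.
door_should_unlock(False, "1234", {"1234"})
```

Layering the modes this way adds convenience without weakening the existing interface, which is the general pattern the article argues for.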
Similarly, gesture recognition alone can make for a great gaming experience, but adding additional capabilities with physical devices can heighten the experience exponentially. Imagine shooting a ray gun by pointing your finger. Now imagine adding a physical sensor that adjusts the firing rate based on the tightness of your finger pressure against your fist. The natural user interface does not preclude complexity: adding gesture recognition simply magnifies the scope and richness of HCI.
We’ve postulated a few different developments that we think will characterize future HCI:
- gesture-recognition devices will be everywhere: thanks to experienced, high-quality suppliers such as JDSU, the devices can be manufactured in high volumes at low costs.
- gesture recognition will let machines recognize human needs before they are articulated as commands: it will enable greater two-way HCI.
- gesture recognition will be a ubiquitous add-on capability: integrated with other technologies, it will enhance rather than replace user-interface components.
Put simply, gesture recognition-enabled natural user interfaces will be part of virtually all HCI, and quite soon. The technologies are sufficiently mature and the devices themselves are inexpensive. Their first mass-market implementation is in the Microsoft® Kinect™, a consumer gaming application, with over eight million units sold within 60 days of its introduction1. Microsoft released a software development kit in March 2011 to promote Kinect capabilities, and one can safely bet that a good number of the initial applications will have nothing to do with gaming.
Practical, High-Volume Applications
Microsoft itself released a video2 of its LightSpace gesture-recognition research team demonstrating “meeting room” type applications. Imagine a corporate meeting room. On the ceiling, gesture-recognition devices monitor movement anywhere in the room. A motorized, flexible projector displays the equivalent of a computer screen anywhere in the room: on a wall, on a desktop, even on someone’s hand.
Now imagine that the meeting leader points to a wall, flexes a hand, and a menu appears on the wall. She points to a menu item and a diagram appears on the wall. She then makes a grabbing motion with her hand in front of the wall, points to the table, and the diagram appears on the table. She zooms in on a design area using the same finger motions she’d use to look at pictures on her smartphone. Her colleague then makes a grabbing motion above the design and taps his laptop, copying the file.
It’s Happening Now
This isn’t science fiction, it is happening in research labs today. Now imagine adding voice recognition capabilities to the mix. And image sensitivity that can track the subtlest of eye movements, anywhere in a room. Gesture recognition will have paradigm-shifting effects on virtually all HCI, and its ultimate impact is quite beyond most people’s imaginations.
The JDSU Communications and Commercial Optical Products (CCOP) business segment supplies every major network equipment manufacturer with the optical products and solutions necessary to deploy new communications networks or maintain and upgrade existing communication networks. The group also provides solid-state, diode, direct-diode, and gas lasers for a broad range of applications including optical 3D gesture recognition. The JDSU Advanced Optical Technologies business segment leverages its core technology strengths of optics and materials science to manage light and/or color effects. With over six decades of experience in optical-coating technology, AOT develops innovative solutions that meet the needs of a variety of markets, from holograms and optical 3D gesture recognition to space exploration.
1 Consumer Electronics Show kick-off speech, Steve Ballmer, January 5, 2011.