A 3-D image of a kitchen overlaid with point-cloud data is stitched together like a jigsaw puzzle to create a textured topographical mesh, a collection of triangles whose vertices carry 3-D coordinates.
Tools have evolved over time to help humans master our environment, overcome our limitations and accomplish more with less effort. Very soon, a new generation of 3-D imaging tools will make a splash, enabling us to share more information and accomplish more tasks. But a new twist suggests a revolution is afoot: these tools are quickly evolving to enable us to change our environment, to explore and experience new things in a very real way, whether the goal is to decorate a house or play a game.
The 3-D imaging market comprises techniques, cameras, sensors and other technologies that enable the capture of images with both visual information and distance information. For many decades, 3-D imaging technology was relegated to academic research labs. In more recent years, gaming consoles have been a prominent signpost of ever-more-interactive 3-D experiences: Nintendo’s Wii, with its handheld controller and sensors, and Microsoft’s Kinect, which is sensitive enough to recognize hand gestures alone, no controller needed. In 2011, Viztu Technologies (acquired by 3D Systems in 2012) released free online software that enabled substitution of expensive 3-D laser scanner equipment with images from a smartphone or digital camera, so that users could scan an object, create a 3-D model of it and send it to a 3-D printer.
Now, developers are hot on the trail of 3-D sensors and cameras that will enable interactive, controller-free experiences on phones, laptops and tablets. In just the past year or two, technology giants like Apple, Sony, Intel, Google and Microsoft have joined the fray (see sidebar on facing page). Rather than focus on gaming, their goal now is to embed smaller, mobile 3-D technology into future mobile devices and wearable computers.
Google’s Project Tango aims to create a phone with a 3-D camera.
Two to Tango
To that end, Google is leading a collaboration called Project Tango to create a prototype mobile phone with 3-D capability. The five-inch Android phone will have a built-in 3-D sensor that can make more than a quarter million 3-D measurements per second. The preliminary phone is currently available as a professional development kit for projects in the areas of indoor navigation and mapping, as well as single- and multiplayer games that use physical space. According to Google’s Project Tango website, the company has “also set aside units for applications we haven’t thought of yet.” The unique and new ways that consumers may soon interact with their phones can, as yet, only be imagined.
3-D software developers at Matterport (Mountain View, Calif., U.S.A.) used their reconstruction software on the Project Tango phone to create a full-color map of the space around them with a few sweeps of the camera. The map could be used to capture dimensions of a room before furniture shopping—or, by waving the phone around an object, to build a 3-D model that could be directly uploaded and printed on a 3-D printer.
Although 3-D modeling is not new technology, the process has been costly, complex and labor-intensive. The files are massive and difficult to transfer, requiring a computer-aided design (CAD) package to manage them. New 3-D modeling processes like Matterport’s automate and streamline a days-long process down to an hour. What exactly enables this giant leap forward in 3-D technology?
3-D imaging approaches

Camera/sensor technology | Partners/projects
Time of flight (TOF) | Microsoft acquired in 2010/Kinect II
Structured light | Flextronics Lab IX
Structured light | Google/Project Tango developer
TOF | Infineon/3-D image sensor chips, Bluetechnix 3-D camera systems
Structured light | Apple/unannounced (rumored iTV); previously Microsoft/Xbox Kinect 1 (rumored Fortaleza) and 3-D Systems
TOF/gesture middleware | NVIDIA, Makerbot, Meta, Intel/RealSense, Sony, Texas Instruments and a tier-one automotive supplier
Stereoscopic | Takata (automotive), Intel
The secret, says Matt Bell, cofounder of Matterport, is in the very fast (30 frames per second), simultaneous capture of geometric and texture data. In many 3-D sensors, the output is an image of the scene layered with a distance measurement to each pixel in the image. This enables the image to contain information such as how far away each part of the couch, floor and wall is, albeit at a fairly low resolution.
“Where Matterport comes in,” says Bell, “is in using that raw 3-D input data to build 3-D models of the world. You can fly through a model of a house that was automatically generated from 3-D sensor data via Matterport software.”
The software works by creating so-called 3-D mesh data. Each image capture can be thought of as a 3-D jigsaw puzzle piece, says Bell. The software figures out how these thousands of pieces fit together and stitches them together into a giant point cloud of data. Typically the cloud of data points is too large to be rendered on a display or monitor in real time, so the software compresses it down to a 3-D mesh format to make it viewable on one screen. The triangles in the mesh that are associated with closer points in the image appear larger.
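The last step Bell describes, converting per-pixel depth into a viewable triangle mesh, can be sketched in a few lines. This is a generic illustration under an assumed pinhole-camera model and a naive grid-meshing scheme, not Matterport's actual pipeline:

```python
import numpy as np

def depth_to_mesh(depth, fx, fy, cx, cy):
    """Convert a depth image (meters) into a triangle mesh.

    Each pixel is back-projected through a pinhole camera model to a
    3-D vertex, and each 2x2 block of neighboring pixels yields two
    triangles. Because every triangle spans exactly one pixel, nearby
    surfaces come out as larger triangles in 3-D than distant ones.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx          # pinhole back-projection
    y = (v - cy) * depth / fy
    verts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    idx = np.arange(h * w).reshape(h, w)
    tl, tr = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    bl, br = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    tris = np.concatenate([np.stack([tl, bl, tr], axis=1),
                           np.stack([tr, bl, br], axis=1)])
    return verts, tris

# A flat wall 2 m away, seen by a tiny 4x4-pixel depth sensor:
verts, tris = depth_to_mesh(np.full((4, 4), 2.0),
                            fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(len(verts), len(tris))   # → 16 18
```

Production reconstruction software additionally registers thousands of such meshes against one another and decimates the combined result, which is where most of the real complexity lives.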
“The technology for rendering and displaying triangle mesh has been common in 3-D videos and movies for 25 years,” says Bell. “We’re taking advantage of that 3-D mesh heritage by turning this massive point cloud into something that 3-D vendors can use.”
A new paradigm
Familiar “street view” imaging enables 360-degree panoramas, but the view is constrained to the few spots where the panoramas were captured, limiting the navigation experience. 3-D mesh modeling, in contrast, extends navigation to anywhere the viewer wants to go, in any direction, up or down, through a full 360 degrees. A full 3-D model even enables exterior views, such as looking down into the space through the roof (which can be made invisible) or up from underneath the floor.
Sony’s PS4 Just Dance game combines its stereo camera with 3-D gesture middleware and full-body skeleton tracking from SoftKinetic.
Because walls can be knocked down and flooring textures swapped in such 3-D mesh data, one potential consumer use is home remodeling and shopping for home furnishings. The 3-D model enables users to quickly capture their living room and drop furniture, paintings or wall colors right in. Currently, says Bell, only 10 percent of furniture sales happen online, because it’s so hard to visualize what a piece of furniture would look like in the home, and it’s expensive to make a mistake.
Social sharing is another use of 3-D capabilities. Instead of sharing 2-D pictures of a museum sculpture or a nice hotel room, users can share a 3-D version that others can explore. Gaming in virtual or augmented reality is another obvious consumer application: you could map your house in 3-D and interact within that space, overlaid with the 3-D space of the game. The sensor updates what’s in front of you in real time, so you won’t trip over a chair or break a vase.
Matterport, created in 2011 and funded by $10 million in venture capital, launched the first full platform professional-grade 3-D camera in March 2014 for $4,500 alongside its subscription-based cloud service and web player. Opportunities for professional use extend to real estate sales, vacation rentals, event venues, construction supervision, insurance claims adjustment, hospitality, crime scene visualization and other applications.
“Our plan is to have these 3-D models available everywhere,” says Bell. “The goal is to make it quick and easy for anyone to capture the world around them in low-cost 3-D and then share the results online and on mobile.”
3-D imaging techniques
The 3-D imaging cameras use three major techniques. Stereoscopic vision, much like the binocular vision of humans, is a passive technique that uses images from two or more cameras at different positions to extract depth information. Just as the brain does, software identifies features common to both scenes and triangulates the distance to them from their different apparent positions. Stereoscopic sensors typically retrieve two RGB images using ambient light, though the technique may also incorporate sensors that detect a wider range of visible light or infrared light. As a passive approach, it has low power consumption, but it requires bright conditions.
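The triangulation at the heart of stereoscopic sensing reduces to one formula: distance equals focal length times camera baseline divided by disparity, the pixel shift of a feature between the two views. A toy sketch, with all numbers purely illustrative rather than taken from any particular camera:

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Distance to a feature seen by two rectified cameras:
    Z = f * B / d, where d is the pixel disparity between views."""
    return focal_px * baseline_m / disparity_px

# Assumed values: 700-pixel focal length, 6 cm baseline, and a
# feature shifted 14 pixels between the left and right images:
print(stereo_depth(14, 700, 0.06))  # → 3.0 (meters)
```

The inverse relationship between disparity and distance is why stereo depth gets noisier for far-away objects: a one-pixel matching error matters much more when the disparity itself is only a few pixels.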
Structured-light sensor technology projects a narrow infrared pattern, like parallel stripes or an array of dots, onto a 3-D target area (infrared won’t interfere with color information). The pattern may be generated by an infrared laser or an LED source. A camera, offset at least two inches from the pattern projector, records how the pattern is displaced when viewed from its different angle, triangulating the 3-D coordinates of the target area. Either way, the approach uses existing digital camera technology, but it requires power for the light source.
Time-of-flight (TOF) sensors work by sending out millions of infrared pulses per second and detecting the time the light takes to bounce off an object and return. The main component is an active light source that floods the scene with a train of pulsed square waves. As in lidar, the light reflects off the target and is measured at the sensor. The phase change of the returning square waves gives the time difference, or “time of flight,” of the light; the amount of phase shift varies with the distance to the object.
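The phase-to-distance relation described above can be written as d = c·Δφ/(4πf), where f is the modulation frequency; the round trip out and back is what turns the usual 2π into 4π. A quick worked example (the 20 MHz modulation frequency is an assumed, typical value, not a spec from any sensor named here):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_distance(phase_shift_rad, mod_freq_hz):
    """Distance from the measured phase shift of the modulated light:
    d = c * phase / (4 * pi * f); the factor of 2 hidden in the 4*pi
    accounts for the out-and-back round trip."""
    return C * phase_shift_rad / (4 * math.pi * mod_freq_hz)

# A quarter-cycle (pi/2) phase shift at an assumed 20 MHz modulation:
print(round(tof_distance(math.pi / 2, 20e6), 3))  # → 1.874 (meters)
```

Note that phase wraps every 2π, so a single modulation frequency has a limited unambiguous range (about 7.5 m at 20 MHz); practical systems resolve the ambiguity by combining measurements at multiple frequencies.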
Says Tim Droz, VP and general manager of SoftKinetic (Brussels, Belgium), “Time-of-flight sensors obtain the phase data from each pixel without having to do a lot of post-processing work. This means very little lag time from the sensor.” SoftKinetic’s TOF sensor is a CMOS sensor with a differential pixel structure and a patented photon-capture technology called current-assisted photonic demodulation. Each pixel has two halves, gated by a clock. As the square waves reflect back to the two halves, any phase offset creates a differential in charge for each frame, which provides distance information. By more efficiently capturing and converting the photons to electrons while integrating, the sensor enables higher quantum efficiency.
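The differential pixel Droz describes can be modeled, very roughly, by the textbook four-tap demodulation: the pixel correlates the returning wave with shifted copies of its clock, and the phase falls out of an arctangent. SoftKinetic's actual pixel architecture is proprietary; this is only the standard idealization:

```python
import math

def demodulate_phase(a0, a90, a180, a270):
    """Recover the phase offset of the returning wave from four
    correlation samples taken at 0/90/180/270-degree clock shifts."""
    return math.atan2(a270 - a90, a0 - a180) % (2 * math.pi)

# Simulate a reflection delayed by 1.0 radian of phase: the four
# samples are the correlation of the echo with each shifted clock.
true_phase = 1.0
samples = [math.cos(true_phase + t)
           for t in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
print(round(demodulate_phase(*samples), 6))  # → 1.0
```

Feeding the recovered per-pixel phase into the phase-to-distance relation then yields the per-pixel range, with no heavy post-processing, which is the low-latency property Droz highlights.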
Different Realities: Virtual vs. Augmented
When it comes to virtual and augmented reality, several companies are developing wearable headsets for viewing 3-D graphics. Although various versions of the virtual-reality experience have been around since the 1990s, they have been hampered by disorienting lag times, steep price tags and issues with comfort and wearability.
“Oculus Rift” is a much-anticipated next-generation virtual-reality headset designed to provide a more realistic experience that could take virtual-reality gaming into the mainstream. Oculus VR (Irvine, Calif., U.S.A.), set for acquisition by Facebook in mid-2014, is currently distributing a second round of development kits for the Rift, which uses custom tracking technology to provide low-latency, 360-degree head tracking with six degrees of freedom. Combined with a 100-degree field of view (90 degrees horizontal), Rift enables users to have an immersive experience in real time. The headset offers an effective resolution of 960 x 1080 per eye and uses DVI/HDMI and USB inputs on a PC. Oculus Rift may go on sale in late 2014 or early 2015.
In March, Sony introduced its first virtual-reality headset at the Game Developers Conference in San Francisco, in direct competition with Oculus Rift. Sony has sold head-mounted displays, known as the Personal HD and 3D Viewer, since 2011, but this is its first foray into virtual reality. These announcements mark the first step in what could be a revolution in virtual-reality gaming and entertainment.
Augmented-reality glasses leave your surrounding environment visible while layering on information, screens or graphics that enhance your experience. “Google Glass” is an example of an augmented-reality headset. Meta recently launched the “Meta.01 SpaceGlasses,” a wearable augmented-reality device featuring 3-D surround sound and a SoftKinetic TOF sensor capable of capturing gestures as close as 15 cm at 60 fps. SpaceGlasses feature active displays in the lenses that project 3-D augmented reality that enables users to, for example, play a game of virtual chess projected on a flat game board that isn’t actually there, using hand gestures to make a move.
Meta recently launched a successful Kickstarter campaign, offering early backers the first opportunity to use its customized hardware components and software developer kit to develop a wide range of applications. Meta.01 SpaceGlasses are expected to launch to consumers in 2014.
As an active technology that generates its own light, TOF works well in dim conditions, but it requires more power than stereoscopic techniques. Reducing power requirements is one challenge for TOF sensors; another is cost.
“To successfully penetrate the commercial market,” Droz says, “the 3-D industry must solve these challenges and improve the sensors and their implementation into devices. Improving the optics and quantum efficiency will ultimately make it cheaper.”
Wearable-computing company Meta (Los Altos, Calif., U.S.A.) chose to incorporate SoftKinetic’s DepthSense 325 TOF sensors and gesture-control library into its augmented-reality “SpaceGlasses.” Meta’s SpaceGlasses are a slightly more immersive version of Google Glass, with a wider field of view: 40 degrees binocular versus Google’s 14 degrees monocular. Like other 3-D devices, SpaceGlasses enable users to obtain a model of an object, modify it and send the edited version to a 3-D printer.
“SpaceGlasses use side projection to enable real-time accurate feedback from the environment around you,” explains Droz. “You can look at your desk, an object, your hands, and bring that into the projected Meta environment. A 2-D sensor in the glasses provides all the color data, coupled with the depth and functionality that the 3-D data enables.”
SoftKinetic is also working with a tier-one automotive technology company to create an in-car infotainment control center for occupants. The system involves a camera mounted on the interior roof of the vehicle that can sense hand gestures to control stereo volume, answer the phone and adjust dials.
The future looks 3-D
By all indications, 3-D imaging is set to be on a phone, tablet or laptop near you very soon. When Apple acquired 3-D sensor firm PrimeSense (Tel Aviv, Israel) for $360 million in November 2013, the 3-D augmented-reality industry took notice. PrimeSense’s structured-light 3-D sensor technology was behind the original Microsoft Kinect project. Due to a post-acquisition media blackout, analysts can only speculate how Apple plans to use the 3-D technology. But before the buyout, PrimeSense was testing its sensing technology in mobile devices.
At the 2014 Consumer Electronics Show in Las Vegas, Intel’s senior vice president for perceptual computing, Mooly Eden, revealed a thin, five-inch 3-D camera as part of Intel’s “RealSense” initiative. RealSense perceptual computing combines gesture, voice and facial recognition into a 3-D “sharing platform” for PCs and tablets. Intel also announced next-generation augmented-reality and wearable-computing projects. Companies like Lenovo, Dell, Acer, Asus, HP and Fujitsu are also working on integrating close-range 3-D augmented-reality technology into their laptops and mobile devices.
“It turns out gesture recognition isn’t the only driving force,” says Droz. “While it continues to be an important feature, other applications are emerging for an even greater impact on 3-D technologies. The higher goal is acquiring the real world around you, using augmented-reality or virtual-reality glasses to play games, scan rooms and access people.”
Valerie C. Coffey is a freelance science and technology writer and editor based in Acton, Mass., U.S.A.