Quadcopter drone with camera

Researchers at the University of Maryland have proposed a mathematical framework for connecting visual and motion data in machine vision—and have tested out the system on a quadcopter drone fitted with an event camera (1) and an onboard computer (2). [Image: Mitrokhin et al., Sci. Robot. 4, eaaw6736 (2019)]

In the quest for efficient machine-vision systems for robotics, many engineers have gravitated toward so-called neuromorphic or event cameras—systems that function in ways analogous to human vision, by detecting changes in brightness rather than recording complete, frame-by-frame RGB images. But while these cameras are particularly adept at capturing continuous motion, it’s been unclear how to efficiently tie the camera data to information on the movements and actions of the robot itself, to create a single cognitive framework—so-called active perception—that can be leveraged by machine-learning architectures.

Computer scientists at the University of Maryland, USA, have now developed a mathematical scheme for creating such active perception using neuromorphic cameras, potentially boosting the devices’ usefulness in AI-driven systems (Sci. Robot., doi: 10.1126/scirobotics.aaw6736). And they’ve tested out the framework using datasets from drone- and vehicle-mounted event cameras. The motion-aware vision that results could, the researchers believe, be useful in helping autonomous robots and self-driving cars develop navigation and obstacle-avoidance strategies.

Detecting events

The neuromorphic cameras known as dynamic vision sensors (DVSs) consist of arrays of pixels, each of which monitors the change in voltage of a photodiode, which is proportional to the log of the light intensity on the pixel. A new visual “event” is triggered in the camera only when the voltage change (and, hence, the change in light intensity) exceeds a specific threshold. The DVS returns the data triple (x, y, t)—with x and y denoting pixel position and t denoting the time of the intensity change.
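The event-triggering logic described above can be sketched in a few lines of Python. The following is a minimal simulation of DVS-style output computed from ordinary grayscale frames—the threshold value and log offset are illustrative choices, not the parameters of any real sensor:

```python
import numpy as np

def dvs_events(frames, times, threshold=0.2):
    """Simulate DVS-style event generation from a stack of grayscale frames.

    Each pixel fires an event (x, y, t) whenever its log intensity has
    changed by more than `threshold` since that pixel's last event.
    """
    log_frames = np.log(frames.astype(float) + 1e-6)  # log intensity; offset avoids log(0)
    reference = log_frames[0].copy()                  # per-pixel level at the last event
    events = []
    for frame, t in zip(log_frames[1:], times[1:]):
        diff = frame - reference
        ys, xs = np.nonzero(np.abs(diff) > threshold)  # pixels crossing the threshold
        for x, y in zip(xs, ys):
            events.append((x, y, t))                   # the (x, y, t) triple
            reference[y, x] = frame[y, x]              # reset reference where events fired
    return events
```

Note that a static scene produces no events at all: only pixels whose intensity changes generate data, which is the source of the sparsity the article describes.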

Because changes in brightness often stem from changes in relative motion, neuromorphic cameras such as DVSs are considered particularly strong at capturing uninterrupted movement, dispensing with the large amounts of non-motion-related, redundant data and latencies associated with conventional RGB cameras. The motion-sensing abilities and efficient data handling of neuromorphic cameras thus offer a potential route to active perception—the kind of motion-aware vision that allows humans and animals, for example, to perceive that parts of the scene outside of themselves are stationary rather than moving.

Putting visual and motion data together

Getting to active perception, however, requires a way to integrate the visual data from these systems with the motor and activity data from the robot itself into a single data representation, in a form that machine-learning routines can grab onto. The University of Maryland team behind the new research tackled the problem using a data representation called a hyperdimensional binary vector (HBV)—a data object, potentially 10,000 bits in length, that’s easily digestible by machine-learning architectures such as convolutional neural networks (CNNs).

The researchers developed a mathematical framework for encoding both perception data from the camera and velocity data from the robot in HBVs of the same length, allowing for efficient processing and interchange of perception and motion data between the vectors. The visual and motion data thus can computationally “live” in the same representational space, as a single data record. The team also wrote an open-source software library, in the computing language Python, that allows accelerated processing of these king-sized binary vectors, with routines capable of performing more than 100,000 permutations per second on vectors up to 8,000 bits long.
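The flavor of such hyperdimensional encodings can be conveyed with the standard primitives of hyperdimensional computing—XOR binding, cyclic-shift permutation, and majority-vote bundling. The sketch below is a generic illustration of those operations, not the team's actual library or encoding scheme; only the 8,000-bit vector length is taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8000  # vector length, matching the figure quoted in the article

def random_hbv():
    """A random binary vector; in high dimensions, two of these are near-orthogonal."""
    return rng.integers(0, 2, D, dtype=np.uint8)

def bind(a, b):
    """XOR binding: pairs two vectors into one of the same length, dissimilar to both."""
    return a ^ b

def permute(a, shift=1):
    """Cyclic-shift permutation, commonly used to encode sequence or time order."""
    return np.roll(a, shift)

def bundle(vectors):
    """Majority vote: superposes several vectors into one that stays similar to each."""
    return (2 * np.sum(vectors, axis=0) >= len(vectors)).astype(np.uint8)

def similarity(a, b):
    """Normalized Hamming similarity: ~0.5 for unrelated vectors, 1.0 for identical."""
    return 1.0 - np.count_nonzero(a ^ b) / D
```

Binding a "velocity" vector with a "visual" vector, for instance, yields a single fixed-length record, and XOR-ing that record with either input recovers the other—one way the interchange of perception and motion data within a shared representation can work.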

Making (perceptual) memories

The team tested out its system by strapping a camera equipped with a DAVIS 240b DVS sensor onto a Qualcomm quadcopter drone fitted with an onboard computer. The researchers found that, using a CNN, the system could learn to fuse the visual and motion data encoded in the HBVs into a single perception (confirmed by its ability to accurately judge the drone’s velocity from sparse event data from the camera). This was true, the authors noted, even though the HBVs “inherently hide the raw data in their encodings.” The team also applied the framework to a popular multivehicle stereo event camera (MVSEC) test set for self-driving vehicles.

Intriguingly, the Maryland researchers—who acknowledge that the use of HBVs is still at an early stage—note that these structures “can encode entire histories of actions and perceptions … as constant-sized vectors.” That means, they suggest, that HBVs could prove a particularly useful and natural way to “create semantically significant and informed memories” of both action and perception in a machine setting.