Vienna University of Technology researchers have made a chip comprising light sensors embedded in a neural network, which can recognize the letters n, v or z almost instantly. [Image: Joanna Symonowicz, TU Wien]

The ability to recognize images automatically is an integral part of modern artificial intelligence, being used in everything from self-driving cars to cancer diagnosis. But such machine vision currently relies on a computer analyzing large numbers of frames from a normal camera, making the process slow and energy inefficient.

Now, researchers in Austria have shown how the camera and computer can be replaced by a single device—an electronic circuit that serves as both image sensor and neural network (Nature, doi: 10.1038/s41586-020-2038-x). They have shown that the intelligent sensor can recognize very basic images in just a few tens of nanoseconds, several orders of magnitude faster than current systems.

Inputs and outputs

Neural networks are based loosely on how the brain processes information. Rather than relying on pre-defined algorithms to crunch a set of numbers, they instead use networks of artificial neurons to recognize patterns within data. This is done by gradually adjusting the strength of connections, or weights, between neurons so that certain inputs generate specific outputs.

The new sensor, built by Thomas Mueller and colleagues at the Vienna University of Technology, contains nine pixels—the input neurons. Every pixel consists of three photodiodes made from sheets of tungsten diselenide just three atoms thick, with the current from each photodiode determined by the intensity of incoming light and the voltage across it.

The idea is that each photodiode serves as a weight linking its pixel to one of three additional neurons. Each of those neurons sums the nine individual currents it receives, and the combined values are then fed into a nonlinear function on a standard computer. The overall result is three numbers, which together constitute the sensor’s output.
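The arrangement described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: it assumes each photocurrent is simply the product of a weight and the light intensity, it uses a softmax as the nonlinear function (the article does not say which function was used), and the names `forward`, `v_pattern` and `toy_weights` are invented for the example.

```python
import math

N_PIXELS = 9   # input neurons (pixels), as described in the article
N_OUTPUTS = 3  # output neurons; assumed order here: n, v, z

def forward(intensities, weights):
    """Return three output values for one 9-pixel image.

    intensities[i] is the light level on pixel i.
    weights[i][j] stands for the photodiode linking pixel i to output j.
    Assumption: photocurrent = weight * intensity.
    """
    # Each output neuron sums the nine currents wired to it.
    currents = [
        sum(weights[i][j] * intensities[i] for i in range(N_PIXELS))
        for j in range(N_OUTPUTS)
    ]
    # Assumed nonlinearity: softmax, computed off-chip in the real setup.
    peak = max(currents)
    exps = [math.exp(c - peak) for c in currents]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3x3 pattern loosely resembling a "v" (1 = lit, 0 = dark).
v_pattern = [1, 0, 1,
             1, 0, 1,
             0, 1, 0]

# Toy weights: only the second output is tuned, and it is tuned to the "v" shape.
toy_weights = [[0.0, 1.0 if lit else -1.0, 0.0] for lit in v_pattern]

outputs = forward(v_pattern, toy_weights)
print(outputs)  # the second value is the largest
```

With these hand-picked weights the second output dominates, which is the behavior training is meant to produce automatically.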

Adjusting the weights

The sensor is operated by exposing it to a laser beam that has been spatially modulated so that the incoming light intensity varies from pixel to pixel—and, in the latest work, creating shapes that resemble one of three letters: n, v or z. Once the chip is running smoothly, the pattern of dark and light encoded by a given letter should yield a larger current at one specific output—thereby flagging up that particular letter. But with the weights initially assigned randomly, the first results are garbage.

Training the device to recognize the different letters involves adjusting the weights to bring the observed output closer to the desired one. The sensor is then exposed to the beam again, producing a new output that prompts a further adjustment of the weights, and so on, until the device yields the right answer repeatedly. The device can then be disconnected from the computer and let loose to classify fresh data on its own.
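A minimal sketch of such a training loop is shown below. It is a toy under stated assumptions rather than the procedure used in the paper: the three 3x3 patterns are invented stand-ins for the projected letters, the update rule is an ordinary gradient step for a softmax output with cross-entropy loss, and the learning rate and number of passes are arbitrary choices.

```python
import math
import random

# Hypothetical 3x3 binary patterns standing in for the projected letters.
LETTERS = {
    "n": [1, 1, 1,
          1, 0, 1,
          1, 0, 1],
    "v": [1, 0, 1,
          1, 0, 1,
          0, 1, 0],
    "z": [1, 1, 1,
          0, 1, 0,
          1, 1, 1],
}
LABELS = list(LETTERS)  # output j corresponds to LABELS[j]

def forward(pixels, weights):
    """Sum the weighted pixel signals per output, then apply a softmax."""
    currents = [sum(weights[i][j] * pixels[i] for i in range(9)) for j in range(3)]
    peak = max(currents)
    exps = [math.exp(c - peak) for c in currents]
    total = sum(exps)
    return [e / total for e in exps]

def train(epochs=50, rate=0.2, seed=0):
    """Start from random weights and nudge them toward the desired outputs."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(9)]
    for _ in range(epochs):
        for target, letter in enumerate(LABELS):
            pixels = LETTERS[letter]
            outputs = forward(pixels, weights)
            for j in range(3):
                # Difference between observed and desired output j.
                error = outputs[j] - (1.0 if j == target else 0.0)
                for i in range(9):
                    # Assumed rule: gradient step for softmax cross-entropy.
                    weights[i][j] -= rate * error * pixels[i]
    return weights

def classify(pixels, weights):
    """Return the letter whose output value is largest."""
    outputs = forward(pixels, weights)
    return LABELS[outputs.index(max(outputs))]

trained = train()
for letter, pixels in LETTERS.items():
    print(letter, "->", classify(pixels, trained))
```

Once `train` has finished, `classify` needs only the stored weights, mirroring the point at which the real device can be disconnected from the computer.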

The researchers say that they are able to generate correct outputs after about 10 training cycles (although the exact number depends on how noisy the images are). They also show that the chip can learn to recognize the letters even when unsupervised, although learning in that case takes longer—about 30 cycles.

Applications and scaling up

What sets the device apart from conventional machine vision is its speed, the researchers say. With the response limited only by the duration of the physical processes involved in generating a photocurrent, they were able to correctly classify letters within about 50 nanoseconds, corresponding to a rate of 20 million images per second. Such high throughput, suggests Mueller, could make the technology suited to applications such as tracking the propagation of cracks or identifying particles in the debris of high-energy colliders.
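The quoted rate follows directly from the classification time, assuming one image is classified every 50 nanoseconds with no gap in between. The variable names below are illustrative only.

```python
# Classification time quoted in the article: about 50 nanoseconds per image.
classification_time_s = 50e-9

# Assumption: images are processed back to back, one per classification time.
images_per_second = 1 / classification_time_s

print(f"{images_per_second:.0f} images per second")
```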

Mueller is confident that the device can be scaled up fairly easily, given that sensors with millions of pixels can be built with today’s technology. He points out that this will require the weights to be stored within the sensor itself rather than being supplied from an external memory via cabling, as is currently the case. But he believes that could be done using what are known as floating-gate devices, which would store charge at each photodiode in a similar way to flash memory.