A team has reported the first proof-of-principle experiment of a new quantum-based learning technique, shown here in an artist’s impression. [Image: Rolando Barry, University of Vienna]
Being rewarded for repeated good performance has proven to be a successful learning technique, whether the student involved is a human being or a machine-learning algorithm. But this so-called reinforcement learning can be very time consuming.
Now, an international research collaboration has shown how to significantly speed things up by running the technique on a nanophotonic processor that uses both classical and quantum communication. Possible applications for this approach, the researchers say, span a range of sectors, from robotics to healthcare (Nature, doi: 10.1038/s41586-021-03242-7).
Reinforcement learning involves repeated interaction between some kind of agent making decisions and the environment. In response to a series of stimuli or “percepts” from the latter, the former on each occasion takes a certain course of action. Each time the agent chooses the correct action it is rewarded in some way, and it then modifies its future responses accordingly, with the idea that over time it comes to choose the correct response on each occasion.
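The loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not the researchers' algorithm: the function name, the two-action setup, the unit reward and the rule of adding the reward to a preference weight are all assumptions made for the example.

```python
import random


def train_classical_agent(n_actions=2, rewarded_action=0, n_rounds=500, seed=0):
    """Toy reinforcement-learning loop (hypothetical update rule).

    Returns the probability of picking the rewarded action after each round.
    """
    rng = random.Random(seed)
    # One preference weight per action; equal weights mean no initial bias.
    weights = [1.0] * n_actions
    history = []
    for _ in range(n_rounds):
        # The agent responds to a percept by sampling an action.
        action = rng.choices(range(n_actions), weights=weights)[0]
        # The environment rewards only the correct action.
        reward = 1 if action == rewarded_action else 0
        # The agent reinforces whatever it was rewarded for.
        weights[action] += reward
        history.append(weights[rewarded_action] / sum(weights))
    return history


if __name__ == "__main__":
    probabilities = train_classical_agent()
    print(f"after round 1:   {probabilities[0]:.3f}")
    print(f"after round 500: {probabilities[-1]:.3f}")
```

Because only the rewarded action is ever reinforced in this toy, the probability of a correct response can only rise, but it rises slowly when the agent starts with many equally weighted options.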
Quantum mechanics, however, can introduce a twist. Rather than deciding between two particular courses of action at each turn, a learner instead responds with a superposition of the two possible actions. That introduces a finer-grained spectrum of responses, allowing quicker learning.
Valeria Saggio at the University of Vienna in Austria uses the analogy of someone standing at a junction in the road and deciding which way to go. Rather than choosing to turn either left or right, the person instead turns both left and right at the same time—doing so with some specific mix of the two options.
The concept of quantum reinforcement learning was put forward by Hans Briegel at the University of Innsbruck in Austria and colleagues in 2016. Earlier related work by the same group had benefited quantum information processing in terms of decoding errors and helping to design quantum experiments, for example, and also allowed agents to take quicker decisions. However, all such prior research involved agents interacting classically with the environment—resulting in no overall reduction in learning time.
An integrated nanophotonic platform
In the latest work, Briegel has got together with Saggio and Philip Walther at the University of Vienna as well as other colleagues in Austria, the Netherlands and the USA, to implement these ideas using quantum communication. In fact, the system they have demonstrated is a hybrid, with the agent able to act either according to a classical probability distribution or by creating a superposition of all rewarded and non-rewarded actions.
The system involves a fully programmable nanophotonic processor containing a series of waveguides fabricated to form numerous tunable beam splitters. A single photon (at telecom wavelengths) entering the processor is acted on either classically or quantum-mechanically, with the device in the latter case being ideally split into three regions—the first and last corresponding to the agent and the middle one to the environment.
When leaving the processor, each photon is detected, with a certain probability, in one of two superconducting nanowire detectors corresponding to the right and wrong outcomes. The system then registers that output and uses a conventional computer to update its learning algorithm. The next single photon is then manipulated according to this revised procedure.
The specific protocol demonstrated by the collaboration, which is led by Walther, involved processing data in “epochs,” each of which consisted of multiple percepts and actions for each output. Because the reward outcomes of quantum epochs could not be detected directly, the researchers alternated quantum and classical epochs, with the latter used to update the learning algorithm.
Classical, quantum or mixed?
Carrying out 10,000 simulations and 165 experimental runs, the researchers compared the effectiveness of a quantum strategy (involving alternating epochs) and a purely classical one. They found that the former initially learned much more quickly, cutting learning time by up to 63%, but afterwards actually got worse over time, because it relies on a Grover-like algorithm whose effectiveness drops after a certain point. The classical strategy, in contrast, saw a steady, although initially slower, improvement.
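Why a Grover-like algorithm helps at first and hurts later can be seen from the textbook amplitude-amplification formula: if an action is rewarded with probability ε, then after k Grover iterations the probability becomes sin²((2k+1)θ), where sin²θ = ε. The sketch below evaluates that formula for a single iteration. The function name, the choice of one iteration and the sample values of ε are assumptions for illustration, not parameters from the experiment.

```python
import math


def grover_success_probability(epsilon, iterations=1):
    """Textbook amplitude amplification: sin^2((2k + 1) * theta).

    epsilon is the success probability before amplification,
    with sin^2(theta) = epsilon.
    """
    theta = math.asin(math.sqrt(epsilon))
    return math.sin((2 * iterations + 1) * theta) ** 2


if __name__ == "__main__":
    # Illustrative values only: small epsilon is boosted, large epsilon is hurt.
    for epsilon in (0.01, 0.25, 0.50, 0.75):
        boosted = grover_success_probability(epsilon)
        print(f"classical {epsilon:.2f} -> one Grover step {boosted:.2f}")
```

In this single-iteration case the amplified probability exceeds ε only while ε is below one half. Once the agent has already learned that much, the rotation overshoots and the quantum step does worse than simply sampling classically.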
The experimental setup allows agents to optimize their strategy by switching from the quantum to classical versions once the effectiveness of the former starts to drop off. As to how much quicker that mixed strategy is compared to the purely classical variety, the researchers point out that it depends on the desired success threshold.
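A toy calculation can illustrate how the benefit of switching depends on the success threshold. This is a sketch under strong assumptions and does not reproduce the published protocol: the function name, the 100-action setup, the rule of adding the expected reward to a preference weight each round, and the use of a single Grover-like step are all hypothetical, and the model counts every round equally rather than alternating quantum and classical epochs.

```python
import math


def rounds_to_threshold(threshold, n_actions=100, hybrid=False, max_rounds=100_000):
    """Count rounds until the rewarded action's probability reaches threshold."""
    h = 1.0  # preference weight of the single rewarded action (hypothetical)
    others = n_actions - 1  # total weight of non-rewarded actions, held fixed
    for rounds in range(max_rounds + 1):
        epsilon = h / (h + others)
        if epsilon >= threshold:
            return rounds
        p_reward = epsilon
        if hybrid:
            # One Grover-like step, with a fallback to the classical
            # probability as soon as the quantum step stops helping.
            theta = math.asin(math.sqrt(epsilon))
            p_reward = max(epsilon, math.sin(3 * theta) ** 2)
        h += p_reward  # add the expected reward instead of sampling
    raise RuntimeError("threshold not reached within max_rounds")


if __name__ == "__main__":
    for threshold in (0.3, 0.6, 0.9):
        classical = rounds_to_threshold(threshold)
        mixed = rounds_to_threshold(threshold, hybrid=True)
        print(f"threshold {threshold}: classical {classical}, mixed {mixed}")
```

In this toy the mixed strategy never needs more rounds than the classical one, and the relative saving shrinks as the threshold rises, because the late stage of learning is purely classical in both cases.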
Walther and colleagues say that they envision their scheme being applied in particular to problems involving frequent searches in “large search spaces.” They cite the example of routing in computer networks, estimating that with a few tens of photons, waveguides and detectors, the system could establish the optimal route from around 10,000 possibilities.
The researchers reckon that “substantial steps” could be made toward such multi-photon applications, thanks to progress with superconducting detectors, single-photon sources and artificial atoms in photonic circuits.