Experimental quantum speed-up in reinforcement learning agents

DOI: 10.5281/zenodo.4327211 (Zenodo), MaRDI QID: Q6695254, FDO

Authors Teodor Strömberg, Beate E. Asenbeck, Nicolai Friis, Sabine Wölk, Peter Schiansky, Valeria Saggio, Vedran Dunjko, Philip Walther, Nicholas C. Harris, Arne Hamann, Michael Hochberg, Hans J. Briegel, Dirk Englund

Publication date 10 March 2021

Copyright license Creative Commons Attribution 4.0 International

Description

The text files can be used to reproduce the experimental plots presented in the manuscript. All the files contain sequences of the numbers 0 and 1, which represent the non-reward and reward, respectively, assigned to the agent after every epoch. Three different cases are covered: classical strategy, quantum strategy, and combined strategy.

1. Classical strategy

The text files named Classical_10agents_1000rounds.txt, Classical_139agents_1000rounds.txt, and Classical_16agents_1000rounds.txt consist of 10, 139, and 16 consecutive sequences of 1000 numbers, respectively. Each of these sequences corresponds to one agent playing 1000 epochs, so there are 10 + 139 + 16 = 165 agents in total, each playing 1000 epochs. Merging these files and averaging the epoch outcomes (0 or 1) over all 165 agents reproduces the behaviour of the average reward for the classical strategy.

2. Quantum strategy

The text file named Quantum_165agents_500rounds.txt contains 165 arrays of length 500. As in the previous case, each array is built up one element at a time as the agent plays its epochs; here 500 epochs are played. Averaging the outcomes over all 165 agents reproduces the behaviour of the average reward for the quantum strategy.

3. Combined strategy

In this case, one file is acquired for the quantum strategy and one for the classical strategy. These files are named Combined_quantum_165agents.txt and Combined_classical_165agents.txt, respectively. Both files again contain 165 arrays, one per agent, but the length of these arrays is not always the same. Taking the first array of both files as an example, there are 47 elements in the quantum case and 906 in the classical case. This means that the agent played 47 quantum epochs before switching to a classical strategy. To combine the two cases, one therefore needs to merge the quantum array with the classical array, keeping only the even-indexed elements of the classical array. In this way, one obtains 47 + 906/2 = 500 epochs. The same procedure applies to the rest of the arrays. Once 165 arrays of length 500 have been obtained, the average over all the agents can be computed, which reproduces the plot for the combined strategy.
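The averaging and merging steps described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the authors' analysis code. The description does not specify the file layout, so the sketch assumes that values are bare integers separated by whitespace or commas, that the fixed-length files can be cut into consecutive blocks of equal length, and that the two combined files store one agent per line. It also assumes that "even-indexed" means zero-based positions 0, 2, 4, ...; the hypothetical `start` parameter can be set to 1 if the other convention applies. All function names are made up for this example.

```python
import re

import numpy as np


def parse_bits(text):
    """Return every value in `text` as a flat list of ints (0 or 1).

    Assumption: values are bare integers separated by whitespace
    and/or commas. Anything else raises ValueError.
    """
    bits = [int(tok) for tok in re.split(r"[\s,]+", text) if tok]
    if any(b not in (0, 1) for b in bits):
        raise ValueError("expected only the values 0 and 1")
    return bits


def fixed_length_rewards(text, n_rounds):
    """Cut the stream of values into one row of n_rounds per agent."""
    bits = parse_bits(text)
    if len(bits) % n_rounds != 0:
        raise ValueError("value count is not a multiple of n_rounds")
    return np.array(bits, dtype=float).reshape(-1, n_rounds)


def ragged_rewards(text):
    """Parse variable-length arrays, assuming one agent per non-empty line."""
    return [parse_bits(line) for line in text.splitlines() if line.strip()]


def combine(quantum, classical, n_epochs=500, start=0):
    """Append every second classical outcome to each quantum array.

    `start=0` keeps zero-based positions 0, 2, 4, ... (an assumption
    about what "even-indexed" means in the description).
    """
    if len(quantum) != len(classical):
        raise ValueError("quantum and classical agent counts differ")
    rows = []
    for q, c in zip(quantum, classical):
        row = q + c[start::2]
        if len(row) != n_epochs:
            raise ValueError(f"combined length {len(row)} != {n_epochs}")
        rows.append(row)
    return np.array(rows, dtype=float)


def average_reward(rewards):
    """Average the 0/1 outcomes over agents, giving one value per epoch."""
    return rewards.mean(axis=0)
```

For the classical strategy, one would read the three files (for example with `pathlib.Path(name).read_text()`), call `fixed_length_rewards(text, 1000)` on each, stack the results with `np.vstack`, and pass the stacked array to `average_reward`. The quantum file would be handled the same way with `n_rounds=500`, and the combined files by passing the output of `ragged_rewards` for each file to `combine`. The length checks are there so that a wrong guess about the file layout fails loudly instead of producing a misleading curve.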
