Recent advances in hierarchical reinforcement learning (Q5907023)

From MaRDI portal
scientific article; zbMATH DE number 2001998

    Statements

    Recent advances in hierarchical reinforcement learning (English)
    10 November 2003
    This survey collects several approaches to designing reinforcement learning (RL) algorithms that cope with the exponentially growing number of parameters to be learned; a brief classical description of the problem is included as well. RL algorithms address the problem of how a behaving agent can learn to approximate an optimal behavioral strategy while interacting directly with its environment. Most RL research is based on the formalism of Markov decision processes: the environment is described as a controlled Markov chain with stationary parameters, and the core policy of the agent selects in every state \(s\) some optimal action \(a_{\text{opt}}(s)\). Such a policy can be found by dynamic programming or reinforcement learning algorithms, but its exact determination becomes practically impossible when the number of parameters is huge; in this case policies that are close to optimal must be sought.

    The survey then considers generalizations of the core policy, e.g. those based on the notion of an ``option'', which is used in place of an ``action''. For every state \(s\), an option \(o(s)\) describes a sequence of actions \(a(s), a(s'), a(s''), \dots\) for the corresponding successive states \(s, s', s'', \dots\) together with a termination condition; the next option is initiated whenever the preceding option terminates. A close-to-optimal policy can thus be described in terms of optimal options \(o_{\text{opt}}(s)\) for every state \(s\) (a minimal code sketch of this scheme follows below). Other approaches to hierarchical RL covered by the survey include hierarchies of abstract machines and the MAXQ value function decomposition. Among recent advances, approaches to concurrent activities, multi-agent coordination and hierarchical memory are considered. Concluding remarks address some open problems for the further development of RL.
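    The option construction above can be made concrete with a short sketch. The following Python snippet is an illustrative assumption, not code from the survey: the corridor MDP, the Option class, and the function names are invented here. It defines options as (internal policy, termination condition) pairs and learns close-to-optimal options with SMDP Q-learning, where the bootstrap term is discounted by \(\gamma^k\) for an option lasting \(k\) steps.

    import random

    N = 8                # corridor states 0..N-1; state N-1 is the goal
    GAMMA = 0.9          # discount factor

    def step(s, a):
        """Primitive transition: move by a, clip to the corridor,
        reward 1 on reaching the goal state."""
        s2 = min(max(s + a, 0), N - 1)
        return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

    class Option:
        """An option (pi, beta): an internal policy over primitive
        actions plus a termination condition."""
        def __init__(self, name, pi, beta):
            self.name, self.pi, self.beta = name, pi, beta

    # Two temporally extended options: march left / march right until
    # the corresponding corridor end is reached.
    OPTIONS = [Option("go_left",  pi=lambda s: -1, beta=lambda s: s == 0),
               Option("go_right", pi=lambda s: +1, beta=lambda s: s == N - 1)]

    def execute(option, s):
        """Follow the option's internal policy until it terminates;
        return the resulting state, the accumulated discounted reward,
        and the elapsed duration k."""
        total, disc, k = 0.0, 1.0, 0
        while True:
            s, r, done = step(s, option.pi(s))
            total, disc, k = total + disc * r, disc * GAMMA, k + 1
            if done or option.beta(s):
                return s, total, k, done

    def smdp_q_learning(episodes=500, alpha=0.1, eps=0.1):
        """SMDP Q-learning over options: because an option lasts a
        random number of steps k, the target discounts by GAMMA**k."""
        Q = {(s, o.name): 0.0 for s in range(N) for o in OPTIONS}
        for _ in range(episodes):
            s, done = 0, False
            while not done:
                o = (random.choice(OPTIONS) if random.random() < eps
                     else max(OPTIONS, key=lambda opt: Q[(s, opt.name)]))
                s2, r, k, done = execute(o, s)
                best_next = 0.0 if done else max(Q[(s2, p.name)] for p in OPTIONS)
                Q[(s, o.name)] += alpha * (r + GAMMA**k * best_next - Q[(s, o.name)])
                s = s2
        return Q

    Q = smdp_q_learning()
    # Greedy policy over options: "go_right" in every non-terminal state.
    print({s: max(OPTIONS, key=lambda opt: Q[(s, opt.name)]).name for s in range(N)})

    Here the option policies are hand-coded; the hierarchical RL methods surveyed above (options, hierarchies of abstract machines, MAXQ) are concerned precisely with specifying or learning such sub-policies and composing them into a close-to-optimal overall policy.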
    reinforcement learning
    Markov decision processes
    semi-Markov decision processes
    hierarchy
    temporal abstraction
    curse of dimensionality
    option
    MAXQ value function decomposition
