Recent advances in hierarchical reinforcement learning (Q5907023)

From MaRDI portal
scientific article; zbMATH DE number 2001998

    Statements

    Recent advances in hierarchical reinforcement learning (English)
    10 November 2003
    This survey collects several approaches to designing reinforcement learning (RL) algorithms that cope with the exponentially growing number of parameters to be learned; a brief classical description of the problem is included as well. RL algorithms address the problem of how a behaving agent can learn to approximate an optimal behavioral strategy while interacting directly with its environment. Most RL research is based on the formalism of Markov decision processes: the environment is described as a controlled Markov chain with stationary parameters, and the core policy of the agent selects in every state \(s\) some optimal action \(a_{\text{opt}}(s)\). Such a policy can be found by dynamic programming or reinforcement learning algorithms, but its exact determination becomes practically impossible when the number of parameters is huge; in this case policies that are close to optimal must be sought.

    The survey then considers generalizations of the core policy, e.g. those based on the notion of an ``option'', which is used in place of an ``action''. For every state \(s\), an option \(o(s)\) describes a sequence of actions \(a(s), a(s'), a(s''), \dots\) for the corresponding successive states \(s, s', s'', \dots\) together with a termination condition; the next option is initiated whenever the preceding option terminates. A close-to-optimal policy can thus be described in terms of optimal options \(o_{\text{opt}}(s)\) for every state \(s\) (a minimal code sketch of this scheme follows below). Other approaches to hierarchical RL covered by the survey include hierarchies of abstract machines and the MAXQ value function decomposition. Among recent advances, approaches to concurrent activities, multi-agent coordination and hierarchical memory are considered. Concluding remarks address some open problems for the further development of RL.
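    The option construction above can be made concrete with a short sketch. The following Python snippet is an illustrative assumption, not code from the survey: the corridor MDP, the Option class, and the function names are invented here. It defines options as (internal policy, termination condition) pairs and learns close-to-optimal options with SMDP Q-learning, where the bootstrap term is discounted by \(\gamma^k\) for an option lasting \(k\) steps.

    import random

    N = 8                # corridor states 0..N-1; state N-1 is the goal
    GAMMA = 0.9          # discount factor

    def step(s, a):
        """Primitive transition: move by a, clip to the corridor,
        reward 1 on reaching the goal state."""
        s2 = min(max(s + a, 0), N - 1)
        return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

    class Option:
        """An option (pi, beta): an internal policy over primitive
        actions plus a termination condition."""
        def __init__(self, name, pi, beta):
            self.name, self.pi, self.beta = name, pi, beta

    # Two temporally extended options: march left / march right until
    # the corresponding corridor end is reached.
    OPTIONS = [Option("go_left",  pi=lambda s: -1, beta=lambda s: s == 0),
               Option("go_right", pi=lambda s: +1, beta=lambda s: s == N - 1)]

    def execute(option, s):
        """Follow the option's internal policy until it terminates;
        return the resulting state, the accumulated discounted reward,
        and the elapsed duration k."""
        total, disc, k = 0.0, 1.0, 0
        while True:
            s, r, done = step(s, option.pi(s))
            total, disc, k = total + disc * r, disc * GAMMA, k + 1
            if done or option.beta(s):
                return s, total, k, done

    def smdp_q_learning(episodes=500, alpha=0.1, eps=0.1):
        """SMDP Q-learning over options: because an option lasts a
        random number of steps k, the target discounts by GAMMA**k."""
        Q = {(s, o.name): 0.0 for s in range(N) for o in OPTIONS}
        for _ in range(episodes):
            s, done = 0, False
            while not done:
                o = (random.choice(OPTIONS) if random.random() < eps
                     else max(OPTIONS, key=lambda opt: Q[(s, opt.name)]))
                s2, r, k, done = execute(o, s)
                best_next = 0.0 if done else max(Q[(s2, p.name)] for p in OPTIONS)
                Q[(s, o.name)] += alpha * (r + GAMMA**k * best_next - Q[(s, o.name)])
                s = s2
        return Q

    Q = smdp_q_learning()
    # Greedy policy over options: "go_right" in every non-terminal state.
    print({s: max(OPTIONS, key=lambda opt: Q[(s, opt.name)]).name for s in range(N)})

    Here the option policies are hand-coded; the hierarchical RL methods surveyed above (options, hierarchies of abstract machines, MAXQ) are concerned precisely with specifying or learning such sub-policies and composing them into a close-to-optimal overall policy.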
    reinforcement learning
    Markov decision processes
    semi-Markov decision processes
    hierarchy
    temporal abstraction
    curse of dimensionality
    option
    MAXQ value function decomposition
