Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function (Q1886590): Difference between revisions

Set OpenAlex properties.
ReferenceBot
Changed an Item
 
Property / cites work: A near-optimal polynomial time algorithm for learning in certain classes of stochastic games (rank: Normal rank)
Property / cites work: Dual-control theory. I (rank: Normal rank)
Property / cites work: Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function (rank: Normal rank)
Property / cites work: \({\mathcal Q}\)-learning (rank: Normal rank)
Property / cites work: Mean, variance and probabilistic criteria in finite Markov decision processes: A review (rank: Normal rank)
Property / cites work: Simple statistical gradient-following algorithms for connectionist reinforcement learning (rank: Normal rank)
Property / cites work: The apparent conflict between estimation and control - a survey of the two-armed bandit problem (rank: Normal rank)

Latest revision as of 14:24, 7 June 2024

scientific article
Language: English
Label: Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function
Description: scientific article

    Statements

    Reliability of internal prediction/estimation and its application. I: Adaptive action selection reflecting reliability of value function (English)
    18 November 2004
    Internal prediction
    Reliability
    Model-free reinforcement learning
    TD learning
    Discount rate
    Exploration-exploitation balance
    Temperature parameter
    Meta-learning

    Identifiers