Q5214215 (Q5214215): Difference between revisions

From MaRDI portal
Property / arXiv ID
 
Property / arXiv ID: 1703.07608 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Near-Optimal Regret Bounds for Thompson Sampling / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4257216 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Some asymptotic theory for the bootstrap / rank
 
Normal rank
Property / cites work
 
Property / cites work: Discounted Dynamic Programming / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4821526 / rank
 
Normal rank
Property / cites work
 
Property / cites work: 10.1162/153244303765208377 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q3959963 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4318617 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Bootstrap prediction and Bayesian prediction under misspecified models / rank
 
Normal rank
Property / cites work
 
Property / cites work: The Efficiency Analysis of Choices Involving Risk / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q2896090 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Near-optimal reinforcement learning in polynomial time / rank
 
Normal rank
Property / cites work
 
Property / cites work: Reducing reinforcement learning to KWIK online regression / rank
 
Normal rank
Property / cites work
 
Property / cites work: Knows what it knows: a framework for self-aware learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: Increasing risk: Some direct constructions / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q($\lambda$) with Off-Policy Corrections / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q5214215 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Bootstrapping data arrays of arbitrary order / rank
 
Normal rank
Property / cites work
 
Property / cites work: Learning to Optimize via Posterior Sampling / rank
 
Normal rank
Property / cites work
 
Property / cites work: Learning to Optimize via Information-Directed Sampling / rank
 
Normal rank
Property / cites work
 
Property / cites work: How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage / rank
 
Normal rank
Property / cites work
 
Property / cites work: A Tutorial on Thompson Sampling / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4626283 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Algorithms for Reinforcement Learning / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q2880944 / rank
 
Normal rank
Property / cites work
 
Property / cites work: An analysis of temporal-difference learning with function approximation / rank
 
Normal rank

Latest revision as of 16:58, 21 July 2024

scientific article; zbMATH DE number 7164724
Language: English
Label: No label defined
Description: scientific article; zbMATH DE number 7164724
Also known as: (none)

    Statements

    7 February 2020
    reinforcement learning
    exploration
    value function
    neural network
