Online regret bounds for Markov decision processes with deterministic transitions (Q982638): Difference between revisions

From MaRDI portal
Import240304020342
Set profile property.
ReferenceBot
Changed an Item
 
(One intermediate revision by one other user not shown)
Property / full work available at URL
 
Property / full work available at URL: https://doi.org/10.1016/j.tcs.2010.04.005 / rank
 
Normal rank
Property / OpenAlex ID
 
Property / OpenAlex ID: W2150011303 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Online Regret Bounds for Markov Decision Processes with Deterministic Transitions / rank
 
Normal rank
Property / cites work
 
Property / cites work: Finite-time analysis of the multiarmed bandit problem / rank
 
Normal rank
Property / cites work
 
Property / cites work: Optimal Adaptive Policies for Markov Decision Processes / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q2896090 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4315289 / rank
 
Normal rank
Property / cites work
 
Property / cites work: A characterization of the minimum cycle mean in a digraph / rank
 
Normal rank
Property / cites work
 
Property / cites work: Faster parametric shortest path and minimum‐balance algorithms / rank
 
Normal rank
Property / cites work
 
Property / cites work: Finding minimum cost to time ratio cycles with small integral transit times / rank
 
Normal rank
Property / cites work
 
Property / cites work: Near-optimal reinforcement learning in polynomial time / rank
 
Normal rank
Property / cites work
 
Property / cites work: Probability Inequalities for Sums of Bounded Random Variables / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q3093197 / rank
 
Normal rank
Property / cites work
 
Property / cites work: The Nonstochastic Multiarmed Bandit Problem / rank
 
Normal rank
Property / cites work
 
Property / cites work: Asymptotically efficient adaptive allocation rules / rank
 
Normal rank
Property / cites work
 
Property / cites work: Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost / rank
 
Normal rank
Property / cites work
 
Property / cites work: Optimal learning and experimentation in bandit problems. / rank
 
Normal rank
Property / cites work
 
Property / cites work: Improved Rates for the Stochastic Continuum-Armed Bandit Problem / rank
 
Normal rank
Property / cites work
 
Property / cites work: Online Markov Decision Processes / rank
 
Normal rank

Latest revision as of 23:40, 2 July 2024

scientific article
Language Label Description Also known as
English
Online regret bounds for Markov decision processes with deterministic transitions
scientific article

    Statements

    Online regret bounds for Markov decision processes with deterministic transitions (English)
    7 July 2010
    labeled digraph

    Identifiers