Deterministic policies based on maximum regrets in MDPs with imprecise rewards
Publication: 5069649
DOI: 10.3233/AIC-190632
zbMath: 1487.68205
OpenAlex: W3201191077
Wikidata: Q113417297
Scholia: Q113417297
MaRDI QID: Q5069649
Aomar Osmani, Emiliano Traversi, Pegah Alizadeh
Publication date: 19 April 2022
Published in: AI Communications
Full work available at URL: https://doi.org/10.3233/aic-190632
Keywords: branch-and-bound; minimax regret; Markov decision process; deterministic policy; stochastic policy; unknown rewards
MSC classifications: Markov and semi-Markov decision processes (90C40); Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) (68T20)
Cites Work
- Unnamed Item
- Unnamed Item
- Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2011, Athens, Greece, September 5--9, 2011. Proceedings, Part I
- Partitioning procedures for solving mixed-variables programming problems
- Bounded-parameter Markov decision processes
- Preference-based reinforcement learning: a formal framework and a policy iteration algorithm
- Bias and Variance Approximation in Value Function Estimates
- Regret in Decision Making under Uncertainty
- Robust Markov Decision Processes
- Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)
- Robust Control of Markov Decision Processes with Uncertain Transition Matrices
- Robust Dynamic Programming