Deterministic policies based on maximum regrets in MDPs with imprecise rewards

From MaRDI portal
Publication:5069649