A counterexample on sample-path optimality in stable Markov decision chains with the average reward criterion

Publication:481787