Solving infinite horizon discounted Markov decision process problems for a range of discount factors (Q584085)
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Solving infinite horizon discounted Markov decision process problems for a range of discount factors | scientific article | |
Statements
Solving infinite horizon discounted Markov decision process problems for a range of discount factors (English)
1989
Consider the following decision problem. There is a finite state set \(I\). For each \(i\in I\) there is a finite action set \(K(i)\). If at a decision epoch the state is \(i\in I\) and an action \(k\in K(i)\) is taken, then the new state \(j\) is a random variable described by a given transition probability \(p(i,k;j)\). There is an immediate reward \(r(i,k)\) with \(0\leq r(i,k)\leq M<\infty\) and a discount factor of the form \(\tau = t\rho\), where \(t\in [0,1]\) and \(0\leq \rho <1\); the number \(t\) is treated as a parameter. The paper deals with maximizing the infinite horizon discounted rewards over stationary policies of the form \(\pi =(\delta)^{\infty}\), where \(\delta: I\to \cup_{i\in I}K(i)\). Let \(v_t(i)\), \(i\in I\), denote the infinite horizon expected discounted reward corresponding to the parameter \(t\). The problems considered are: (a) to find approximations for \(v_t\) over the whole range \(t\in[0,1]\); (b) to find approximations for \(v_{t+\delta}\) when \(v_t\) is given and \(\delta\) may take any value in \([0,1-t]\). Algorithms for solving these problems are presented.
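To make the setting concrete, the following is a minimal Python sketch of plain value iteration for \(v_t\) at a fixed \(t\), swept over a grid of \(t\) values with a warm start from the previous solution (a natural heuristic for problem (b)). This is only an illustration of the problem being solved, not the algorithms of the paper; the data `P`, `r`, `rho`, the grid of \(t\) values, and the assumption of a common action set (the paper allows state-dependent sets \(K(i)\)) are all hypothetical.

```python
import numpy as np

def value_iteration(P, r, t, rho, v0=None, tol=1e-8, max_iter=100_000):
    """Approximate v_t for discount factor tau = t * rho via value iteration.

    P[k, i, j] = p(i, k; j): transition probability from state i to j under
                 action k (a common action set is assumed for simplicity).
    r[i, k]    = immediate reward, assumed bounded in [0, M].
    v0         = optional starting values, e.g. a previously computed v_t.
    Since tau = t * rho <= rho < 1, the Bellman operator is a contraction
    and the iteration converges.
    """
    tau = t * rho
    n_states = r.shape[0]
    v = np.zeros(n_states) if v0 is None else v0.copy()
    for _ in range(max_iter):
        # Bellman update: Q(i, k) = r(i, k) + tau * sum_j p(i, k; j) v(j).
        q = r + tau * np.einsum('kij,j->ik', P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical toy data: 4 states, 3 actions, rho = 0.9.
rng = np.random.default_rng(0)
n, K = 4, 3
P = rng.random((K, n, n))
P /= P.sum(axis=2, keepdims=True)   # normalize rows into probabilities
r = rng.random((n, K))
rho = 0.9

# Problem (a): approximate v_t over a grid of t in [0, 1].
v = None
for t in np.linspace(0.0, 1.0, 5):
    # Problem (b)-style warm start: reuse the previous v_t as v0.
    v = value_iteration(P, r, t, rho, v0=v)
    print(f"t = {t:.2f}  v_t = {np.round(v, 3)}")
```

The warm start exploits the fact that \(v_t\) varies continuously in \(t\), so the previously computed values are a good starting point for a nearby parameter; the paper's algorithms address (a) and (b) in a more refined way than this baseline.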
Keywords: infinite horizon discounted rewards