Solving infinite horizon discounted Markov decision process problems for a range of discount factors (Q584085)

From MaRDI portal
Author: Douglas J. White
Reviewed by: Ryszarda Rempała
Cites:
- Q3241504
- Some Bounds for Discounted Sequential Decision Processes
- Optimum Policy Regions for Markov Processes with Discounting
- Q3912356
- Reward Revision for Discounted Markov Decision Problems
- Infinite horizon Markov decision processes with unknown or variable discount factors
- Q3867541
- The Determination of Approximately Optimal Policies in Markov Decision Processes by the Use of Bounds
- Q3856450


scientific article
Language: English

    Statements

    Solving infinite horizon discounted Markov decision process problems for a range of discount factors (English)
    1989
    Consider the following decision problem. There is a finite state set \(I\). For each \(i\in I\) there is a finite action set \(K(i)\). If at a decision epoch the state is \(i\in I\) and an action \(k\in K(i)\) is taken, then the new state \(j\) is a random variable governed by a given transition probability \(p(i,k;j)\). There is an immediate reward \(r(i,k)\) with \(0\leq r(i,k)\leq M<\infty\), and the discount factor has the form \(\tau = t\rho\), where \(t\in [0,1]\) and \(0\leq \rho <1\); the number \(t\) is treated as a parameter. The paper deals with maximizing the infinite horizon discounted reward over policies of the form \(\pi =(\delta)^{\infty}\), where \(\delta : I\to \cup_{i\in I}K(i)\). Let \(v_t(i)\), \(i\in I\), denote the infinite horizon expected discounted reward corresponding to the parameter \(t\). The problems considered are: (a) to find approximations for \(v_t\) over the whole range \(t\in [0,1]\); (b) to find approximations for \(v_{t+\delta}\) when \(v_t\) is given and \(\delta\) may take values in \([0,1-t]\). Algorithms for solving both problems are presented.
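    The paper's own range-of-discount-factors algorithms are not reproduced in the review. As a minimal illustration of the setting only, the following sketch runs standard value iteration for a finite MDP at several parameter values \(t\), with discount \(\tau = t\rho\); the two-state example MDP and all names (`value_iteration`, `p`, `r`) are hypothetical, not taken from the paper.

```python
import numpy as np

def value_iteration(p, r, tau, tol=1e-8, max_iter=10_000):
    """Optimal infinite-horizon discounted values for a finite MDP.

    p[i][k] is the transition probability vector over states for
    action k in state i, r[i][k] the immediate reward, and
    tau in [0, 1) the discount factor.
    """
    n = len(p)
    v = np.zeros(n)
    for _ in range(max_iter):
        # Bellman optimality update: maximize over actions in each state.
        v_new = np.array([
            max(r[i][k] + tau * np.dot(p[i][k], v) for k in range(len(p[i])))
            for i in range(n)
        ])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical two-state example: in each state, action 0 stays put,
# action 1 moves to the other state with zero reward.
p = [[np.array([1.0, 0.0]), np.array([0.0, 1.0])],   # state 0
     [np.array([0.0, 1.0]), np.array([1.0, 0.0])]]   # state 1
r = [[1.0, 0.0],
     [2.0, 0.0]]

rho = 0.9
for t in (0.0, 0.5, 1.0):          # t parametrises the discount tau = t * rho
    v_t = value_iteration(p, r, t * rho)
```

    Solving independently on a grid of \(t\) values like this is the naive baseline; the point of problems (a) and (b) above is to approximate \(v_t\) across the range, or to update from \(v_t\) to \(v_{t+\delta}\), more cheaply than re-solving from scratch at each \(t\).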
    infinite horizon discounted rewards
