Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective (Q6182771): Difference between revisions

Latest revision as of 21:01, 23 August 2024

scientific article; zbMATH DE number 7795126

Language	Label	Description	Also known as
English	Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective	scientific article; zbMATH DE number 7795126

Statements

instance of

scholarly article

0 references

title

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective (English)

0 references

26 January 2024

0 references

full work available at URL

https://arxiv.org/abs/1908.04734

0 references

zbMATH Keywords

AGI safety

0 references

reinforcement learning

0 references

Bayesian learning

0 references

causality

0 references

decision theory

0 references

causal influence diagrams

0 references

MaRDI profile type

MaRDI publication profile

0 references

cites work

Planning and acting in partially observable stochastic domains

0 references

Multi-agent influence diagrams for representing and solving games.

0 references

General time consistent discounting

0 references

Representing and Solving Decision Problems with Limited Information

0 references

Q3511269

0 references

Q3651576

0 references

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

0 references

Q3096180

0 references

Q4626283

0 references

Identifiers

zbMATH Open document ID

0 references

DOI

10.1007/s11229-021-03141-4

0 references

Mathematics Subject Classification ID

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:6182771

@@ Property / OpenAlex ID @@
+W3165436200
@@ Property / OpenAlex ID: W3165436200 / rank @@
+Normal rank
@@ Property / cites work @@
+Planning and acting in partially observable stochastic domains
@@ Property / cites work: Planning and acting in partially observable stochastic domains / rank @@
+Normal rank
@@ Property / cites work @@
+Multi-agent influence diagrams for representing and solving games.
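@@ Property / cites work: Multi-agent influence diagrams for representing and solving games. / rank @@
+Normal rank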
@@ Property / cites work @@
+General time consistent discounting
@@ Property / cites work: General time consistent discounting / rank @@
+Normal rank
@@ Property / cites work @@
+Representing and Solving Decision Problems with Limited Information
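@@ Property / cites work: Representing and Solving Decision Problems with Limited Information / rank @@
+Normal rank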
@@ Property / cites work @@
+Q3511269
@@ Property / cites work: Q3511269 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3651576
@@ Property / cites work: Q3651576 / rank @@
+Normal rank
@@ Property / cites work @@
+A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
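@@ Property / cites work: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play / rank @@
+Normal rank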
@@ Property / cites work @@
+Q3096180
@@ Property / cites work: Q3096180 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4626283
@@ Property / cites work: Q4626283 / rank @@
+Normal rank
@@ links / mardi / name @@
+Publication:6182771