Transfer of learning by composing solutions of elemental sequential tasks (Q1812933): Difference between revisions

A class of sequential decision tasks (SDTs), called composite sequential decision tasks, is considered. Composite SDTs are formed by temporally concatenating a number of elemental sequential decision tasks; elemental SDTs cannot be decomposed into simpler SDTs. The setting in which a learning agent has to learn to solve a set of elemental and composite SDTs is examined. It is assumed that the structure of the composite tasks is unknown to the learning agent. The straightforward application of reinforcement learning to multiple tasks requires learning the tasks separately, which can waste computational resources, both memory and time. A new learning algorithm and a modular architecture are described that learn the decomposition of composite SDTs and achieve transfer of learning by sharing the solutions of elemental SDTs across multiple composite SDTs. The solution of a composite SDT is constructed by computationally inexpensive modifications of the solutions of its constituent elemental SDTs. A proof of one aspect of the learning algorithm is provided.
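The idea of building a composite solution from shared elemental solutions can be illustrated with a minimal sketch. The abstract does not say what the "computationally inexpensive modifications" are, so the code below assumes, purely for illustration, that the modification is a constant offset added to an elemental Q-table. The names `compose_q`, `greedy_action`, the toy states and actions, and the offset value are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

State = int
Action = str
QTable = Dict[Tuple[State, Action], float]


def compose_q(elemental_q: QTable, offset: float) -> QTable:
    """Build a composite-stage Q-table by adding a constant to an elemental one.

    ASSUMPTION: the "inexpensive modification" is an additive, stage-specific
    constant; the abstract does not state this.
    """
    return {key: value + offset for key, value in elemental_q.items()}


def greedy_action(q: QTable, state: State, actions: List[Action]) -> Action:
    """Return the action with the highest Q-value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical elemental task with one state and two actions.
ACTIONS = ["left", "right"]
q_elemental: QTable = {(0, "left"): 1.0, (0, "right"): 2.0}

# Hypothetical offset standing in for the cost of the remaining subtasks.
q_composite = compose_q(q_elemental, offset=-5.0)
```

Under this assumption the greedy action in each state is the same for the elemental and the composite table, so the elemental solution is reused unchanged and only one extra number per composite stage has to be stored.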

zbMATH Keywords

reinforcement

compositional learning

transfer of learning

modular architecture

MaRDI profile type

MaRDI publication profile

cites work

Q3795523

Q4403756

Macro-operators: A weak method for learning

Q3683893

\({\mathcal Q}\)-learning

full work available at URL

https://doi.org/10.1007/bf00992700

Identifiers

zbMATH Open document ID

0772.68073

DOI

10.1007/BF00992700

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:1812933

@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / cites work @@
+Q3795523
@@ Property / cites work: Q3795523 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4403756
@@ Property / cites work: Q4403756 / rank @@
+Normal rank
@@ Property / cites work @@
+Macro-operators: A weak method for learning
@@ Property / cites work: Macro-operators: A weak method for learning / rank @@
+Normal rank
@@ Property / cites work @@
+Q3683893
@@ Property / cites work: Q3683893 / rank @@
+Normal rank
@@ Property / cites work @@
+\({\mathcal Q}\)-learning
@@ Property / cites work: \({\mathcal Q}\)-learning / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1007/bf00992700
+Normal rank
@@ Property / OpenAlex ID @@
+W2012036715
@@ Property / OpenAlex ID: W2012036715 / rank @@
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:1812933