NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Paczolay, Gabor; Harmati, Istvan

View/Open

Paczolay_Harmati_151.pdf (524.2KB)

Link for citing this document:

http://hdl.handle.net/20.500.14044/32343

Collection

Acta Polytechnica Hungarica [175]

Abstract

The discount factor plays an important role in reinforcement learning algorithms. It determines how much future rewards are valued at the present time-step. In this paper, a system that estimates Q-values using two distinct discount factors is utilized. These estimates can later be merged into one network to make the computations more efficient. The decision of which network to use is based on the relative value of the maximum Q-value of the short-term network: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
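
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact selection formula, so the softmax-based confidence measure, the `temperature` parameter, and the function names `discounted_return`, `greedy_confidence`, and `select_action` are all assumptions made for illustration. The sketch shows how a discount factor weights future rewards, and one plausible way to pick between a short-term and a long-term Q estimate so that a clearer maximum in the short-term estimate makes that estimate more likely to be used.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def greedy_confidence(q_values, temperature=1.0):
    """Softmax probability of the greedy action (assumed confidence measure).

    The value is 1/len(q_values) when all Q-values are equal and approaches 1
    as the maximum becomes more clearly separated from the rest.
    """
    best = max(q_values)
    weights = [math.exp((q - best) / temperature) for q in q_values]
    # The greedy action has weight exp(0) == 1, so its probability is 1/sum.
    return 1.0 / sum(weights)


def select_action(q_short, q_long, rng, temperature=1.0):
    """Pick the short-term estimate with probability equal to its confidence.

    Returns (action_index, used_short_term). This rule is hypothetical; the
    paper's actual rule may differ.
    """
    p_short = greedy_confidence(q_short, temperature)
    used_short = rng.random() < p_short
    chosen = q_short if used_short else q_long
    action = max(range(len(chosen)), key=chosen.__getitem__)
    return action, used_short


if __name__ == "__main__":
    rng = random.Random(0)
    # A small gamma values mostly the immediate reward; a large gamma looks further ahead.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))
    # Ambiguous short-term maximum: the long-term estimate is used more often.
    print(select_action([1.0, 1.01], [0.2, 0.9], rng))
```

In this sketch the two estimates are plain lists of Q-values, one per action; in a merged network as described in the abstract they would presumably be two outputs of the same model, which is again an assumption.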

Title and subtitle: NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications
Author: Paczolay, Gabor; Harmati, Istvan
Date of publication: 2024
Access level: Open access
ISSN, e-ISSN: 1785-8860
Language: en
Extent: 16 p.
Keywords: reinforcement learning, DQN, NPV, NPV-DQN
Version: Publisher's version
Other identifiers: DOI: 10.12700/APH.21.11.2024.11.10
Title of the document containing the article/book chapter: Acta Polytechnica Hungarica
Source journal year: 2024
Source journal volume: Vol. 21
Source journal issue: No. 11
Genre: Scientific article
Discipline: Social sciences - business and management sciences
University: Óbudai Egyetem (Óbuda University)