NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Abstract
The discount factor plays an important role in reinforcement learning algorithms: it
determines how much future rewards are valued at the present time step. In this paper, a system
is used that estimates Q-values based on two distinct discount factors. These
estimates can later be merged into a single network to make the computations more efficient.
The choice of which network to use is based on the relative value of the maximum of
the short-term network: the more unambiguous the maximum is, the higher the probability
of selecting that network. The system is then benchmarked on a cartpole and
a gridworld environment.
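The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so everything below is an assumption made for illustration: "unambiguity" is measured as the softmax probability of the greedy short-term action, and the names `short_term_confidence`, `select_action`, and `temperature` are hypothetical and not taken from the paper.

```python
import math
import random


def short_term_confidence(q_short, temperature=1.0):
    """Softmax probability of the greedy short-term action.

    ASSUMPTION: this is only one possible way to quantify how
    unambiguous the short-term maximum is; the paper may use another.
    """
    m = max(q_short)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((q - m) / temperature) for q in q_short]
    return max(weights) / sum(weights)


def greedy(q_values):
    """Index of the largest Q-value (first index on ties)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def select_action(q_short, q_long, rng, temperature=1.0):
    """Act on the short-term estimate with probability p_short,
    otherwise fall back to the long-term estimate (hypothetical rule)."""
    p_short = short_term_confidence(q_short, temperature)
    use_short = rng.random() < p_short
    action = greedy(q_short if use_short else q_long)
    return action, use_short, p_short


if __name__ == "__main__":
    rng = random.Random(0)
    # Q-values for one state from a low-discount (short-term) and a
    # high-discount (long-term) estimator; the numbers are made up.
    q_short = [2.0, 0.1, 0.0]
    q_long = [1.0, 4.0, 3.5]
    print(select_action(q_short, q_long, rng))
```

With a clear short-term maximum the sketch almost always acts on the short-term estimate; when the short-term values are nearly equal, the probability approaches 1 divided by the number of actions and the long-term estimate is used more often.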
- Author
- Paczolay, Gabor
- Harmati, Istvan
- Date issued
- 2024
- Access rights
- Open access
- ISSN
- 1785-8860
- Language
- en
- Pages
- 16 p.
- Subject keywords
- reinforcement learning, DQN, NPV, NPV-DQN
- Version
- Publisher's version
- Identifiers
- DOI: 10.12700/APH.21.11.2024.11.10
- Journal
- Acta Polytechnica Hungarica
- Journal year
- 2024
- Journal volume
- Vol. 21
- Journal issue
- No. 11
- Type
- Scientific article
- Subject area
- Social sciences - management and organizational sciences
- Publisher (university)
- Óbudai Egyetem (Óbuda University)