NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications
Paczolay, Gabor
Harmati, Istvan
2025-08-18T13:00:44Z
2024
1785-8860
hu_HU
http://hdl.handle.net/20.500.14044/32343
The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are worth at the present time step. In this paper, a system that estimates Q-values with two distinct discount factors is used. These estimates can later be merged into a single network to make the computations more efficient. The choice of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous that maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
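The abstract describes the mechanism only qualitatively, so the following Python sketch is a minimal illustration under explicit assumptions, not the authors' implementation. It shows (1) how a discount factor γ weights future rewards, (2) a "merged" network, modelled here hypothetically as a single linear layer whose outputs are split into a short-term and a long-term Q head, and (3) a selection rule in which the probability of following the short-term head grows with how unambiguous its maximum is. The confidence measure (a rescaled softmax probability of the greedy action), the `temperature` parameter and all function names are assumptions introduced for illustration.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Value at time step 0 of a reward sequence: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def two_head_q(features, weights, n_actions):
    """Hypothetical merged network: one linear layer with 2 * n_actions outputs,
    split into a short-term head (small gamma) and a long-term head (large gamma)."""
    out = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return out[:n_actions], out[n_actions:]


def max_confidence(q_values, temperature=1.0):
    """Hypothetical 'unambiguity' score in [0, 1] for the maximum of q_values.

    Uses the softmax probability of the greedy action, rescaled so that
    all-equal Q-values give 0 and a single dominant maximum approaches 1.
    """
    n = len(q_values)
    if n < 2:
        return 1.0
    q_max = max(q_values)
    # Subtract the maximum before exponentiating for numerical stability;
    # the greedy action then contributes exp(0) = 1 to the denominator.
    total = sum(math.exp((q - q_max) / temperature) for q in q_values)
    p_greedy = 1.0 / total
    return (p_greedy - 1.0 / n) / (1.0 - 1.0 / n)


def select_action(q_short, q_long, temperature=1.0, rng=random):
    """Act greedily on the short-term head with probability max_confidence(q_short),
    otherwise act greedily on the long-term head (hypothetical selection rule)."""
    p_short = max_confidence(q_short, temperature)
    if rng.random() < p_short:
        q, source = q_short, "short"
    else:
        q, source = q_long, "long"
    action = max(range(len(q)), key=lambda a: q[a])
    return action, source


if __name__ == "__main__":
    features = [1.0, 0.5]
    weights = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
    q_short, q_long = two_head_q(features, weights, n_actions=2)
    print(q_short, q_long, select_action(q_short, q_long))
```

Any score that is 0 for a flat short-term Q vector and approaches 1 for a single dominant action would fit the qualitative description equally well; the softmax form is used here only because it is smooth and cheap to compute.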
PDF
en
Open access
Óbudai Egyetem
Budapest
Social sciences - business and organizational sciences