A Novel Augmentative Backward Reward Function with Deep Reinforcement Learning for Autonomous UAV Navigation

Chansuparp, Manit and Jitkajornwanich, Kulsawasd (2022) A Novel Augmentative Backward Reward Function with Deep Reinforcement Learning for Autonomous UAV Navigation. Applied Artificial Intelligence, 36 (1). ISSN 0883-9514

A Novel Augmentative Backward Reward Function with Deep Reinforcement Learning for Autonomous UAV Navigation.pdf - Published Version

Abstract

Autonomous UAV (unmanned aerial vehicle) navigation has recently attracted increasing interest from both academia and industry because of its potential uses in various fields and, especially, the need for social distancing during the pandemic. Many works have adopted deep deterministic policy gradient (DDPG), a deep reinforcement learning (RL) method with experience replay, to control UAV motion, and have achieved high accuracy in static and simplified environments. However, these approaches are still far from ready for real-world adoption, where UAVs must operate under complex and dynamic conditions. We also found that using DDPG alone makes the learning process prone to oscillation and is inefficient for tasks with high-dimensional action-state spaces. Furthermore, the goal reward mechanism in traditional reward functions introduces a bias toward states that resemble those in the goal area, which leads to erroneous action selection. To move closer to real-world adoption, we propose a novel method that enables UAVs to handle motion control in realistic environments. The first component of the method is point cloud data (PCD) simplification with a truncated icosahedron structure, which converts an enormous PCD into a few essential data points. In the second component, we replace the traditional goal reward mechanism with a new mechanism, the Augmentative Backward Reward (ABR) function, which dispenses the goal reward to transitions in proportion to their participation. By integrating simplified PCD and ABR, we achieved significantly better results than with the state-of-the-art method, TD3, alone. In addition, we tested the proposed method on another navigation task, BipedalWalkerHardcore, a testbed for RL, and the results are still better and steadier than those of TD3. These results indicate that the proposed method is robust.
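
The abstract does not give the ABR formula, so the following is only a rough illustration of the general idea of dispensing a goal reward backward over the transitions of a successful episode. The function name `redistribute_goal_reward`, the `decay` parameter, and the geometric weighting are all assumptions made for this sketch; they are not taken from the paper, whose measure of each transition's "participation" may differ.

```python
from typing import List


def redistribute_goal_reward(
    rewards: List[float], goal_reward: float, decay: float = 0.9
) -> List[float]:
    """Spread goal_reward backward over the step rewards of one episode.

    ASSUMPTION: a transition's share decays geometrically with its distance
    from the final (goal-reaching) step. The weights are normalized to sum
    to 1, so the total reward added to the episode equals goal_reward.
    """
    n = len(rewards)
    if n == 0:
        return []
    # Weight of step t: decay^(n - 1 - t); the last step gets weight 1.
    raw = [decay ** (n - 1 - t) for t in range(n)]
    total = sum(raw)
    return [r + goal_reward * w / total for r, w in zip(rewards, raw)]


# Hypothetical four-step episode with zero step rewards and a goal reward of 10.
shaped = redistribute_goal_reward([0.0, 0.0, 0.0, 0.0], goal_reward=10.0, decay=0.5)
```

In this sketch, later transitions receive a larger share than earlier ones, and the shaped rewards could then be written to the replay buffer in place of a single terminal bonus. Whether the authors apply the redistribution at that point is not stated in the abstract.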

Item Type: Article
Subjects: Apsci Archives > Computer Science
Depositing User: Unnamed user with email support@apsciarchives.com
Date Deposited: 14 Jun 2023 06:25
Last Modified: 05 Dec 2023 04:16
URI: http://eprints.go2submission.com/id/eprint/1305
