Trajectory optimization for autonomous flying base station via reinforcement learning

Bayerlein, Harald; de Kerret, Paul; Gesbert, David
SPAWC 2018, 19th IEEE International Workshop on Signal Processing Advances in Wireless Communications, 25-28 June 2018, Kalamata, Greece

In this work, we study the optimal trajectory of an unmanned aerial vehicle (UAV) acting as a base station (BS) to serve multiple users. Considering multiple flying epochs, we leverage the tools of reinforcement learning (RL), with the UAV acting as an autonomous agent in the environment, to learn the trajectory that maximizes the sum rate of the transmission during flying time. By applying Q-learning, a model-free RL technique, an agent is trained to make movement decisions for the UAV. We compare table-based and neural network (NN) approximations of the Q-function and analyze the results. In contrast to previous works, movement decisions are made directly by the neural network. The algorithm requires no explicit information about the environment and is able to learn the topology of the network to improve the system-wide performance.
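The table-based variant mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 5x5 grid, the user positions, the toy rate model (log2(1 + SNR) with a 1/d^2 path loss), the episode length, and the learning parameters. The agent only observes its own state and the reward it receives, which is the model-free property the abstract refers to.

```python
import math
import random

# All constants below are hypothetical values chosen for illustration.
GRID = 5                                      # UAV moves on a 5x5 grid
USERS = [(0, 4), (4, 4)]                      # ground-user cells
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left
STEPS = 8                                     # flying time per episode, in moves
ALTITUDE = 1.0                                # constant altitude, in grid units


def sum_rate(pos):
    """Toy sum rate: log2(1 + SNR), with SNR decaying as 1/d^2 (assumed model)."""
    total = 0.0
    for ux, uy in USERS:
        d2 = (pos[0] - ux) ** 2 + (pos[1] - uy) ** 2 + ALTITUDE ** 2
        total += math.log2(1.0 + 10.0 / d2)
    return total


def step(pos, action):
    """Move one cell, clipping at the grid border; return (new_pos, reward)."""
    x = min(max(pos[0] + action[0], 0), GRID - 1)
    y = min(max(pos[1] + action[1], 0), GRID - 1)
    return (x, y), sum_rate((x, y))


def train(episodes=2000, alpha=0.3, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # maps state (position, time step) to a list of action values
    for _ in range(episodes):
        pos = (0, 0)
        for t in range(STEPS):
            qs = q.setdefault((pos, t), [0.0] * len(ACTIONS))
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=qs.__getitem__)
            pos, r = step(pos, ACTIONS[a])
            nxt = q.setdefault((pos, t + 1), [0.0] * len(ACTIONS))
            # No bootstrapping on the last move: the episode ends there.
            target = r if t == STEPS - 1 else r + gamma * max(nxt)
            qs[a] += alpha * (target - qs[a])
    return q


def greedy_trajectory(q):
    """Follow the highest-valued action from the start cell for one episode."""
    pos, path = (0, 0), [(0, 0)]
    for t in range(STEPS):
        qs = q.get((pos, t), [0.0] * len(ACTIONS))
        a = max(range(len(ACTIONS)), key=qs.__getitem__)
        pos, _ = step(pos, ACTIONS[a])
        path.append(pos)
    return path
```

Calling `greedy_trajectory(train())` returns the list of grid cells visited under the learned table. The time step is part of the state in this sketch because flying time is limited, so the best move can depend on how many moves remain. A neural network approximation would replace the dictionary `q` with a function that maps a state to the vector of action values.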

Type: Conference
City: Kalamata
Date: 2018-06-25
Department: Communication Systems
Eurecom Ref: 5552
Copyright: © 2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Permalink: https://www.eurecom.fr/publication/5552