Session: 17-01-01 Research Posters
Paper Number: 77449
Start Time: Thursday, 02:25 PM
77449 - Decentralized Multi-Agent Deep Reinforcement Learning for Surveillance Using Drone Swarms
This abstract proposes the use of decentralized multi-agent deep reinforcement learning (MADRL) to coordinate a homogeneous drone swarm tasked with surveillance and search and rescue (SAR). Small unmanned aerial vehicles (UAVs), such as quadcopter drones, have many military and civilian uses and can be deployed economically in applications including surveillance, reconnaissance, search and rescue, forestry, flood and fire tracking, and agriculture. Furthermore, small quadcopter drones are generally cost-effective and expendable, allowing missions into hazardous areas without endangering lives or requiring significant human effort. Using multiple cooperating UAVs to accomplish these tasks can increase efficiency, reliability, mission range, and the likelihood of success compared to a single drone. Consensus, leader-follower, virtual structure, and behavior-based methods have been active areas of research for controlling drones in flight formations. Research into the control of homogeneous drone swarms using reinforcement learning (RL), deep reinforcement learning (DRL), and inverse reinforcement learning (IRL) has also made significant progress, and decentralized DRL methods have been used for autonomous flood and fire monitoring. This research seeks to apply similar methods to surveillance and SAR within a predefined boundary and to the tracking of targets within that area.
Using deep reinforcement learning and cooperation between agents, an optimal decentralized controller can be developed that takes advantage of the area coverage provided by multiple UAVs. One framework for this is the swarm Markov decision process (SwarMDP), developed by Adrian Šošić, Wasiur R. KhudaBukhsh, and Abdelhak M. Zoubir in “Inverse Reinforcement Learning in Swarm Systems” to describe homogeneous multi-agent control problems. SwarMDP is a kind of decentralized partially observable Markov decision process (Dec-POMDP) in which the homogeneity of the agents allows the multi-agent problem to be reduced to a single-agent problem; in this case, the policy and value functions are shown to be identical across agents. Agent policies will be updated using a heterogeneous Q-learning algorithm that separates agents into exploratory and greedy categories to vary and optimize a shared Q-function. SwarMDP then compares global expert behavior with the global learned behavior and its global value, which each agent estimates from locally available information about the swarm. The updated policy and value functions will optimize the return of a predefined reward function. This reward function will encourage surveillance and SAR by rewarding the observation of unexplored areas and the continuous tracking of a target; conversely, penalties will be applied for proximity between drones to prevent collisions and convergence on a single target.
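As a rough illustration of how the shared Q-function, the exploratory/greedy split, and the reward terms described above could fit together, the following Python sketch assumes a discretized grid world; the state-space size, reward weights, and safety distance are placeholder assumptions for this sketch, not values from the SwarMDP paper.

```python
import numpy as np

N_AGENTS = 8
N_ACTIONS = 5                      # four cardinal moves + hover
N_STATES = 10_000                  # size of the discretized local-state space (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2 # learning rate, discount, exploration rate (assumed)
SAFE_DIST = 2.0                    # minimum allowed drone separation (assumed units)

# Agent homogeneity lets every drone share one Q-table: the single-agent
# reduction described in the abstract.
Q = np.zeros((N_STATES, N_ACTIONS))

def select_action(state: int, agent_id: int) -> int:
    """Heterogeneous exploration: some agents explore, the rest act greedily."""
    exploratory = agent_id < N_AGENTS // 2
    if exploratory and np.random.rand() < EPS:
        return int(np.random.randint(N_ACTIONS))
    return int(np.argmax(Q[state]))

def reward(newly_seen_cells: int, tracking_target: bool,
           min_neighbor_dist: float) -> float:
    """Illustrative reward: coverage and tracking bonuses, proximity penalty."""
    r = 1.0 * newly_seen_cells          # reward observing unexplored cells
    if tracking_target:
        r += 5.0                        # reward keeping a target in view
    if min_neighbor_dist < SAFE_DIST:
        r -= 10.0                       # penalize near-collisions and clustering
    return r

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """Standard tabular Q-learning step applied to the shared table."""
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
```

Because the agents are homogeneous, the single shared table stands in for every agent's policy; a deep variant would replace the table with a neural network over each agent's local observation.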
An individual agent will use its own state (position, direction, velocity, and orientation), the states of nearby agents, and visual information obtained through an onboard camera to update its parameters in decentralized RL. To avoid repeatedly searching the same area, a recent history of each drone's positions will also be shared. Images can be processed using a lightweight convolutional neural network (CNN) such as MobileNet. The states and observations will determine the agent's next action, and drone actions will be limited to movements in the cardinal directions and hovering. Simulations will be run in four scenarios: an area with no targets, an area with static targets, an area with randomly moving dynamic targets, and an area with sparse dynamic targets that enter and leave the search region. The results of these simulations will be compared and the overall performance of the system evaluated.
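To make the inputs and action set concrete, the sketch below lays out one possible per-agent observation structure and the four simulation scenarios; all field names, types, and buffer sizes are assumptions for illustration rather than a finalized design.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Tuple

class Action(IntEnum):
    """The discrete action set named in the abstract: cardinal moves plus hover."""
    NORTH = 0
    SOUTH = 1
    EAST = 2
    WEST = 3
    HOVER = 4

@dataclass
class AgentObservation:
    """Per-agent input to the decentralized policy (field names assumed)."""
    position: Tuple[float, float, float]      # own position
    velocity: Tuple[float, float, float]      # own velocity
    heading: float                            # direction of travel
    orientation: Tuple[float, float, float]   # roll, pitch, yaw
    # Locally shared states of neighboring agents.
    neighbor_states: List["AgentObservation"] = field(default_factory=list)
    # Recent positions, shared so the swarm avoids re-searching covered area.
    history: deque = field(default_factory=lambda: deque(maxlen=100))
    # Embedding of the onboard camera frame from a lightweight CNN
    # (e.g. MobileNet); the dimensionality is left open here.
    camera_features: List[float] = field(default_factory=list)

# The four simulation scenarios to be compared.
SCENARIOS = (
    "no_targets",
    "static_targets",
    "random_dynamic_targets",
    "sparse_dynamic_targets",  # targets enter and leave the search region
)
```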
The SwarMDP framework will be used to simplify and model a homogeneous multi-agent system tailored to surveillance and SAR tasks through its reward functions. A successful model will have developed a decentralized controller that meets these tasks' objectives efficiently. Future work will need to examine the system's security against jamming and adversarial agents, its communication with central agents and human operators, and its potential for real-world applications.
Presenting Author: Alberto Velazquez, University of Texas Rio Grande Valley
Authors:
Alberto Velazquez, University of Texas Rio Grande Valley
Lei Xu, University of Texas Rio Grande Valley
Paper Type: Poster Presentation