Session: 17-01-01 Research Posters
Paper Number: 77429
Start Time: Thursday, 02:25 PM
77429 - Personalized Driving Using Inverse Reinforcement Learning With Region-Based Approximation
According to the National Highway Traffic Safety Administration, the United States recorded 33,244 fatal crashes in 2019, resulting in 36,096 reported road fatalities. A notable attempt to decrease the number of vehicular accidents is the implementation of Advanced Driver Assistance Systems (ADAS), a network of sensors working in unison to provide a safer driving experience for the driver and passengers. ADAS includes systems such as Anti-Lock Braking, Adaptive Cruise Control, and Lane Keeping Assist. This level of automation is reactive: it does not account for the future consequences of the current action, so it can still allow preventable accidents caused by diverted driver attention. By removing the human driver, one can potentially eliminate many factors contributing to accidents, such as fatigue, reckless driving, and driving under the influence. This research aims to develop a personalized autonomous driving model.
Autonomous driving systems can improve driving safety because they are trained with important considerations such as collision-avoidance rules, traffic laws and regulations, and driving decisions obtained from drivers against fixed benchmarks; individual driver mannerisms are not considered. The high variability and complexity of driving environments complicate standardized decision-making, since passengers react differently based on personal experience and habits. Personalized autonomous driving addresses these issues by tailoring the vehicle's driving behavior to the individual passenger. This personalization can be achieved with Inverse Reinforcement Learning (IRL).
In IRL, an expert is an observed agent whose behavior is recorded. The recorded data is supplied to a learner. When no underlying policy is known, the learner is trained with sequences of state-action pairs, called trajectories, from which a reward function is inferred. To solve for an optimal reward function, Approximate Dynamic Programming (ADP) uses neural networks to approximate a value function. A drawback of ADP is its long training time, since the structure of the neural networks is selected by trial and error. To remedy this problem, Region-Based Approximation (RBA) can be used: RBA divides the training set into smaller regions and assigns an easy-to-train neural network to each. The reward function is updated by comparing the expert trajectory to the trajectory obtained from the learner's interaction with the environment. If the trajectories differ greatly, the reward function is updated to penalize the learner's trajectory; conversely, if the learner's trajectory matches the expert's, the reward function is updated to reward it.
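The following is a minimal sketch of these two ideas, assuming a simple feature-matching reward update and linear per-region approximators in place of the small neural networks; the names (featurize, RegionBasedValue, update_reward) and the region-assignment rule are illustrative, not the implementation used in this work.

```python
import numpy as np

def featurize(state, action, n_features=4):
    # Hypothetical feature map: in practice this would encode vehicle state and action.
    return np.concatenate([state, action])[:n_features]

class RegionBasedValue:
    """Region-based approximation sketch: one small linear approximator per region
    instead of a single large neural network over the whole training set."""
    def __init__(self, n_regions, n_features):
        self.weights = np.zeros((n_regions, n_features))

    def region(self, state, low=-1.0, high=1.0):
        # Assign a state to a region by binning its first dimension (illustrative rule).
        frac = (state[0] - low) / (high - low)
        return int(np.clip(frac * len(self.weights), 0, len(self.weights) - 1))

    def value(self, state, action):
        return self.weights[self.region(state)] @ featurize(state, action)

def update_reward(reward_w, expert_traj, learner_traj, lr=0.05):
    """Feature-matching style update: behavior the expert exhibits gains reward,
    behavior only the learner exhibits is penalized."""
    expert_f = sum(featurize(s, a) for s, a in expert_traj)
    learner_f = sum(featurize(s, a) for s, a in learner_traj)
    return reward_w + lr * (expert_f - learner_f)

# Toy usage: two-dimensional states and actions, trajectories of (state, action) pairs.
expert = [(np.array([0.2, 0.0]), np.array([0.1, 0.5]))]
learner = [(np.array([0.8, 0.0]), np.array([-0.3, 0.2]))]
reward_w = update_reward(np.zeros(4), expert, learner)
```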
As previously mentioned, IRL requires expert demonstrations as input data to train the learning strategy. This input data can be gathered with an Electroencephalography (EEG) system. An EEG is a test that uses small metal disks, called electrodes, to sense electrical signals produced by the brain. During data-gathering tests, the electrodes are placed on the head of the examinee, who is given the task of driving for a certain amount of time in different environmental scenarios, such as neighborhoods, freeways, construction and school zones, parking lots, and high-traffic, pedestrian-dense commercial districts.
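As an illustration of the trajectory format IRL expects, the sketch below pairs time-aligned feature vectors derived from the recordings with logged driving actions; the feature dimensions, the Step structure, and the build_trajectory helper are hypothetical placeholders rather than the project's actual preprocessing pipeline.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Step:
    state: np.ndarray   # e.g., EEG-derived driver-state features plus vehicle state
    action: np.ndarray  # e.g., [steer, throttle, brake] logged at the same time step

def build_trajectory(state_features, vehicle_actions):
    """Pair time-aligned state features with driving actions into a state-action trajectory.
    Assumes both arrays are sampled (or resampled) at the same rate."""
    assert len(state_features) == len(vehicle_actions)
    return [Step(np.asarray(s), np.asarray(a))
            for s, a in zip(state_features, vehicle_actions)]

# Hypothetical example: 10 time steps, 8 state features, 3 control channels.
expert_trajectory = build_trajectory(np.random.rand(10, 8), np.random.rand(10, 3))
```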
The data obtained will be processed and fed to the learning algorithm, whose behavioral responses will be validated by two proposed methods. The first uses CARLA, an open-source autonomous driving simulator. The framework will be implemented in Python, and expert demonstrations will be supplied to the algorithm; once the value functions are known, the controls will be generated and tested in simulation. The second validation method uses a robot driving in a simplified city model. The model setting can be altered to observe how the agent reacts to different environmental conditions; however, the model is limited to a reduced scale and, unlike the simulations, its environment is much simpler.
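A minimal sketch of how a learned controller could be exercised through CARLA's Python client is given below; the client, world, and vehicle-control calls follow CARLA's standard API, while the vehicle blueprint, spawn point, and the learned_policy placeholder are assumptions for illustration only.

```python
import carla

def learned_policy(transform, velocity):
    # Placeholder for the controller recovered via IRL; returns (steer, throttle).
    return 0.0, 0.3

# Connect to a locally running CARLA server (default port 2000).
client = carla.Client('localhost', 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a test vehicle at the first available spawn point on the current map.
blueprint = world.get_blueprint_library().filter('vehicle.*')[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(blueprint, spawn_point)

try:
    for _ in range(1000):
        steer, throttle = learned_policy(vehicle.get_transform(), vehicle.get_velocity())
        vehicle.apply_control(carla.VehicleControl(throttle=throttle, steer=steer))
        world.wait_for_tick()  # wait for the next simulation frame
finally:
    vehicle.destroy()
```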
Presenting Author: Rodrigo Gonzalez University of Texas Rio Grande Valley
Authors:
Rodrigo Gonzalez University of Texas Rio Grande Valley
Constantine Tarawneh University of Texas Rio Grande Valley
Tohid Sardarmehni University of Texas Rio Grande Valley
Paper Type
Poster Presentation