
GridSim: A Vehicle Kinematics Engine for Deep Neuroevolutionary Control in Autonomous Driving


12 February 2019

Abstract: Vehicle control has been a challenge in the field of autonomous driving for many years. Current state-of-the-art solutions mainly use supervised end-to-end learning, or decoupled perception, planning and action pipelines. If we consider autonomous driving as a multi-agent setting, deep reinforcement learning is also suitable for solving the driving task. However, both deep Q-learning and policy gradient methods require that the agent interacts with its surroundings in order to learn its behavior via reward signals. In autonomous driving, this task is performed within a simulated environment. In this paper we introduce GridSim, an autonomous driving simulator engine that uses a car-like robot architecture to generate occupancy grids from simulated sensors. It allows multiple scenarios to be easily represented and loaded into the simulator as backgrounds. We use GridSim to study the performance of two deep learning approaches used in autonomous driving: deep reinforcement learning and driving behavioral learning through genetic algorithms. The algorithms are evaluated on simulated highways, curved roads and inner-city scenarios, all including different driving limitations, and their performance is monitored through different metrics. The deep network used for vehicle control takes sequences of synthetic occupancy grids as input, while the desired behavior is encoded in a two-element fitness function describing a maximum travel distance and a maximum forward speed, bounded to a specific interval.

ArXiv paper link
Citation
@inproceedings{GridSim2019,
    author    = {Bogdan Trasnea and Andrei Vasilcoi and Claudiu Pozna and Sorin Grigorescu},
    title     = {GridSim: A Vehicle Kinematics Engine for Deep Neuroevolutionary Control in Autonomous Driving},
    booktitle = {Int. Conf. on Robotic Computing (IRC) 2019},
    address   = {Naples},
    month     = {25-27 February},
    year      = {2019}
}
1. Architecture
GridSim and two possible pipelines for the deep neural control of a simulated car.

(top) GridSim driving scenarios.
(middle) DQN agent pipeline, which uses the input occupancy grids (OGs) to interact with the simulated environment in order to maximize its reward function.
(bottom) Neuroevolutionary agent pipeline, in which the DNN's weights are evolved using genetic algorithms with altered breeding rules in order to maximize a fitness function.

2. Vehicle Dynamics Simulation Engine

The simulation engine uses the non-holonomic robot car kinematics [1]. The steering is modelled through the angle δ as an extra degree of freedom on the front wheel, while the "non-holonomic" assumption is expressed as a differential constraint on the motion of the car, which restricts the vehicle from making lateral displacements without simultaneously moving forward. GridSim contains a menu which allows switching between multiple scenarios, which are easily represented and loaded into the simulator as backgrounds. Snapshots of GridSim's possible scenarios can be seen at the top of the figure above. The simulated sensors have a field of view (FOV) of 120 degrees. They react when an obstacle is sensed by marking it as an occupied area. The static obstacles are mapped a priori onto the backgrounds as lists of polygons, and the simulated sensors continuously check whether the perception rays collide with the given polygons.
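A minimal sketch of this kinematic update, assuming a simple Euler integration step and a rear-axle reference point; the names CarState and step, as well as the wheelbase and time-step values, are illustrative and not GridSim's actual API:

import math
from dataclasses import dataclass

@dataclass
class CarState:
    x: float = 0.0        # rear-axle position [m]
    y: float = 0.0
    heading: float = 0.0  # yaw angle theta [rad]
    v: float = 0.0        # longitudinal speed [m/s]

def step(state: CarState, accel: float, delta: float,
         dt: float = 0.05, wheelbase: float = 2.5) -> CarState:
    """Advance the non-holonomic (bicycle) kinematic model by one time step.

    The non-holonomic constraint is implicit: lateral position can only change
    through the heading, and the heading only changes while the car moves forward.
    """
    x = state.x + state.v * math.cos(state.heading) * dt
    y = state.y + state.v * math.sin(state.heading) * dt
    heading = state.heading + (state.v / wheelbase) * math.tan(delta) * dt
    v = state.v + accel * dt
    return CarState(x, y, heading, v)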

3. Method

We use GridSim to study the performance of two simulation-based autonomous driving approaches: deep reinforcement learning and the control of a deep neuroevolutionary agent. The neuroevolutionary approach evolves the weights of a deep neural network using a population-based genetic algorithm with altered breeding rules (a custom tournament selection). The training is performed against a multi-objective fitness function which maximizes two elements: the traveled path and the longitudinal speed. This learning procedure was first proposed by the authors for training a generative one-shot learning classifier [2]. It aims to compute optimal weights for a collection of K deep networks:
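In compact form, and with the notation assumed here rather than taken verbatim from the paper (Θ_i for the weights of the i-th network, f_d and f_v for the distance and speed fitness terms), the objective can be sketched as:

\[
\Theta_i^{*} = \arg\max_{\Theta_i} F(\Theta_i), \qquad
F(\Theta_i) = \left[\, f_d(\Theta_i),\; f_v(\Theta_i) \,\right], \qquad i = 1, \dots, K
\]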



The agent controls the ego-car using the elite individual DNN of the current generation, while the custom tournament selection algorithm ensures that the fittest individuals carry over to the next generation unmodified.
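As an illustration of this selection step, here is a hedged Python sketch of tournament selection with elitism; the exact altered breeding rules of the paper are not reproduced, and the flat weight-vector individuals, fitness callable and Gaussian mutation are assumptions:

import random
from typing import Callable, List, Sequence

def evolve(population: List[List[float]],
           fitness: Callable[[Sequence[float]], float],
           elite_count: int = 2,
           tournament_size: int = 3,
           mutation_std: float = 0.02) -> List[List[float]]:
    """Produce the next generation: elites pass through unmodified,
    the rest are tournament-selected and mutated."""
    ranked = sorted(population, key=fitness, reverse=True)
    next_gen = [ind[:] for ind in ranked[:elite_count]]   # elitism: keep the best as-is

    def tournament() -> List[float]:
        # Pick the fittest out of a small random sample of the population.
        contenders = random.sample(population, tournament_size)
        return max(contenders, key=fitness)

    while len(next_gen) < len(population):
        parent = tournament()
        child = [w + random.gauss(0.0, mutation_std) for w in parent]  # Gaussian mutation
        next_gen.append(child)
    return next_gen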
As a comparison to our neuroevolutionary agent, we implemented a DQN agent which uses a decision space of eight actions. The algorithm starts from an initial state and proceeds until the agent collides with its surroundings. At every step, the agent is described by its current state s; it follows policy π(s) and observes the next state together with the reward received from the environment. The reward policy is constructed in the following way:





where r is the total normalized reward, f(d) is the distance travelled, f(v) is the current velocity of the vehicle, f(S) is the sensor policy and S is the sensor action-value vector. The algorithm continues until the Q function converges, or until a certain number of episodes is reached, while also ensuring a sanity check of 15 actions.
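A hedged sketch of this interaction loop in Python; the environment interface (reset/step), the q_net callable and the termination details are assumptions made for illustration and may differ from GridSim's actual implementation:

import random

ACTIONS = list(range(8))          # discrete decision space of eight actions

def run_episode(env, q_net, epsilon: float = 0.1, max_steps: int = 1000):
    """Run one episode until collision or step limit, collecting transitions."""
    state = env.reset()
    transitions = []
    for _ in range(max_steps):
        # Epsilon-greedy action selection over the Q-values.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_net(state, a))
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:                  # e.g. the agent collided with its surroundings
            break
    return transitions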

4. Experiments
Comparison of the DQN and Neuroevolutionary agents in the five GridSim scenarios.

Performance comparison with regard to overall velocity error percentage and average training time of both models across the different GridSim scenarios. We observe that the velocity error of the neuroevolutionary approach is smaller in all scenarios, while its training time remains low.

4.1. Deep reinforcement learning

After several hours of training in the simulator (see the training time comparison above), the DQN network could navigate portions of the environment, but would still drive off the road. In order to converge to a collision-free model, the DQN agent needed over 20 hours of interaction with the GridSim environment.

4.2. Deep neuroevolutionary agent

The input of the neural network is a vector composed of the occupancy grid values generated by the synthetic beams of the radar sensor model. The number of sensor beams is configurable and can be increased to any resolution necessary. After the desired behavior was achieved, and the car was able to navigate the seamlessly generated environment by itself, we performed incremental updates to the decision space.
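As an illustration, one possible way to assemble such a beam-based input vector is sketched below; the beam count, maximum range and the ray-casting helper distance_to_obstacle are assumptions, with only the 120-degree FOV taken from the text:

import math
from typing import Callable, List

def occupancy_vector(car_x: float, car_y: float, car_heading: float,
                     distance_to_obstacle: Callable[[float, float, float], float],
                     num_beams: int = 32, fov_deg: float = 120.0,
                     max_range: float = 50.0) -> List[float]:
    """Cast num_beams rays across the FOV and normalize hit distances to [0, 1]."""
    half_fov = math.radians(fov_deg) / 2.0
    vector = []
    for i in range(num_beams):
        # Beam angle relative to the car heading, swept across the field of view.
        angle = car_heading - half_fov + i * math.radians(fov_deg) / (num_beams - 1)
        dist = distance_to_obstacle(car_x, car_y, angle)
        vector.append(min(dist, max_range) / max_range)
    return vector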

References

[1] J. Kong et al., "Kinematic and dynamic vehicle models for autonomous driving control design," in Intelligent Vehicles Symposium, 2015, pp. 1094–1099.
[2] S. Grigorescu, "GOL: A Semi-Parametric Approach for One-Shot Learning in Autonomous Vision," in Int. Conf. on Robotics and Automation (ICRA), 2018.