**Abstract:** Autonomous vehicles are controlled today either through sequences of decoupled perception-planning-action operations, or through End2End or Deep Reinforcement Learning (DRL) systems. Current deep learning solutions for autonomous driving are subject to several limitations (e.g. they estimate driving actions through a direct mapping of sensors to actuators, or require complex reward shaping methods). Although the cost function used for training can aggregate multiple weighted objectives, the gradient descent step is computed by the backpropagation algorithm using a single-objective loss. To address these issues, we introduce *NeuroTrajectory*, a multi-objective neuroevolutionary approach to local state trajectory learning for autonomous driving, where the desired state trajectory of the ego-vehicle is estimated over a finite prediction horizon by a perception-planning deep neural network. In comparison to DRL methods, which predict optimal actions for the upcoming sampling time, we estimate a sequence of optimal states that can be used for motion control. We propose an approach which uses genetic algorithms to train a population of deep neural networks, where each network individual is evaluated based on a multi-objective fitness vector, with the purpose of establishing a so-called Pareto front of optimal deep neural networks. The performance of an individual is given by a fitness vector of three elements, describing the ego-vehicle's traveled path, lateral velocity and longitudinal speed, respectively. The same network structure can be trained on synthetic, as well as on real-world data sequences. We have benchmarked our system against a baseline Dynamic Window Approach (DWA), as well as against an End2End supervised learning method.

__ArXiv paper link__

##### Citation

```
@inproceedings{NeuroTrajectory2019,
  author    = {Sorin Grigorescu and Bogdan Trasnea and Liviu Marina and Andrei Vasilcoi and Tiberiu Cocias},
  title     = {NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems - IROS},
  address   = {Macau, China},
  month     = {04-08 November},
  year      = {2019}
}
```

##### 1. Architecture

###### Deep neural network architecture for estimating local driving trajectories.

The training data and labels consist of synthetic or real-world OG sequences, together with their future trajectory labels. Both synthetic and real-world OG streams are passed through a convolutional neural network, followed by two fully connected layers of 1024 and 512 units, respectively. The obtained CNN feature-space representations are then fed to a set of LSTM branches, which predict the future trajectory set-points.

##### 2. Method

NeuroTrajectory is a neuroevolutionary solution for the perception-planning deep neural network from the first figure. In comparison to DRL or End2End, which are used to estimate optimal driving actions, we focus instead on estimating an optimal local state trajectory over a finite prediction horizon. To implement motion control, the predicted states can be used as input to a model predictive controller. The design and implementation of the motion controller is outside the scope of this paper. We reformulate autonomous driving as a local state trajectory estimation problem for an artificial agent, where a Deep Neural Network (DNN) is used to predict a local ego-vehicle trajectory. The trajectory is defined as a sequence of future desired states, estimated over a finite prediction horizon. The weights of the DNN are calculated based on a multi-objective fitness vector composed of three losses: the ego-vehicle's traveled path, lateral velocity and longitudinal speed.
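An illustrative sketch of such a fitness vector is given below. The paper only names the three quantities, so the exact objective definitions, the sign convention (longitudinal speed is negated so that all three objectives are minimized) and the sampling time `dt` are our assumptions:

```python
import math

def fitness_vector(trajectory, dt=0.1):
    """Illustrative three-element fitness vector for one network individual.

    `trajectory` is a list of (x, y) ego-vehicle states in the vehicle frame,
    with x pointing forward. Returns (traveled path, mean absolute lateral
    velocity, negated mean longitudinal speed), all to be minimized.
    """
    pairs = list(zip(trajectory, trajectory[1:]))
    # Total traveled path: sum of Euclidean distances between consecutive states.
    path = sum(math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in pairs)
    # Lateral velocity: sideways displacement per sampling interval.
    v_lat = [abs(y1 - y0) / dt for (x0, y0), (x1, y1) in pairs]
    # Longitudinal speed: forward displacement per sampling interval.
    v_lon = [(x1 - x0) / dt for (x0, y0), (x1, y1) in pairs]
    return (path,
            sum(v_lat) / len(v_lat),
            -sum(v_lon) / len(v_lon))  # maximize forward speed = minimize its negative
```

A straight forward drive, for example, yields zero lateral velocity and a traveled path equal to the longitudinal displacement.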

We learn an optimal state trajectory by combining Convolutional Neural Networks (CNN) with the robust temporal predictions of Long Short-Term Memory (LSTM) networks. An observation is first processed by a CNN, implemented as a series of convolutional layers, which extract relevant spatial features from the input data. The CNN outputs a feature-space representation for each observation. Each processed spatial observation in the input interval *[t-tau_i, t]* is flattened and passed through two fully connected layers of 1024 and 512 units, respectively. The input sequence to an LSTM block is thus a sequence of spatially processed observations. The same network topology from the first architecture figure can be trained separately on synthetic, as well as on real-world data. As trainable network parameters, we consider both the weights of the LSTM networks and the weights of the convolutional layers.
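The tensor shapes along this pipeline can be illustrated with a toy forward pass. Only the 1024- and 512-unit fully connected layers come from the text; the OG size (64x64), the embedding width and the fixed random projection standing in for the trained convolutional layers are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: tau_i past OGs of 64x64 cells each.
tau_i, H, W = 4, 64, 64
ogs = rng.random((tau_i, H, W))  # input OG sequence over [t - tau_i, t]

# Stand-in for the convolutional feature extractor: a fixed random projection
# here; in the paper this is a stack of trained convolutional layers.
conv_dim = 2048
W_conv = rng.standard_normal((H * W, conv_dim)) * 0.01

# The two fully connected layers of 1024 and 512 units from the text.
W_fc1 = rng.standard_normal((conv_dim, 1024)) * 0.01
W_fc2 = rng.standard_normal((1024, 512)) * 0.01

relu = lambda x: np.maximum(x, 0.0)

features = relu(ogs.reshape(tau_i, -1) @ W_conv)  # per-observation spatial features
hidden = relu(relu(features @ W_fc1) @ W_fc2)     # sequence fed to the LSTM block
print(hidden.shape)  # (4, 512)
```

Each row of `hidden` is one spatially processed observation; the LSTM block consumes the rows as a temporal sequence.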

For computing the state trajectory of the ego-vehicle, we have designed a deep neural network where OG sequences are processed by a set of convolutional layers before being fed into different LSTM network branches. Each LSTM branch is responsible for estimating trajectory set-points along the time interval *[t+1, t+tau_o]*. The choice of a stack of LSTM branches over a single LSTM network that would predict all future state set-points stems from our experiments with different network architectures. Namely, we have observed that the performance of a single LSTM network decreases exponentially with the prediction horizon *tau_o*. The maximum value for which we could obtain a stable trajectory using a single LSTM network was *tau_o* = 2. As shown in the experimental results section, this is not the case with our proposed stack of LSTMs, where each branch is responsible for estimating a single state set-point. The precision difference can be attributed to the input-output structures of the two architectures. A single LSTM acts as a many-to-many, or sequence-to-sequence, mapping function, where the input sequence is used to generate a time-dependent output sequence. In the case of LSTM branches, the original sequence-to-sequence problem is divided into a stack of many-to-one subproblems, each LSTM branch providing a many-to-one solution, thus simplifying the search space. In our case, the solution given by a specific branch represents an optimal state set-point for a single timestamp along *[t+1, t+tau_o]*.
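The decomposition into per-step branches can be sketched as follows. The linear-extrapolation branch body is purely a hypothetical stand-in for a trained LSTM branch, used only to show the many-to-one structure of the stack:

```python
def make_branch(step):
    # Hypothetical stand-in for one trained LSTM branch: it maps the whole
    # input sequence to a single state set-point for time t + step
    # (many-to-one), here by linearly extrapolating the last observed motion.
    def branch(sequence):
        (x0, y0), (x1, y1) = sequence[-2], sequence[-1]
        return (x1 + step * (x1 - x0), y1 + step * (y1 - y0))
    return branch

def predict_trajectory(sequence, tau_o):
    """Each of the tau_o branches solves one many-to-one subproblem,
    yielding one set-point of the trajectory over [t+1, t+tau_o]."""
    branches = [make_branch(step) for step in range(1, tau_o + 1)]
    return [branch(sequence) for branch in branches]

past = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2)]  # observed ego states up to time t
trajectory = predict_trajectory(past, tau_o=5)
```

A single sequence-to-sequence LSTM would instead have to produce all five set-points from one unrolled output sequence, which is the harder search problem described above.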

On the left side of the figure above we can see the mapping of solution vectors Q from the decision space S to the objective space L. Each solution Q in the decision space corresponds to a coordinate in the objective space. The coordinates marked in red are the set of Pareto-optimal solutions for a multi-objective minimization problem, located on the Pareto front drawn with a thick black line.
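The non-dominated (red) set can be extracted with a standard Pareto-dominance check; the sample two-objective fitness values below are made up for illustration:

```python
def dominates(a, b):
    """True if fitness vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(fitnesses):
    """Keep the fitness vectors not dominated by any other vector."""
    return [f for f in fitnesses if not any(dominates(g, f) for g in fitnesses)]

# Made-up two-objective fitness values for five network individuals.
points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(pareto_front(points))  # [(1, 4), (2, 2), (4, 1)]
```

In training, the same check is applied to the three-element fitness vectors of the network population, and the non-dominated individuals form the Pareto front of optimal deep neural networks.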

On the right side, the evolution of the fitness vector during training is presented. With each training generation, the traveled path decreases, while the longitudinal velocity increases. The lateral velocity increases together with the longitudinal velocity, but with a much smaller gradient, meaning that the vehicle learns to avoid hazardous motions and passenger discomfort even as the longitudinal velocity grows. The red dots show the mean fitness value for the longitudinal velocity at each training generation.

##### 3. Experiments

The first set of experiments compared three algorithms (DWA, End2End and NeuroTrajectory) over 20 km of driving in GridSim [1]. We have implemented an End2End supervised learning system which predicts the vehicle's steering angle, discretized with a 3-degree resolution. Given the predicted steering angle, we define a line segment centered on the ego-vehicle. Ideally, the angle between this line segment and the line defined by the first two set-points of the reference trajectory should be zero. As a performance metric, we measure the RMSE between the reference points and their closest set-points on the line segment, for *tau_o* = 5.
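This metric can be sketched as below. The point-to-segment projection is standard geometry; how the segment endpoints `a` and `b` are derived from the predicted steering angle and the segment length are assumptions not detailed in the text:

```python
import math

def closest_point_on_segment(p, a, b):
    """Project p onto the segment ab, clamping to the endpoints."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def rmse_to_segment(reference, a, b):
    """RMSE between the reference trajectory set-points and their closest
    points on the steering line segment ab."""
    sq_errors = []
    for p in reference:
        cx, cy = closest_point_on_segment(p, a, b)
        sq_errors.append((p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

A reference trajectory lying exactly on the segment yields an RMSE of zero; any angular deviation between the segment and the reference line increases the error.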

On the right side of the figure above we can observe the RMSE between estimated and human-driven trajectories in real-world highway and inner-city testing scenarios. The solid lines indicate the position error, calculated as RMSE, while the shaded regions indicate the standard deviation. The position errors are higher for inner-city scenes than in the case of highway driving. NeuroTrajectory achieves the lowest error in both testing scenarios.

On the left side we can see the median and variance of the RMSE for the three testing scenarios. The errors in simulation and highway driving are similar. The unstructured nature of inner-city driving introduces higher errors, as well as a higher RMSE variance.

In our experiments, DWA performed better than End2End, mostly due to the structure of the OG input data, which is well suited for grid-based search algorithms. This also makes DWA strictly dependent on the quality of the OGs, since such classical methods cannot be applied directly to raw data. Additionally, the jittering effect of End2End is likely a side effect of the discrete nature of its output. NeuroTrajectory, however, is able to combine the advantages of both worlds and obtain a stable state prediction of the ego-vehicle along a given time horizon. We believe that in the long run, learning-based approaches will produce better results than traditional methods such as DWA, with the improvement coming from training on additional data, including a larger number of corner cases.

##### References

[1] B. Trasnea, L. Marina, A. Vasilcoi, C. Pozna, and S. Grigorescu, "__Gridsim: A simulated vehicle kinematics engine for deep neuroevolutionary control in autonomous driving__", in Int. Conf. on Robotic Computing IRC 2019, Naples, Italy, 25-27 February 2019.

[2] S. Grigorescu, "__GOL: A Semi-Parametric Approach for One-Shot Learning in Autonomous Vision__", in Int. Conf. on Robotics and Automation ICRA 2018.