Step 1 - Model Training


Now that we have a feel for the data we are dealing with, we can start designing our model. In this notebook, we will define the network architecture and train the model. We will also discuss some of the transformations on the data in response to the observations that we made in the data exploration section of the notebook.

Let us start by importing some libraries.

Let's read in the datasets from the exploration phase. If these do not exist, run the Step 0 notebook to generate them.

For image data, loading the entire dataset into memory is too expensive. Fortunately, Keras has the concept of DataGenerators. A DataGenerator is nothing more than an iterator that reads data from disk in chunks. This allows you to keep both your CPU and GPU busy, increasing throughput.
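To make the idea concrete, here is a minimal sketch of a chunked iterator in plain Python. It is not the Keras implementation: the names batch_generator and load_image are hypothetical, and a real Keras generator would typically loop over the data indefinitely and return arrays rather than lists.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batch_generator(
    paths: Sequence[str],
    labels: Sequence[float],
    load_image: Callable[[str], object],  # hypothetical loader, e.g. an image reader
    batch_size: int,
) -> Iterator[Tuple[List[object], List[float]]]:
    """Yield (images, labels) batches, loading only one batch from disk at a time."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # The (potentially slow) disk read happens here, one chunk at a time.
        images = [load_image(path) for path in chunk]
        yield images, list(labels[start:start + batch_size])
```

Because the function is a generator, nothing is loaded until the training loop asks for the next batch, so memory use is bounded by the batch size rather than the dataset size.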

We made a few observations during the exploration phase. Now is the time to come up with a strategy to address them:

  • Only a small portion of the image is of interest - when generating batches, we can remove the pieces of the image that are not of interest.
  • The dataset tolerates flipping about the vertical (Y) axis - when generating batches, we can randomly mirror some images around the Y axis and negate their labels so the model has new data to learn from.
  • The model should be invariant to changes in lighting - when generating batches, we can randomly add or remove brightness from the images so the model can learn that global changes in lighting should be ignored.
  • The dataset has a high proportion of images labeled with a steering angle of zero - when generating batches, we can randomly drop a percentage of data points where the steering angle is zero so the model sees a balanced dataset when training.
  • We need examples from the swerving strategy in our dataset so the model learns how to turn sharply - we took care of this in the preprocessing phase.
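Two of these strategies, flipping and dropping zero-angle samples, can be sketched in a few lines. This is an illustration under simplifying assumptions, not the generator used in this tutorial: images are represented as plain nested lists, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]       # rows of pixel values; a stand-in for an image array
Sample = Tuple[Image, float]    # (image, steering angle)


def flip_about_y_axis(image: Image, angle: float) -> Sample:
    """Mirror the image left-to-right and negate the steering angle to match."""
    return [list(reversed(row)) for row in image], -angle


def drop_zero_labels(
    samples: List[Sample], drop_fraction: float, rng: random.Random
) -> List[Sample]:
    """Randomly discard roughly `drop_fraction` of the samples whose angle is zero."""
    kept = []
    for image, angle in samples:
        if angle == 0.0 and rng.random() < drop_fraction:
            continue  # skip this zero-angle sample
        kept.append((image, angle))
    return kept
```

Note that the label must be negated together with the flip: a mirrored left turn is a right turn.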

While Keras does have some standard built-in transforms for images, they are not sufficient for our application. For example, when using horizontal_flip = True in the standard ImageDataGenerator, the images are flipped but the signs of the labels are not inverted. Fortunately, we can extend the ImageDataGenerator class and implement our own transform logic. The code to do so is kept outside this notebook: it is straightforward, but too long to include here.

Here, we will initialize the generator with the following parameters:

  • Zero_Drop_Percentage: 0.9 - That is, we will randomly drop 90% of the data points with label = 0
  • Brighten_Range: 0.4 - That is, the brightness of each image will be modified by up to 40%. To compute "brightness", we transform the image from RGB to HSV space, scale the 'V' coordinate up or down, and transform back to RGB space.
  • ROI: 76,135,0,255 - This is the x1, x2, y1, y2 rectangle that represents the area of interest for the images.
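The brightness transform described above can be sketched with Python's standard colorsys module. This is a simplified illustration, not the tutorial's generator code: it assumes RGB channels scaled to [0, 1], assumes the scale factor is drawn uniformly from 1 ± Brighten_Range, and clips 'V' to the valid range. The function names are hypothetical.

```python
import colorsys
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def scale_brightness(pixel: Pixel, factor: float) -> Pixel:
    """Convert RGB to HSV, scale V by `factor` (clipped to [0, 1]), convert back."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    v = min(1.0, max(0.0, v * factor))
    return colorsys.hsv_to_rgb(h, s, v)


def random_brightness(
    image: List[List[Pixel]], brighten_range: float, rng: random.Random
) -> List[List[Pixel]]:
    """Apply one random brightness factor to every pixel of the image."""
    # Assumption: the factor is uniform in [1 - range, 1 + range].
    factor = 1.0 + rng.uniform(-brighten_range, brighten_range)
    return [[scale_brightness(pixel, factor) for pixel in row] for row in image]
```

A single factor is drawn per image, so the change is global: hue and saturation are untouched, and only the overall lighting shifts.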

Thought Exercise 1.1: Try playing around with these parameters to see if you can get better results.

Let's look at a sample batch. The steering angle is represented by the red line in the image:

Next, let's define the network architecture. We will use a standard combination of convolutional and max pooling layers to process the images. We cannot go into the details of what each of these layers does here, but you should definitely check out the book mentioned in the readme file if you do not understand what is going on. Then, we will inject the vehicle's last known state into the dense layer as an additional feature. The layer sizes and optimization parameters were determined experimentally - try tweaking them and see what happens!

Let's look at a summary of our model:

That's a lot of parameters! Fortunately, we have our data augmentation strategies, so the network has a chance of converging. Try adding / removing layers or changing their widths to see what effect it has on the number of trainable parameters in the network.
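To build intuition for where the parameters come from, the standard counting formulas for convolutional and dense layers can be written as two small helpers. The layer sizes in the usage note below are made up for illustration and are not the sizes used in this model. Max pooling layers have no trainable parameters.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Trainable parameters of a 2D convolution with a bias term per filter."""
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    """Trainable parameters of a fully connected layer with a bias term per unit."""
    # Each unit has one weight per input plus one bias.
    return (inputs + 1) * units
```

For example, a hypothetical 3x3 convolution over an RGB image with 16 filters has conv2d_params(3, 3, 3, 16) = 448 parameters, while a dense layer with 64 units fed by 100 inputs has dense_params(100, 64) = 6464. Dense layers fed by a flattened feature map usually dominate the total.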

One of the nice features of Keras is the ability to declare callbacks. These functions get executed after each epoch of training. We will define a few callbacks:

  • ReduceLROnPlateau - If the model is near a minimum and the learning rate is too high, then the model will circle around that minimum without ever reaching it. This callback reduces the learning rate when the validation loss stops improving, allowing us to reach the optimal point.
  • CSVLogger - This logs the output of the model after each epoch, which lets us track the progress without needing to use the console.
  • ModelCheckpoint - Generally, we will want to use the model that has the lowest loss on the validation set. This callback will save the model each time the validation loss improves.
  • EarlyStopping - We will want to stop training when the validation loss stops improving. Otherwise, we risk overfitting. This callback detects when the validation loss has stopped improving and ends the training process when that occurs.
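The way learning-rate reduction, checkpointing, and early stopping interact can be sketched as a small simulation over a list of validation losses. This is a simplified model of the patience logic, not the Keras implementation (the real callbacks have further options such as minimum deltas and cooldowns); the function name and parameters are hypothetical.

```python
from typing import Sequence, Tuple


def simulate_callbacks(
    val_losses: Sequence[float],
    lr: float,
    lr_patience: int,
    stop_patience: int,
    factor: float = 0.5,
) -> Tuple[int, float, int]:
    """Return (epochs_run, final_lr, best_epoch) for a sequence of validation losses."""
    best = float("inf")
    best_epoch = -1
    since_best = 0       # epochs since the last improvement (early stopping)
    since_lr_change = 0  # stagnant epochs since the last learning-rate reduction
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: this is where a checkpoint of the model would be saved.
            best, best_epoch = loss, epoch
            since_best = 0
            since_lr_change = 0
            continue
        since_best += 1
        since_lr_change += 1
        if since_lr_change >= lr_patience:
            lr *= factor  # shrink the step size to settle into the minimum
            since_lr_change = 0
        if since_best >= stop_patience:
            return epoch + 1, lr, best_epoch  # stop training early
    return len(val_losses), lr, best_epoch
```

With a shorter patience for the learning-rate reduction than for early stopping, the model gets a few chances to improve at a smaller learning rate before training is abandoned, and the checkpoint from the best epoch is the one we keep.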

It's time to train the model! With the default setup, this model takes ~45 min to train on an NVIDIA GTX 970 GPU. Note: sometimes the validation loss will stay constant for up to 7 epochs before improving again. If left to run, training should terminate with a validation loss of approximately 0.0003.

Let's do a quick sanity check. We'll load a few training images and compare the labels and the predictions. These should be very close in value if our model has learned properly.

Looks good! Let's move on to actually running the model with AirSim in the next notebook.