Task 1: Train a CNN to predict a clear road ahead 15 points
The python program sprites.py creates a training and test set of “minirace” scenes,
trainingpix.csv (1024 examples) and testingpix.csv (256 examples). Each
row contains the 256 pixel values of a 16 × 16 screenshot (flattened in row-major order), plus an extra value of 0 or 1 that indicates whether the car can safely drive straight without going off-road in the immediate next step (i.e., there are 257 columns).
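For orientation, loading one of these files might look like the sketch below. The comma delimiter and the absence of a header row are assumptions; check how sprites.py actually writes the files.

    # Illustrative sketch: load the training data with NumPy, assuming
    # comma-separated values and no header row (verify against sprites.py).
    import numpy as np

    data = np.loadtxt("trainingpix.csv", delimiter=",")
    X = data[:, :256].reshape(-1, 1, 16, 16)  # 16 x 16 screenshots, 1 channel
    y = data[:, 256]                          # 0/1 label: clear road ahead
    print(X.shape, y.shape)                   # (1024, 1, 16, 16) (1024,)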
Steps
1. Create the datasets by running the sprites.py code.
2. Create a CNN that predicts whether the car can safely remain in its current position (i.e., drive straight) without crashing into non-drivable terrain.
(a) Describe (no programming): what is a good loss function for this problem?
(b) Implement and train the CNN on the training set.
(c) Compute the accuracy of your model on the test data set.
• You are free to choose the architecture of your network, but there should be at least one convolutional layer (a minimal sketch follows this list).
• You can normalise/standardise the data if it helps improve the training.
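As a starting point, a minimal CNN in PyTorch could look like the following sketch. This is one possible architecture, not a prescribed solution; it assumes X and y were loaded as in the earlier snippet, and it uses binary cross-entropy, one natural choice of loss for a 0/1 label (compare step (a)).

    # A minimal sketch (one possible architecture, not a required solution).
    # Assumes X, y as loaded in the earlier snippet; scaling the pixel
    # values, e.g. X = X / X.max(), may help if training is unstable.
    import torch
    import torch.nn as nn

    class RoadCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1x16x16 -> 8x16x16
                nn.ReLU(),
                nn.MaxPool2d(2),                            # 8x16x16 -> 8x8x8
                nn.Flatten(),                               # 8*8*8 = 512 values
                nn.Linear(512, 1),                          # one logit per image
            )

        def forward(self, x):
            return self.net(x).squeeze(1)

    model = RoadCNN()
    loss_fn = nn.BCEWithLogitsLoss()   # binary cross-entropy on the logit
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    X_t = torch.tensor(X, dtype=torch.float32)
    y_t = torch.tensor(y, dtype=torch.float32)
    for epoch in range(20):            # full-batch training; 1024 examples
        optimizer.zero_grad()
        loss = loss_fn(model(X_t), y_t)
        loss.backward()
        optimizer.step()

    # Accuracy: threshold the sigmoid of the logits at 0.5
    with torch.no_grad():
        preds = (torch.sigmoid(model(X_t)) > 0.5).float()
        print("accuracy:", (preds == y_t).float().mean().item())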
What to submit:
• A description of your CNN and the training. Calculate the size of each layer,
and include it in the description.
• Include the explanation for the loss function in your description.
• For how long did you train your model (number of epochs, time taken)? What is
the performance on the test set?
• Submit the python code for your solution (either as .py or .ipynb).
Task 2: Train a convolutional autoencoder 10 points
Create a convolutional autoencoder that compresses the racing game screenshots to a small number of bytes (the encoder part), and transforms them back to the original (the decoder part).
Steps
1. Create an undercomplete convolutional autoencoder and train it using the training data set from the first task (a sketch follows this list).
2. You can choose the architecture of the network and the size of the representation h = f(x). The goal is to learn a representation that is smaller than the original, and still leads to recognisable reconstructions of the original.
3. (No programming): Explain the difference between an undercomplete and a denoising autoencoder.
4. (No programming): The input images are 16 × 16 = 256 pixels. What is the size of your hidden representation h = f(x) (the size of the middle layer of your autoencoder)? Include your calculation in your report.
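The following sketch shows one way to structure such an autoencoder in PyTorch. The layer sizes are example choices: here the bottleneck is 2 channels × 4 × 4 = 32 values against 256 input pixels, an 8× compression. Your own architecture and representation size may differ.

    # A sketch of an undercomplete convolutional autoencoder (example sizes).
    # The bottleneck is 2 channels x 4 x 4 = 32 values vs. 256 input pixels.
    import torch
    import torch.nn as nn

    class ConvAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 4, 3, stride=2, padding=1),  # 1x16x16 -> 4x8x8
                nn.ReLU(),
                nn.Conv2d(4, 2, 3, stride=2, padding=1),  # 4x8x8 -> 2x4x4
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(2, 4, 2, stride=2),    # 2x4x4 -> 4x8x8
                nn.ReLU(),
                nn.ConvTranspose2d(4, 1, 2, stride=2),    # 4x8x8 -> 1x16x16
                nn.Sigmoid(),                             # outputs in [0, 1]
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = ConvAE()
    loss_fn = nn.MSELoss()  # reconstruction loss against the inputs themselves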
What to submit:
• Submit the python code of your undercomplete autoencoder (either as .py or
.ipynb).
• For your report, write a brief description of the steps you took to create the model and of its reconstructions. Include the explanation of undercomplete vs. denoising autoencoders, and your calculations. How do you measure the quality of your model?
• Include screenshots of 1-2 output images next to the original inputs (e.g., select a
good and a bad example).
Task 3: Create an RL agent for Minirace (level 1) 15 points
The code in minirace.py provides an environment for creating an agent that can be trained with reinforcement learning (a complete description appears at the end of this sheet).
The following is a description of the environment dynamics:
• The square represents the car; it is 2 pixels wide. The car always appears in the bottom row, and at each step of the simulation the track scrolls by one row below the car.
• The agent can control the steering of the car by moving it two pixels to the left or right. The agent can also choose to do nothing, in which case the car drives straight. The car cannot be moved outside the boundaries.
• The agent will receive a positive reward at each step where the front part of the
car is still on track.
• An episode is finished when the front of the car hits non-drivable terrain.
In a level 1 version of the game, the observed state (the information made available to
the agent after each step) consists of one number: dx. It is the relative position of the
middle of the track right in front of the car (i.e., the piece of track in the third row from the bottom of the image). When the track turns left in front of the car, this value will be negative, and when the track turns right, dx is positive. As the track is six pixels wide, the car can drive either on the left, middle, or right of a piece of track (it does not need to drive in the middle of the road).
For this task, you should initialise the simulation like this:
therace = Minirace(level=1)
When you run the simulation, step() returns dx (…, −2, −1, 0, 1, 2, …) for the state.
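For illustration, an interaction loop with the environment might look like the sketch below. Note that the name reset(), the assumed (state, reward, done) return of step(action), and the 0/1/2 action encoding are assumptions, not taken from minirace.py; check the file for the actual interface.

    # Illustrative interaction loop. reset() and the (state, reward, done)
    # return of step(action) are ASSUMED here; check minirace.py for the
    # actual interface. Actions assumed: 0 = left, 1 = straight, 2 = right.
    from minirace import Minirace

    therace = Minirace(level=1)
    dx = therace.reset()      # assumed: returns the initial state
    done, total = False, 0.0
    while not done:
        action = 1            # placeholder policy: always drive straight
        dx, reward, done = therace.step(action)
        total += reward
    print("episode reward:", total)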
Steps
1. Manually create a policy (no RL) that successfully drives the car, selecting actions based only on the state information. The minirace.py code contains a function mypolicy() that you should modify for this task.
2. (No programming) How many different values for dx are possible in theory (if you ignore that the car may crash)? If you were to create a tabular reinforcement learning agent, what size is your table for this problem (number of rows and
columns)?
3. Create a (tabular or deep) TD agent that learns to drive. If you decide to use ε-greedy action selection, set ε = 1 initially, and reduce it during your training to a minimum of ε = 0.01. Keep your training going until you are either happy with the result or the performance does not improve¹ (see the sketch after this list).
4. When you run your training, reset the environment after every episode. Store the sum of rewards. After or during the training, plot the total sum of rewards per episode. This plot (the Training Reward plot) indicates the extent to which your agent is learning to improve its cumulative reward. It is your decision when to stop training. It is not required to submit a perfectly performing agent, but show how it learns.
5. After you decide the training is complete, run 50 test episodes using your trained policy, but with ε = 0.0 for all 50 episodes. Again, reset the environment at the beginning of each episode. Calculate the average of the sum-of-rewards-per-episode (call this the Test-Average), and the standard deviation (the Test-Standard-Deviation). These values indicate how your trained agent performs.

¹ This means: do not stop just because ε reached 0.01. You may want to stop earlier, or you may want to keep going; just do not reduce ε any further.
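The sketch referenced in step 3 follows: tabular Q-learning with ε-greedy exploration and a decaying ε. The table size, the state-to-row mapping, the hyperparameters, and the environment interface (see the earlier loop) are all assumptions; the correct table size is exactly what you derive in step 2.

    # Sketch: tabular Q-learning with epsilon-greedy exploration, level 1.
    # Table size, state indexing, hyperparameters, and the environment
    # interface are assumptions/examples, not prescribed values.
    import numpy as np
    from minirace import Minirace

    n_states, n_actions = 25, 3     # placeholder sizes; derive yours in step 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.95
    epsilon, eps_min, eps_decay = 1.0, 0.01, 0.995

    def idx(dx):
        # map dx to a table row; offset/clipping depend on the true dx range
        return int(np.clip(dx + n_states // 2, 0, n_states - 1))

    therace = Minirace(level=1)
    rewards = []                    # sum of rewards per episode, for the plot
    for episode in range(2000):
        s = idx(therace.reset())    # reset the environment every episode
        done, ep_reward = False, 0.0
        while not done:
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            dx, r, done = therace.step(a)   # assumed return signature
            s2 = idx(dx)
            target = r if done else r + gamma * np.max(Q[s2])
            Q[s, a] += alpha * (target - Q[s, a])
            s, ep_reward = s2, ep_reward + r
        rewards.append(ep_reward)
        epsilon = max(eps_min, epsilon * eps_decay)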
What to submit:
• Submit the python code of your solutions (both the manual strategy, and the code
of your RL learner).
• For your report, describe the solution, mention the Test-Average and Test-Standard-
Deviation, and include the Training Reward plot described above. After how
many episodes did you decide to stop training, and how long did it take?
Task 4: Create an RL agent for Minirace (level 2) 10 points
In a level 2 version of the game, the observed state (the information made available to
the agent after each step) consists of two numbers: dx1, dx2. The first value (dx1) is the same as dx in level 1 – the relative position of the (middle of the) track in front of the car. The second value (dx2) is the position of the subsequent track (in row 4), relative to the track in front of the car (in row 3).
A second difference is that the track can be more curved: sometimes the track will only overlap on the left or right edge. This means the agent cannot always drive in the middle of the track, because the car can only move one step to the left or right at a time.
For this task, you can initialise like this:
therace = Minirace(level=2)
In this level, step() returns two unnormalised pixel difference values (i.e., two values from …, −2, −1, 0, 1, 2, …).
Steps
1. Create an RL agent (using an RL method of your choice) that finds a policy using (all) level 2 state information. A suggested discount factor is γ = 0.95.
2. You can choose the algorithm (a tabular approach, deep TD or deep policy gradi-
ent).
3. Try to train an agent that achieves a running reward > 50 (the minirace.py
file has an example for how to calculate this).
4. If you use a neural network, do not go overboard with the number of hidden layers, as this will significantly increase training time. Try one hidden layer (a sketch follows this list).
5. Write a description explaining how your approach works, and how it performs. If
some (or all) of your attempts are unsuccessful, also describe some of the things
that did not work, and which changes made a difference.
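The sketch mentioned in step 4 follows: a small Q-network with one hidden layer for the two-value level 2 state, together with a single TD(0) update. The layer sizes, the transition format, and the environment interface are again illustrative assumptions, not taken from minirace.py.

    # Sketch: a one-hidden-layer Q-network for the (dx1, dx2) state, plus a
    # single TD(0) update step. Sizes and transition format are illustrative
    # assumptions, not requirements.
    import torch
    import torch.nn as nn

    qnet = nn.Sequential(
        nn.Linear(2, 32),   # input: the two state values (dx1, dx2)
        nn.ReLU(),
        nn.Linear(32, 3),   # output: one Q-value per action
    )
    optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
    gamma = 0.95            # the suggested discount factor

    def td_update(s, a, r, s2, done):
        # one TD(0) update on a single transition; s, s2 are float tensors
        # of shape (2,), a is an int action index, r is a float reward
        q = qnet(s)[a]
        with torch.no_grad():
            target = r + (0.0 if done else gamma * qnet(s2).max().item())
        loss = (q - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Example call with placeholder values:
    # td_update(torch.tensor([0.0, 1.0]), a=2, r=1.0,
    #           s2=torch.tensor([1.0, -1.0]), done=False)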
What to submit:
• Submit the python code of your solutions.
• For your report, describe the solution, mention the Test-Average and Test-Standard-Deviation, and include the Training Reward plot described above.