Homework 3

Note: To replicate the results in this report, run the script files in the hw3 directory of this repository.

Q-Learning

Problem 1

I am still trying to arrange computational resources to run this problem. My laptop seems to be too old for this kind of experiment :(

Problem 2

The following graph plots the average rewards for both the double Q-learning and vanilla Q-learning algorithms on the Lunar Lander game.

Double Q-learning clearly achieves much higher average rewards than vanilla Q-learning.
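The report does not show the update rules being compared, so the following is only a minimal sketch of how the two targets are commonly defined, assuming the standard DQN and double DQN formulations. The function names, the Q-value lists, and the discount factor are hypothetical and chosen for illustration; the scripts in hw3 may differ in detail.

```python
from typing import Sequence


def vanilla_q_target(reward: float, done: bool, gamma: float,
                     target_q_next: Sequence[float]) -> float:
    """Vanilla target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(target_q_next)


def double_q_target(reward: float, done: bool, gamma: float,
                    online_q_next: Sequence[float],
                    target_q_next: Sequence[float]) -> float:
    """Double Q target: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return reward + gamma * target_q_next[best_action]


# Hypothetical Q-value estimates for a single next state with four actions.
online_q_next = [1.0, 2.0, 0.5, 1.5]
target_q_next = [1.2, 1.0, 3.0, 0.8]

vanilla = vanilla_q_target(1.0, False, 0.5, target_q_next)
double = double_q_target(1.0, False, 0.5, online_q_next, target_q_next)
print(vanilla, double)  # 2.5 1.5
```

In this toy case the vanilla target picks up the largest (possibly overestimated) value in the target network, while the double Q target uses the value of the action the online network prefers. Decoupling selection from evaluation in this way is the usual explanation for reduced overestimation, which may account for the gap seen in the graph.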

Problem 3

We experiment with the learning rate. The graph below plots the average rewards on the Lunar Lander game for four different learning rates.

The graph shows that a high learning rate (0.1) yields significantly lower average rewards. As the learning rate is lowered from 0.1 to 0.001, the average rewards increase. However, lowering it further to 0.0001 decreases the average rewards again.

Actor-Critic Algorithm

Problem 1

Problem 2: Inverted Pendulum

Problem 2: Half Cheetah