Homework 1

Note: The code can be found in this repository.

Problem 2.2

For both experiments:

Hidden layer = 128 units
Training Data Size = 100 rollouts = 100,000 timesteps
Epochs = 500
Test Data Size = 20 rollouts
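
For reference, the settings above can be collected into a single configuration object. This is only a sketch: the class name `ExperimentConfig` and its field names are hypothetical, and `steps_per_rollout = 1000` is an assumption inferred from "100 rollouts = 100,000 timesteps" (it holds only if every rollout runs for the same number of steps).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the settings shared by both experiments."""
    hidden_units: int = 128        # width of the hidden layer
    train_rollouts: int = 100      # expert rollouts used for training
    steps_per_rollout: int = 1000  # assumption: inferred from 100 rollouts = 100,000 timesteps
    epochs: int = 500              # training epochs
    test_rollouts: int = 20        # rollouts used for evaluation

    @property
    def train_timesteps(self) -> int:
        # Total number of training samples across all expert rollouts.
        return self.train_rollouts * self.steps_per_rollout


config = ExperimentConfig()
print(config.train_timesteps)
```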

Hopper

	Expert Policy	Trained Model
Mean	3778.79126	3779.27708
Standard Deviation	3.03886	3.07484

Humanoid

	Expert Policy	Trained Model
Mean	10306.80848	407.63263
Standard Deviation	979.44124	23.56048
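
The statistics in the tables above are summaries over the test rollouts. A minimal sketch of how such a summary might be computed from per-rollout returns is given below. The function names are hypothetical, the example returns are made up purely to exercise the code, and the use of the population standard deviation is an assumption (the original does not say which estimator was used).

```python
import statistics


def summarize_returns(returns):
    """Return (mean, standard deviation) of per-rollout total rewards."""
    mean = statistics.fmean(returns)
    # Assumption: population standard deviation; swap in statistics.stdev
    # for the sample estimator if that is what the experiments used.
    std = statistics.pstdev(returns)
    return mean, std


def relative_performance(model_mean, expert_mean):
    """Fraction of the expert's mean return reached by the trained model."""
    return model_mean / expert_mean


# Made-up returns, for illustration only (not experimental data).
example_returns = [10.0, 12.0, 14.0]
mean, std = summarize_returns(example_returns)

# Ratios computed from the means reported in the tables above.
hopper_ratio = relative_performance(3779.27708, 3778.79126)
humanoid_ratio = relative_performance(407.63263, 10306.80848)
print(f"Hopper: {hopper_ratio:.4f}, Humanoid: {humanoid_ratio:.4f}")
```

With the reported means, the Hopper model essentially matches the expert, while the Humanoid model reaches only about 4% of the expert's mean reward.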

Problem 2.3

The training data size was varied and the resulting rewards were recorded:

[Figure p2c: mean reward versus training data size]

As the training data size increases, the mean reward first rises and then plateaus. Rationale: The amount of training data is one of the most important factors in how well a model performs. If there is too little data, the model cannot generalize well to the full distribution from which the data is drawn. Once the data covers that distribution well enough, adding more yields little further gain.
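
An experiment of this kind can be organized as a simple sweep over the number of training rollouts. The sketch below is one possible structure, not the code used for this homework: `sweep_data_size` and `train_and_evaluate` are hypothetical names, and the stand-in evaluation function only returns the subset size so that the example runs without a simulator.

```python
def sweep_data_size(rollouts, sizes, train_and_evaluate):
    """Train on the first n rollouts for each n in sizes and record the result.

    train_and_evaluate is a caller-supplied function (hypothetical here) that
    takes a list of rollouts and returns the mean test reward.
    """
    results = {}
    for n in sizes:
        if n > len(rollouts):
            raise ValueError(f"requested {n} rollouts, only {len(rollouts)} available")
        subset = rollouts[:n]  # use the first n expert rollouts
        results[n] = train_and_evaluate(subset)
    return results


def placeholder_train_and_evaluate(subset):
    # Stand-in only: returns the subset size instead of a real reward.
    return len(subset)


rollouts = list(range(100))  # placeholder objects standing in for 100 rollouts
results = sweep_data_size(rollouts, [10, 25, 50, 100], placeholder_train_and_evaluate)
print(results)
```

Plotting the recorded values against the data size would then give a curve like the one described above.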