Note: The code can be found in this repository.

Problem 2.2

For both experiments:

  1. Hidden layer = 128 units
  2. Training Data Size = 100 rollouts = 100,000 timesteps
  3. Epochs = 500
  4. Test Data Size = 20 rollouts
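
The settings above can be written down as a small training sketch. This is a minimal NumPy illustration, not the code from the repository: the `tanh` activation, the weight initialization, the learning rate, full-batch gradient descent on mean squared error, and the helper names `init_policy`, `predict`, `mse`, and `train` are all assumptions made for the example. Only the 128 hidden units, the 500 epochs, and the rollout counts come from the list above.

```python
import numpy as np

# Values taken from the settings list above.
HIDDEN_UNITS = 128
TRAIN_ROLLOUTS = 100        # 100 rollouts = 100,000 timesteps
STEPS_PER_ROLLOUT = 1000    # implied by 100,000 / 100
EPOCHS = 500
TEST_ROLLOUTS = 20


def init_policy(obs_dim, act_dim, hidden=HIDDEN_UNITS, seed=0):
    """One-hidden-layer MLP parameters (initialization scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, np.sqrt(1.0 / obs_dim), (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, act_dim)),
        "b2": np.zeros(act_dim),
    }


def predict(params, obs):
    """Map a batch of observations to predicted actions."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


def mse(params, obs, act):
    """Mean squared error between predicted and expert actions."""
    return float(np.mean((predict(params, obs) - act) ** 2))


def train(params, obs, act, epochs=EPOCHS, lr=1e-2):
    """Full-batch gradient descent on 1/(2n) * sum of squared errors.

    Updates `params` in place and also returns it.
    """
    n = obs.shape[0]
    for _ in range(epochs):
        h = np.tanh(obs @ params["W1"] + params["b1"])
        pred = h @ params["W2"] + params["b2"]
        err = (pred - act) / n                        # dLoss/dpred
        dh = (err @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
        params["W2"] -= lr * (h.T @ err)
        params["b2"] -= lr * err.sum(axis=0)
        params["W1"] -= lr * (obs.T @ dh)
        params["b1"] -= lr * dh.sum(axis=0)
    return params
```

Here `obs` and `act` would be the expert's observations and actions stacked over all training timesteps, with shapes `(timesteps, obs_dim)` and `(timesteps, act_dim)`.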

Hopper

| | Expert Policy | Trained Model |
|---|---|---|
| **Mean** | 3778.79126 | 3779.27708 |
| **Standard Deviation** | 3.03886 | 3.07484 |

Humanoid

| | Expert Policy | Trained Model |
|---|---|---|
| **Mean** | 10306.80848 | 407.63263 |
| **Standard Deviation** | 979.44124 | 23.56048 |
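
The mean and standard deviation reported for each environment summarize the total reward of each test rollout. A minimal sketch of that computation follows; the helper names `rollout_return` and `summarize_returns` are hypothetical, and the use of the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the report does not say which estimator was used.

```python
import numpy as np


def rollout_return(rewards):
    """Total (undiscounted) reward collected in one rollout."""
    return float(np.sum(rewards))


def summarize_returns(returns):
    """Mean and population standard deviation over per-rollout returns."""
    returns = np.asarray(returns, dtype=float)
    return float(returns.mean()), float(returns.std())
```

With 20 test rollouts, `returns` would be a list of 20 values, one `rollout_return` per rollout, computed once for the expert policy and once for the trained model.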

Problem 2.3

The data size was varied and the rewards observed:

[Figure p2c: mean reward versus training data size]

As the data size increases, the mean reward initially rises and then plateaus. Rationale: the amount of training data is one of the most important factors in how well a model performs. With too little data, the model cannot generalize to the full distribution from which the data is drawn; once the data covers that distribution well enough, additional data yields little further improvement.
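
The same effect can be reproduced on a toy problem. The sketch below is not the experiment from this report: it swaps the neural network policy for ordinary least-squares regression on synthetic data and measures held-out squared error instead of reward. The function name `heldout_error_by_size` and all of its parameters are assumptions made for illustration. The error should drop as the training set grows and then flatten near the noise level.

```python
import numpy as np


def heldout_error_by_size(sizes, dim=10, noise=0.1, n_test=1000, seed=0):
    """Fit least squares on training sets of increasing size.

    Returns the held-out mean squared error for each size in `sizes`.
    """
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=dim)

    # Fixed held-out set drawn from the same distribution as the training data.
    x_test = rng.normal(size=(n_test, dim))
    y_test = x_test @ true_w + noise * rng.normal(size=n_test)

    errors = []
    for n in sizes:
        x = rng.normal(size=(n, dim))
        y = x @ true_w + noise * rng.normal(size=n)
        w_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
        errors.append(float(np.mean((x_test @ w_hat - y_test) ** 2)))
    return errors
```

Calling `heldout_error_by_size([20, 200, 2000])` returns one error per training set size, which can be plotted against the size to give a curve analogous to the one described above, with error decreasing where reward increases.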