Homework 1
Note: The code can be found in this repository.
Problem 2.2
For both experiments:
- Hidden layer = 128 units
- Training Data Size = 100 rollouts = 100,000 timesteps
- Epochs = 500
- Test Data Size = 20 rollouts
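
The settings above fix the hidden width and the number of epochs, but not the rest of the training setup. As a rough illustration only, here is a minimal NumPy sketch of behavioral cloning with one 128-unit hidden layer, trained for 500 full-batch gradient steps. The tanh activation, squared-error loss, learning rate, synthetic `obs`/`act` data, and all function names are assumptions made for this sketch; the code in the repository may differ.

```python
import numpy as np


def init_params(obs_dim, act_dim, hidden=128, seed=0):
    """Randomly initialise a one-hidden-layer policy network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(obs_dim), (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, act_dim)),
        "b2": np.zeros(act_dim),
    }


def forward(params, obs):
    """Return the hidden activations and the predicted actions."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    return h, h @ params["W2"] + params["b2"]


def train_step(params, obs, act, lr=0.01):
    """One full-batch gradient step on the mean squared action error."""
    n = obs.shape[0]
    h, pred = forward(params, obs)
    err = pred - act
    loss = float(np.mean(np.sum(err ** 2, axis=1)))
    d_pred = 2.0 * err / n                        # dLoss/dpred
    d_h = (d_pred @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
    grads = {
        "W1": obs.T @ d_h,
        "b1": d_h.sum(axis=0),
        "W2": h.T @ d_pred,
        "b2": d_pred.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    return loss


# Synthetic stand-in for expert data (assumption: NOT the Hopper/Humanoid data).
rng = np.random.default_rng(1)
obs = rng.normal(size=(1000, 4))                  # toy observations
act = np.tanh(obs @ rng.normal(size=(4, 2)))      # toy "expert" actions

params = init_params(obs_dim=4, act_dim=2, hidden=128)
losses = [train_step(params, obs, act) for _ in range(500)]  # 500 epochs
```

In the real experiments, `obs` and `act` would be the observation and expert-action pairs collected from the 100 expert rollouts.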
Hopper
Statistic | Expert Policy | Trained Model |
---|---|---|
**Mean** | 3778.79126 | 3779.27708 |
**Standard Deviation** | 3.03886 | 3.07484 |
Humanoid
Statistic | Expert Policy | Trained Model |
---|---|---|
**Mean** | 10306.80848 | 407.63263 |
**Standard Deviation** | 979.44124 | 23.56048 |
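
The mean and standard deviation in the tables above summarize per-rollout returns over the test rollouts. A minimal sketch of that summary follows, together with the ratio of trained-model to expert mean return computed from the reported means. The helper names are hypothetical, and the use of the population standard deviation is an assumption, since the report does not say which variant was used.

```python
import statistics


def summarize_returns(returns):
    """Return (mean, population standard deviation) of per-rollout returns."""
    return statistics.mean(returns), statistics.pstdev(returns)


def relative_performance(trained_mean, expert_mean):
    """Fraction of the expert's mean return achieved by the trained model."""
    return trained_mean / expert_mean


# Hypothetical returns for illustration only (not the data behind the tables).
example_mean, example_std = summarize_returns([10.0, 12.0, 14.0])

# Means taken from the tables above.
hopper_ratio = relative_performance(3779.27708, 3778.79126)
humanoid_ratio = relative_performance(407.63263, 10306.80848)
```

With the reported means, the Hopper ratio is essentially 1, while the Humanoid ratio is about 0.04, which makes the gap between the two tasks explicit.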
Problem 2.3
The training data size was varied and the mean reward was recorded for each size.
As the data size increases, the mean reward initially increases and then plateaus. Rationale: the amount of training data is one of the most important factors in a model's performance. If there is not enough data, the model cannot generalize well to the full distribution from which the data is drawn. Once the data covers that distribution adequately, additional rollouts add little new information, so the reward levels off.
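
The shape of such a sweep can be sketched with a toy stand-in: a scalar linear "expert" whose gain is estimated by least squares from nested subsets of noisy demonstrations. Everything here (the linear expert, `true_gain`, `noise`, the sizes, and the function names) is an assumption for illustration. It is not the homework's network or environments, and it measures estimation error rather than reward.

```python
import random


def fit_gain(pairs):
    """Least-squares estimate of k in action = k * observation."""
    num = sum(o * a for o, a in pairs)
    den = sum(o * o for o, _ in pairs)
    return num / den


def sweep(sizes, true_gain=2.0, noise=0.5, seed=0):
    """Fit on the first n demonstrations for each n and record the error."""
    rng = random.Random(seed)
    data = []
    for _ in range(max(sizes)):
        o = rng.gauss(0.0, 1.0)
        # Noisy demonstration from the toy expert.
        data.append((o, true_gain * o + rng.gauss(0.0, noise)))
    results = []
    for n in sizes:
        k_hat = fit_gain(data[:n])
        results.append((n, abs(k_hat - true_gain)))
    return results


results = sweep([10, 100, 1000, 10000])
for n, err in results:
    print(f"n={n:>6}  |k_hat - k| = {err:.4f}")
```

In this toy setting the error tends to shrink quickly at small sizes and much more slowly afterwards, which mirrors the diminishing returns described above.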