Homework 4
Note: To replicate the results, run the script files in this repository.
Problem 1
Problem 2
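This is the log file from the HalfCheetah q2 run. It reports returns for the random policy first, then for the trained policy: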
09-29 14:47:21 HalfCheetah_q2_HalfCheetah_q2_default INFO Gathering random dataset
09-29 14:47:21 HalfCheetah_q2_HalfCheetah_q2_default INFO Creating policy
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO Random policy
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO --------- ---------
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -152.871
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMax -133.281
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMin -201.981
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnStd 19.7676
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO --------- ---------
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : total 0.0 (100.0%)
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : other 0.0 (100.0%)
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO Training policy....
09-29 14:47:24 HalfCheetah_q2_HalfCheetah_q2_default INFO Evaluating policy...
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO Trained policy
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ----------------- ----------
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -13.3144
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMax 30.6608
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMin -50.3141
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnStd 22.9741
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO TrainingLossFinal 0.119266
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO TrainingLossStart 4.68183
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ----------------- ----------
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : total 62.7 (100.0%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : get action 60.4 (96.2%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : train policy 1.1 (1.8%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : env step 1.0 (1.6%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : other 0.2 (0.4%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
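To compare the two policies without reading the log by eye, the metric lines can be pulled out with a short script. This is a minimal sketch, assuming every line keeps the layout seen above (date, time, logger name, level, message). The function name `parse_metrics` and both regular expressions are illustrative and are not part of the homework code.

```python
import re

# Assumed layout: "MM-DD HH:MM:SS <logger name> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<name>\S+) "
    r"(?P<level>[A-Z]+)\s*(?P<msg>.*)$"
)
# A metric message is a single key followed by a single number.
METRIC_RE = re.compile(r"^(?P<key>[A-Za-z]+)\s+(?P<value>-?\d+(?:\.\d+)?)$")

# Messages that open a metrics table in the log above.
SECTION_TITLES = ("Random policy", "Trained policy")


def parse_metrics(lines):
    """Return {section title: {metric name: value}} from INFO log lines."""
    tables = {}
    current = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "INFO":
            continue  # skip DEBUG timing lines and anything malformed
        msg = match.group("msg").strip()
        if msg in SECTION_TITLES:
            current = msg
            tables[current] = {}
            continue
        metric = METRIC_RE.match(msg)
        if metric is not None and current is not None:
            tables[current][metric.group("key")] = float(metric.group("value"))
    return tables


# A few lines copied from the log above.
sample = [
    "09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO Random policy",
    "09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO --------- ---------",
    "09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -152.871",
    "09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG",
    "09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO Trained policy",
    "09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -13.3144",
    "09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO TrainingLossFinal 0.119266",
]

tables = parse_metrics(sample)
improvement = tables["Trained policy"]["ReturnAvg"] - tables["Random policy"]["ReturnAvg"]
print(f"ReturnAvg improvement: {improvement:.4f}")
```

To run it on the full log, pass the open file object to `parse_metrics` in place of `sample`.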
Problem 3a
Problem 3b