Note: To replicate the results, run the script files in this repository.

Problem 1

Problem 2

Log output from the HalfCheetah_q2_default run. It shows the random-policy baseline first, then the trained policy:

09-29 14:47:21 HalfCheetah_q2_HalfCheetah_q2_default INFO Gathering random dataset
09-29 14:47:21 HalfCheetah_q2_HalfCheetah_q2_default INFO Creating policy
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO Random policy
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO --------- ---------
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -152.871
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMax -133.281
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMin -201.981
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnStd 19.7676
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO --------- ---------
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : total 0.0 (100.0%)
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : other 0.0 (100.0%)
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO Training policy....
09-29 14:47:24 HalfCheetah_q2_HalfCheetah_q2_default INFO Evaluating policy...
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO Trained policy
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ----------------- ----------
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -13.3144
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMax 30.6608
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnMin -50.3141
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnStd 22.9741
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO TrainingLossFinal 0.119266
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO TrainingLossStart 4.68183
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ----------------- ----------
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : total 62.7 (100.0%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : get action 60.4 (96.2%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : train policy 1.1 (1.8%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : env step 1.0 (1.6%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG : other 0.2 (0.4%)
09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default DEBUG
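To compare the random and trained evaluations without reading the log by eye, the metric rows can be grouped under their "Random policy" / "Trained policy" headers. The snippet below is a minimal sketch and is not part of the repository's scripts. It assumes that each log entry sits on its own line in exactly the format shown above, and that metric rows look like "<date> <time> <experiment> INFO <Name> <number>". The names parse_log, METRIC_RE and HEADER_RE are hypothetical, introduced only for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern (assumed log format): a metric row such as
# "09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -13.3144"
METRIC_RE = re.compile(
    r"^\S+ \S+ \S+ INFO (?P<name>[A-Za-z]+) "
    r"(?P<value>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)
# Hypothetical pattern: the line that opens an evaluation block.
HEADER_RE = re.compile(r"^\S+ \S+ \S+ INFO (?P<title>Random policy|Trained policy)\s*$")


def parse_log(text: str) -> Dict[str, Dict[str, float]]:
    """Group metric rows under the most recent 'Random policy'/'Trained policy' header.

    Separator rows ("--------- ---------"), free-text INFO messages and
    DEBUG timing rows do not match METRIC_RE and are skipped.
    """
    results: Dict[str, Dict[str, float]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group("title")
            results[current] = {}
            continue
        metric = METRIC_RE.match(line)
        if metric and current is not None:
            results[current][metric.group("name")] = float(metric.group("value"))
    return results


if __name__ == "__main__":
    # A few lines copied from the log above; in practice you would read the full log file.
    sample = (
        "09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO Random policy\n"
        "09-29 14:47:23 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -152.871\n"
        "09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO Trained policy\n"
        "09-29 14:48:26 HalfCheetah_q2_HalfCheetah_q2_default INFO ReturnAvg -13.3144\n"
    )
    res = parse_log(sample)
    gain = res["Trained policy"]["ReturnAvg"] - res["Random policy"]["ReturnAvg"]
    print(round(gain, 4))  # 139.5566
```

Matching on the literal INFO level is a deliberate choice. It keeps the DEBUG timing breakdown (": total 62.7 (100.0%)" and similar) out of the returned metrics.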

Problem 3a

Problem 3b