no code implementations • 20 May 2019 • Xingyu Lin, Harjatin Singh Baweja, David Held
However, if this policy is trained with reinforcement learning, then without a state estimator, it is hard to specify a reward function based on high-dimensional observations.