no code implementations • 19 Mar 2024 • Rui Liu, Amisha Bhaskar, Pratap Tokekar
Notably, our model, trained solely on data from a transparent glass bowl containing granular cereals, showcases generalization ability when tested zero-shot on other bowl configurations with different types of food.
no code implementations • 22 Dec 2023 • Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi
Current methods to mitigate this misalignment work by learning reward functions from human preferences; however, they inadvertently introduce a risk of reward overoptimization.