1 code implementation • 4 Apr 2024 • Lars Ankile, Anthony Simeonov, Idan Shenfeld, Pulkit Agrawal
While learning from demonstrations is powerful for acquiring visuomotor policies, high-performance imitation without large demonstration datasets remains challenging for tasks requiring precise, long-horizon manipulation.
1 code implementation • 29 Feb 2024 • Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James Glass, Akash Srivastava, Pulkit Agrawal
To probe when an LLM generates unwanted content, the current paradigm is to recruit a \textit{red team} of human testers to design input prompts (i. e., test cases) that elicit undesirable responses from LLMs.
no code implementations • 6 Jul 2023 • Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives.
1 code implementation • NeurIPS 2021 • Ron Dorfman, Idan Shenfeld, Aviv Tamar
Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution.
1 code implementation • NeurIPS 2021 • Ron Dorfman, Idan Shenfeld, Aviv Tamar
Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of $N$ conventional RL agents, trained on $N$ different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution.