no code implementations • 18 Apr 2024 • Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn
Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm.
no code implementations • 28 Mar 2024 • Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn
In the classical RLHF literature, a number of approaches have been developed to control biases such as verbosity, but the problem remains relatively under-explored for Direct Alignment Algorithms such as Direct Preference Optimization (DPO).
1 code implementation • 18 Oct 2023 • Ryan Park, Ryan Theisen, Navriti Sahni, Marcel Patek, Anna Cichońska, Rayees Rahman
Molecular language modeling is an effective approach to generating novel chemical structures.
no code implementations • 10 Jul 2023 • Jesse Choe, Siddhant Sood, Ryan Park
EchoVest also provides various features, including sound localization, sound classification, noise reduction, and depth perception.