Search Results for author: Ryan Park

Found 4 papers, 1 paper with code

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

no code implementations • 18 Apr 2024 • Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm.

Language Modelling • Q-Learning • +1
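For context on the contrast drawn in the abstract: the standard DPO objective (Rafailov et al., 2023) scores each whole response as a single bandit arm, with the implicit reward given by the policy-to-reference log-ratio:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

The paper argues that this sequence-level loss can nonetheless be interpreted in the token-level MDP, with the language model's logits playing the role of a Q-function.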

Disentangling Length from Quality in Direct Preference Optimization

no code implementations • 28 Mar 2024 • Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

A number of approaches have been developed to control length biases in the classical RLHF literature, but the problem remains relatively under-explored for Direct Alignment Algorithms such as Direct Preference Optimization (DPO).

Reinforcement Learning
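One illustrative way to control length bias in a direct alignment loss is to penalize the preference margin by the length difference between the chosen and rejected responses. The sketch below is an assumption-laden restatement of that idea (the `alpha` hyperparameter, the function name, and the exact penalty form are illustrative, not necessarily the paper's method):

```python
import torch
import torch.nn.functional as F

def length_penalized_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps,
                              chosen_lengths, rejected_lengths,
                              beta=0.1, alpha=0.01):
    """Hypothetical length-regularized DPO loss; all inputs are shape (batch,)."""
    # Standard DPO implicit rewards: beta-scaled policy/reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Subtract a length-difference penalty so the optimizer gains nothing
    # from preferring longer responses per se (alpha is assumed; tune it).
    margin = (chosen_rewards - rejected_rewards
              - alpha * (chosen_lengths - rejected_lengths))
    return -F.logsigmoid(margin).mean()
```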

Preference Optimization for Molecular Language Models

1 code implementation • 18 Oct 2023 • Ryan Park, Ryan Theisen, Navriti Sahni, Marcel Patek, Anna Cichońska, Rayees Rahman

Molecular language modeling is an effective approach to generating novel chemical structures.

Language Modelling
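As a minimal sketch of what molecular language modeling means in practice: a causal LM pretrained on SMILES strings can sample novel candidate structures directly. The checkpoint name below is a placeholder, not the paper's model:

```python
# Sampling candidate SMILES from a SMILES-pretrained causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/smiles-gpt"  # hypothetical checkpoint, not from the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("C", return_tensors="pt")  # seed generation with a carbon atom
outputs = model.generate(**inputs, do_sample=True, top_p=0.9,
                         max_new_tokens=64, num_return_sequences=8)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
print(candidates)  # novel SMILES strings to validate and score downstream
```

Preference optimization then ranks pairs of such generations (e.g., by a desired property score) and fine-tunes the model on those preferences.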
