23 May 2024 • Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang
Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), providing better techniques for fine-tuning large language models (LLMs).
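As background for the DPO approach mentioned above, the standard DPO objective (from the original DPO formulation, not specific to this paper) scores a preference pair by the policy's log-probability ratios against a frozen reference model. A minimal sketch for a single preference pair follows; the function name and arguments are illustrative:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_* are log-probabilities of the chosen/rejected responses under the
    policy being fine-tuned; ref_logp_* are the same quantities under the
    frozen reference model. beta scales the implicit reward.
    """
    # Implicit reward margin: beta * (chosen log-ratio - rejected log-ratio)
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin; minimized when the policy prefers
    # the chosen response more strongly than the reference model does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model exactly, the margin is zero and the loss equals log 2; it shrinks as the policy assigns relatively more probability to the chosen response.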
29 Sep 2021 • Elioth Sanabria, David Yao, Henry Lam
In this paper, we show that even for problems with a large state space, when the solution policy of the MDP can be represented by a tree-like structure, our proposed algorithm retrieves that tree in computationally tractable time.