Search Results for author: David Yao

Found 2 papers, 0 papers with code

Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

no code implementations • 23 May 2024 • Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang

Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLMs).

Decision Tree Algorithms for MDP

no code implementations • 29 Sep 2021 • Elioth Sanabria, David Yao, Henry Lam

In this paper, we show that even for problems with large state space, when the solution policy of the MDP can be represented by a tree-like structure, our proposed algorithm retrieves a tree of the solution policy of the MDP in computationally tractable time.
