no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot
Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.
no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot
We term this approach Nash learning from human feedback (NLHF).
no code implementations • 1 May 2023 • Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Rémi Munos, Will Dabney, Diana L Borsa
Representation learning and exploration are among the key challenges for any deep reinforcement learning agent.
no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko
We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse.
no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.
no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos
Exploration is essential for solving complex Reinforcement Learning (RL) tasks.
31 code implementations • 13 Jun 2020 • Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.
Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k
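The online/target prediction objective described in this entry can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear "networks", the noise-based stand-in for image augmentation, the parameter names (`encoder`, `predictor`), and the value of `tau` are all assumptions made for illustration. It shows only the normalized prediction loss (equal to 2 minus twice the cosine similarity) and the exponential-moving-average update of the target network; the gradient step on the online network is omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def byol_loss(online_prediction, target_projection):
    """Mean squared error between L2-normalized vectors: 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return np.mean(2.0 - 2.0 * np.sum(p * z, axis=-1))


def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter slowly toward its online counterpart."""
    return {
        name: tau * value + (1.0 - tau) * online_params[name]
        for name, value in target_params.items()
    }


rng = np.random.default_rng(0)
dim_in, dim_out = 8, 4

# Hypothetical linear stand-ins for the online encoder and predictor.
online = {
    "encoder": rng.normal(size=(dim_in, dim_out)),
    "predictor": rng.normal(size=(dim_out, dim_out)),
}
# The target network has an encoder but no predictor.
target = {"encoder": online["encoder"].copy()}

# Two "augmented views" of the same batch (additive noise as a stand-in).
images = rng.normal(size=(16, dim_in))
view_a = images + 0.1 * rng.normal(size=images.shape)
view_b = images + 0.1 * rng.normal(size=images.shape)


def online_predict(view):
    return view @ online["encoder"] @ online["predictor"]


def target_project(view):
    # In training, no gradient would flow through this branch.
    return view @ target["encoder"]


# Symmetrized loss: predict view_b's target from view_a, and vice versa.
loss = byol_loss(online_predict(view_a), target_project(view_b)) + byol_loss(
    online_predict(view_b), target_project(view_a)
)

# After an (omitted) gradient step on the online network, refresh the target.
target = ema_update(target, online, tau=0.99)
print(f"symmetrized loss: {loss:.4f}")
```

Because the target is updated only through the moving average and never by gradient descent, the online network is always predicting a slowly changing version of itself.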
no code implementations • 18 Jun 2019 • Zhaohan Daniel Guo, Emma Brunskill
Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general.
no code implementations • 15 Nov 2018 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos
In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.
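To make the notion of a belief state concrete, here is a minimal sketch of the classical exact Bayes filter for a small hidden-state model. It is not the method of the paper listed above; it only shows what a belief state is when the model is known. The two-state transition and observation matrices are invented for illustration, and actions are left out for brevity.

```python
import numpy as np


def belief_update(belief, transition, observation_probs, obs):
    """One Bayes-filter step: predict with the dynamics, then condition on obs.

    belief[s]               -- current probability of hidden state s
    transition[s, s2]       -- probability of moving from s to s2
    observation_probs[s, o] -- probability of observing o in state s
    """
    predicted = belief @ transition
    unnormalized = predicted * observation_probs[:, obs]
    return unnormalized / unnormalized.sum()


# Hypothetical two-state model (all numbers are illustrative assumptions).
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
observation_probs = np.array([[0.8, 0.2],
                              [0.3, 0.7]])

belief = np.array([0.5, 0.5])  # uniform prior over the hidden state
for obs in [0, 0, 1]:
    belief = belief_update(belief, transition, observation_probs, obs)

print(belief)
```

The final `belief` vector summarizes the whole observation sequence: future predictions depend on the past only through it, which is what makes it a sufficient statistic.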
no code implementations • 9 Mar 2017 • Zhaohan Daniel Guo, Emma Brunskill
This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.
no code implementations • NeurIPS 2017 • Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill
In addition, we can take advantage of special cases that arise due to options-based policies to further improve the performance of importance sampling.
no code implementations • 25 May 2016 • Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill
Many interesting real-world domains involve reinforcement learning (RL) in partially observable environments.