1 code implementation • 3 Sep 2023 • Xuyang Liu, Siteng Huang, Yachen Kang, Honggang Chen, Donglin Wang
Large-scale text-to-image diffusion models have shown impressive capabilities for generative tasks by leveraging strong vision-language alignment from pre-training.
1 code implementation • 19 Jul 2023 • Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
Due to the existence of the similarity trap, such consistency regularization improperly enhances the consistency possibility of the model's predictions between segment pairs, and thus reduces the confidence in reward learning, since the augmented distribution does not match the original one in PbRL.
no code implementations • NeurIPS 2023 • Jinxin Liu, Hongyin Zhang, Zifeng Zhuang, Yachen Kang, Donglin Wang, Bin Wang
Naturally, such a paradigm raises three core questions that are not fully answered by prior non-iterative offline RL counterparts like reward-conditioned policy: (q1) What information should we transfer from the inner-level to the outer-level?
1 code implementation • 22 Jun 2023 • Jinxin Liu, Ziqi Zhang, Zhenyu Wei, Zifeng Zhuang, Yachen Kang, Sibo Gai, Donglin Wang
Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed data.
1 code implementation • 25 May 2023 • Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang
Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories to extract the dynamics and task information, respectively.
no code implementations • NeurIPS 2021 • Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian
Unsupervised reinforcement learning aims to acquire skills without prior goal representations, where an agent automatically explores an open-ended environment to represent goals and learn the goal-conditioned policy.
no code implementations • 21 Oct 2021 • Yachen Kang, Jinxin Liu, Xin Cao, Donglin Wang
To achieve this, the widely used GAN-inspired IRL method is adopted, and its discriminator, which recognizes policy-generated trajectories, is modified with a quantification of the dynamics difference.
1 code implementation • 10 Sep 2020 • Siteng Huang, Min Zhang, Yachen Kang, Donglin Wang
However, these approaches only augment the representations of samples with available semantics while ignoring the query set, which forgoes potential improvement and may lead to a shift between the combined-modality representation and the pure-visual representation.