1 code implementation • 21 Feb 2024 • Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian
We fine-tune this model to obtain Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than the $A^*$ implementation that was used for training initially.
no code implementations • 5 Feb 2024 • Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng
We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently.
no code implementations • 22 Nov 2023 • Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, Ricky T. Q. Chen
Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks.
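Classifier-free guidance is usually applied at sampling time by extrapolating from the model's unconditional prediction toward its conditional one. A minimal sketch of that standard combination rule (the general technique, not this paper's specific contribution):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one with guidance weight w.
    w = 0 recovers the unconditional model; w = 1 the conditional one;
    w > 1 extrapolates past the conditional prediction."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy check: with w = 1 the guided output equals the conditional prediction.
eps_u = np.array([0.1, -0.2])
eps_c = np.array([0.3, 0.0])
print(cfg_combine(eps_u, eps_c, 1.0))
```

In practice the two predictions come from a single network evaluated with and without the conditioning signal, so guidance costs one extra forward pass per sampling step.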
1 code implementation • 16 Feb 2023 • Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum
For offline RL, our analysis frames the recent offline RL method XQL in the dual framework, and we further propose a new method, f-DVL, which offers alternatives to the Gumbel regression loss and thereby fixes the known training instability of XQL.
1 code implementation • 12 Oct 2022 • Qinqing Zheng, Mikael Henaff, Brandon Amos, Aditya Grover
For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories.
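The pipeline described above (fit an inverse dynamics model on labelled transitions, then proxy-label the unlabelled ones) can be sketched on toy data. This is a simplified illustration, assuming a hypothetical linear inverse dynamics model rather than the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled transitions: here the (hypothetical) inverse dynamics
# is linear, a = s' - s, purely for illustration.
S = rng.normal(size=(100, 4))
S_next = S + rng.normal(size=(100, 4)) * 0.1
A = S_next - S  # ground-truth actions for the labelled set

# Step 1: fit an inverse dynamics model a ≈ f(s, s') on labelled data
# (a linear least-squares fit stands in for a learned network).
X = np.hstack([S, S_next])
W, *_ = np.linalg.lstsq(X, A, rcond=None)

# Step 2: proxy-label unlabelled transitions with the learned model.
S_u = rng.normal(size=(50, 4))
S_u_next = S_u + rng.normal(size=(50, 4)) * 0.1
A_proxy = np.hstack([S_u, S_u_next]) @ W

# Step 3: hand the union of true- and proxy-labelled trajectories
# to any off-the-shelf offline RL algorithm.
print(A_proxy.shape)
```

The key design point is that the pipeline is agnostic to the downstream offline RL algorithm: only the labelling stage changes.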
1 code implementation • 11 Oct 2022 • Tung Nguyen, Qinqing Zheng, Aditya Grover
We study CWBC in the context of RvS (Emmons et al., 2021) and Decision Transformers (Chen et al., 2021), and show that CWBC significantly boosts their performance on various benchmarks.
1 code implementation • 3 Oct 2022 • Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen
While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity.
2 code implementations • 11 Feb 2022 • Qinqing Zheng, Amy Zhang, Aditya Grover
Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling.
no code implementations • 2 Mar 2021 • Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J. Su
A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often come from different but not entirely unrelated distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective.
1 code implementation • 22 Feb 2021 • Qinqing Zheng, Shuxiao Chen, Qi Long, Weijie J. Su
Federated learning (FL) is a training paradigm in which clients collaboratively learn models by repeatedly sharing information, without substantially compromising the privacy of their local sensitive data.
1 code implementation • 9 Jun 2020 • Arun Kumar Kuchibhotla, Qinqing Zheng
Many inference problems, such as sequential decision problems like A/B testing and adaptive sampling schemes like bandit selection, are often online in nature.
1 code implementation • ICML 2020 • Qinqing Zheng, Jinshuo Dong, Qi Long, Weijie J. Su
To address this question, we introduce a family of analytical and sharp privacy bounds under composition using the Edgeworth expansion in the framework of the recently proposed f-differential privacy.
no code implementations • 7 Mar 2020 • Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou
Recommendation systems are often trained with a tremendous amount of data, and distributed training is the workhorse to shorten the training time.
no code implementations • 23 May 2016 • Qinqing Zheng, John Lafferty
We address the rectangular matrix completion problem by lifting the unknown matrix to a positive semidefinite matrix in higher dimension, and optimizing a nonconvex objective over the semidefinite factor using a simple gradient descent scheme.
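A simplified sketch of this style of approach: run gradient descent directly on the factors of a low-rank parameterization, penalizing the mismatch on observed entries. This toy version omits the paper's lifting construction, initialization, and regularization details:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 30, 20, 2

# Ground-truth rank-r matrix and a random 50% observation mask.
M = rng.normal(size=(n1, r)) @ rng.normal(size=(r, n2))
mask = rng.random((n1, n2)) < 0.5

# Gradient descent on the factored objective
#   f(U, V) = 0.5 * || mask * (U V^T - M) ||_F^2
U = rng.normal(size=(n1, r)) * 0.1
V = rng.normal(size=(n2, r)) * 0.1
lr = 0.01
for _ in range(5000):
    R = mask * (U @ V.T - M)                      # residual on observed entries
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)   # simultaneous factor updates

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(rel_err)
```

Despite the nonconvexity, plain gradient descent on the factors recovers the full matrix to small relative error on this toy instance, which is the phenomenon the paper analyzes.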
no code implementations • NeurIPS 2015 • Qinqing Zheng, John Lafferty
We propose a simple, scalable, and fast gradient descent algorithm to optimize a nonconvex objective for the rank minimization problem and a closely related family of semidefinite programs.
1 code implementation • NeurIPS 2015 • Qinqing Zheng, Ryota Tomioka
We consider the problem of recovering a low-rank tensor from its noisy observation.