2 code implementations • 11 Apr 2024 • Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA.
1 code implementation • 10 Apr 2024 • Shishir G. Patil, Tianjun Zhang, Vivian Fang, Noppapon C., Roy Huang, Aaron Hao, Martin Casado, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica
We believe this is critical to unlock the potential for LLM agents to interact with applications and services with limited (post-facto) human involvement.
1 code implementation • 15 Mar 2024 • Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in "open-book" in-domain settings.
no code implementations • 12 Mar 2024 • Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica
Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry.
no code implementations • 8 Feb 2024 • Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, Uri Alon
We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (Hotpot QA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4 turbo and Claude-2.1.
no code implementations • 23 Dec 2023 • Jiaxin Ge, Xinyan Chen, Tianjun Zhang, Shanghang Zhang
IP-RLDF first samples a batch of images conditioned on the text, then relabels the text prompts of unmatched text-image pairs with classifier feedback.
no code implementations • 25 Nov 2023 • Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
1 code implementation • 7 Aug 2023 • Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang
We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting.
no code implementations • 29 May 2023 • Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang
Control-GPT works by querying GPT-4 to write TikZ code, and the generated sketches are used as references alongside the text instructions for diffusion models (e.g., ControlNet) to generate photo-realistic images.
1 code implementation • 24 May 2023 • Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez
Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis.
1 code implementation • 10 Feb 2023 • Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez
In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.
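The relabeling idea in this entry can be illustrated with a small sketch. It is not the paper's implementation: the `Episode` type, `relabel_with_feedback`, and the two callbacks are hypothetical names, and keeping already-aligned pairs unchanged is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Episode:
    """One model rollout: the instruction given and the output produced."""
    instruction: str
    output: str


def relabel_with_feedback(
    episodes: List[Episode],
    is_aligned: Callable[[str, str], bool],   # hypothetical feedback signal
    describe_output: Callable[[str], str],    # hypothetical relabeling function
) -> List[Tuple[str, str]]:
    """Turn rollouts into (instruction, output) pairs for supervised training.

    Assumption: pairs judged aligned are kept as-is; otherwise the original
    instruction is replaced by one that the output does satisfy.
    """
    pairs = []
    for ep in episodes:
        if is_aligned(ep.instruction, ep.output):
            pairs.append((ep.instruction, ep.output))
        else:
            pairs.append((describe_output(ep.output), ep.output))
    return pairs


# Toy usage: the "task" is to produce a number with the requested parity.
def parity_feedback(instruction: str, output: str) -> bool:
    wanted = "even" if "even" in instruction else "odd"
    actual = "even" if int(output) % 2 == 0 else "odd"
    return wanted == actual


def parity_description(output: str) -> str:
    kind = "even" if int(output) % 2 == 0 else "odd"
    return f"give an {kind} number"


rollouts = [Episode("give an even number", "3"), Episode("give an odd number", "7")]
training_pairs = relabel_with_feedback(rollouts, parity_feedback, parity_description)
```

Every resulting pair is consistent by construction, so it can be fed to an ordinary supervised objective.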
no code implementations • 17 Dec 2022 • Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai
Theoretically, we establish the sample complexity of the proposed approach in the online and offline settings.
1 code implementation • 21 Nov 2022 • Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell
Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show many target tasks can benefit each other from sharing prompt vectors and thus can be jointly learned via multitask prompt tuning.
1 code implementation • 21 Nov 2022 • Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez
To achieve this, we design a novel action space that allows flexible editing of the initial prompts covering a wide set of commonly-used components like instructions, few-shot exemplars, and verbalizers.
1 code implementation • 22 Aug 2022 • Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
Planning-based reinforcement learning has shown strong performance in tasks in discrete and low-dimensional continuous action spaces.
no code implementations • 19 Aug 2022 • Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai
Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality.
no code implementations • 14 Jul 2022 • Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
no code implementations • 15 Jun 2022 • Benjamin Eysenbach, Tianjun Zhang, Ruslan Salakhutdinov, Sergey Levine
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable and instead equips RL algorithms with additional representation learning parts (e.g., auxiliary losses, data augmentation).
1 code implementation • 31 May 2022 • Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette
In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation.
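A rough tabular sketch of treating transition data as a graph follows. The entry does not define the operator, so this is only one plausible reading: the target for a state-action pair averages over every observed outcome of that pair. The names `build_graph` and `graph_backup_target` and the `(state, action, reward, next_state, done)` tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Transition = Tuple[Hashable, Hashable, float, Hashable, bool]
Graph = Dict[Tuple[Hashable, Hashable], List[Tuple[float, Hashable, bool]]]


def build_graph(transitions: List[Transition]) -> Graph:
    """Group observed transitions by (state, action): the edges of the data graph."""
    graph = defaultdict(list)
    for state, action, reward, next_state, done in transitions:
        graph[(state, action)].append((reward, next_state, done))
    return dict(graph)


def graph_backup_target(graph: Graph, q: Dict, state, action, gamma: float = 0.99) -> float:
    """Average the one-step target over all observed outcomes of (state, action).

    Assumption: unseen state-action pairs have value 0, and the next-state value
    is the max over actions that were actually observed in that state.
    """
    outcomes = graph[(state, action)]
    total = 0.0
    for reward, next_state, done in outcomes:
        next_value = 0.0
        if not done:
            next_qs = [q.get((s, a), 0.0) for (s, a) in graph if s == next_state]
            next_value = max(next_qs, default=0.0)
        total += reward + gamma * next_value
    return total / len(outcomes)


# Toy usage: action "a" in s0 was observed leading to two different outcomes.
data = [
    ("s0", "a", 0.0, "s1", False),
    ("s0", "a", 1.0, "s2", True),
    ("s1", "b", 2.0, "s2", True),
]
graph = build_graph(data)
q_values = {("s1", "b"): 2.0}
target = graph_backup_target(graph, q_values, "s0", "a", gamma=0.5)
```

Compared with a one-step backup on a single sampled transition, the target here pools every outcome recorded for the same state-action pair.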
1 code implementation • NeurIPS 2021 • Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian
We analyze NovelD thoroughly in MiniGrid and find that empirically it helps the agent explore the environment more uniformly with a focus on exploring beyond the boundary.
no code implementations • 22 Nov 2021 • Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai
Representation learning lies at the heart of the empirical success of deep learning for dealing with the curse of dimensionality.
no code implementations • ICLR 2022 • Tianjun Zhang, Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine, Joseph E. Gonzalez
Goal-conditioned reinforcement learning (RL) can solve tasks in a wide range of domains, including navigation and manipulation, but learning to reach distant goals remains a central challenge to the field.
1 code implementation • 7 Oct 2021 • Yiyang Zhao, Linnan Wang, Kevin Yang, Tianjun Zhang, Tian Guo, Yuandong Tian
In this paper, we propose LaMOO, a novel multi-objective optimizer that learns a model from observed samples to partition the search space and then focus on promising regions that are likely to contain a subset of the Pareto frontier.
no code implementations • ICLR 2022 • Yiyang Zhao, Linnan Wang, Kevin Yang, Tianjun Zhang, Tian Guo, Yuandong Tian
In this paper, we propose LaMOO, a novel multi-objective optimizer that learns a model from observed samples to partition the search space and then focus on promising regions that are likely to contain a subset of the Pareto frontier.
2 code implementations • NeurIPS 2021 • Kevin Yang, Tianjun Zhang, Chris Cummins, Brandon Cui, Benoit Steiner, Linnan Wang, Joseph E. Gonzalez, Dan Klein, Yuandong Tian
Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function.
1 code implementation • NeurIPS 2021 • Tianjun Zhang, Paria Rashidinejad, Jiantao Jiao, Yuandong Tian, Joseph Gonzalez, Stuart Russell
As a proof of concept, we evaluate the new intrinsic reward on tabular examples across a variety of model-based and model-free algorithms, showing improvements over count-only exploration strategies.
2 code implementations • 15 Dec 2020 • Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian
In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for IR.
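As a minimal sketch of such a criterion, assuming "regulated" means clipping the difference at zero (the entry does not spell out the formula), the bonus for a transition could be computed as below. The function name and the use of a `Counter` for visitation counts are illustrative choices.

```python
from collections import Counter
from typing import Hashable


def regulated_inverse_count_bonus(counts: Counter, state: Hashable, next_state: Hashable) -> float:
    """Intrinsic reward max(1/N(next_state) - 1/N(state), 0).

    Assumption: counts are floored at 1 so unseen states do not divide by zero.
    """
    n_state = max(counts[state], 1)
    n_next = max(counts[next_state], 1)
    return max(1.0 / n_next - 1.0 / n_state, 0.0)


# Toy usage: "hub" has been visited often, "frontier" only once.
visits = Counter({"hub": 4, "frontier": 1})
outward = regulated_inverse_count_bonus(visits, "hub", "frontier")   # 1/1 - 1/4 = 0.75
inward = regulated_inverse_count_bonus(visits, "frontier", "hub")    # clipped to 0.0
```

Moving from a well-visited state to a rarely visited one is rewarded, while the reverse move earns nothing, so the agent has no incentive to shuttle back and forth.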
2 code implementations • 16 Oct 2020 • Tianjun Zhang, Huazhe Xu, Xiaolong Wang, Yi Wu, Kurt Keutzer, Joseph E. Gonzalez, Yuandong Tian
In this work, we propose Collaborative Q-learning (CollaQ) that achieves state-of-the-art performance in the StarCraft multi-agent challenge and supports ad hoc team play.
1 code implementation • EMNLP 2021 • Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica
Recent work learns contextual representations of source code by reconstructing tokens from their context.
Ranked #1 on Method name prediction on CodeSearchNet
1 code implementation • NeurIPS 2019 • Tianjun Zhang, Zhewei Yao, Amir Gholami, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney, George Biros
It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE).
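The correspondence can be made concrete with a short sketch: a residual update x + h·f(x) is exactly one explicit Euler step for dx/dt = f(x). The function names and the toy vector field below are illustrative assumptions, not code from the paper.

```python
import math
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector], h: float = 1.0) -> Vector:
    """One residual update x + h * f(x), i.e. one explicit Euler step of dx/dt = f(x)."""
    fx = f(x)
    return [xi + h * fi for xi, fi in zip(x, fx)]


def euler_integrate(x0: Vector, f: Callable[[Vector], Vector], t_end: float, num_steps: int) -> Vector:
    """Stack num_steps residual blocks with step size t_end / num_steps."""
    h = t_end / num_steps
    x = list(x0)
    for _ in range(num_steps):
        x = residual_block(x, f, h)
    return x


def decay(x: Vector) -> Vector:
    """Toy vector field dx/dt = -x, whose exact solution is x0 * exp(-t)."""
    return [-xi for xi in x]


approx = euler_integrate([1.0], decay, t_end=1.0, num_steps=100)
exact = math.exp(-1.0)
error = abs(approx[0] - exact)  # shrinks as num_steps grows
```

In this view, network depth plays the role of the number of time steps, and a standard residual block corresponds to step size h = 1.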
no code implementations • 10 Jun 2019 • Tianjun Zhang, Zhewei Yao, Amir Gholami, Kurt Keutzer, Joseph Gonzalez, George Biros, Michael Mahoney
It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE).
1 code implementation • 21 Nov 2018 • Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer
DiracDeltaNet achieves competitive accuracy on ImageNet (88.7\% top-5), but with 42$\times$ fewer parameters and 48$\times$ fewer OPs than VGG16.