no code implementations • 24 Jan 2024 • Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences.
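As a rough illustration of what "token-free" means here (a minimal sketch under our own assumptions, not the authors' implementation), the model consumes raw UTF-8 bytes rather than subword tokens, so the input vocabulary has only 256 symbols and the training target is simply the next byte:

```python
# Sketch of a token-free, byte-level autoregressive setup (illustrative only).
# Text is mapped directly to raw bytes; the model predicts the next byte.

def to_byte_ids(text: str) -> list[int]:
    """UTF-8 encode the text; each byte (0-255) becomes one input ID."""
    return list(text.encode("utf-8"))

def next_byte_targets(byte_ids: list[int]) -> tuple[list[int], list[int]]:
    """Inputs are bytes 0..n-2, targets are bytes 1..n-1 (next-byte prediction)."""
    return byte_ids[:-1], byte_ids[1:]

if __name__ == "__main__":
    ids = to_byte_ids("MambaByte is token-free.")
    inputs, targets = next_byte_targets(ids)
    print(len(ids), inputs[:5], targets[:5])  # vocabulary size is only 256
```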
no code implementations • 30 Nov 2023 • Jing Nathan Yan, Jiatao Gu, Alexander M. Rush
Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a key driver of recent advances in high-fidelity image generation.
no code implementations • 14 Nov 2023 • Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky
Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning.
no code implementations • 13 Nov 2023 • Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky
To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
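One way to picture a soft ensemble over explanation-augmented predictions (a hypothetical sketch, not the EASE algorithm itself) is to average each in-context sample's label distribution, weighted by a score for how reliable its explanation is; the function names and weights below are illustrative assumptions:

```python
# Hedged sketch of explanation-aware soft ensembling (not the paper's exact method).
# Assumes each sample provides a label distribution and a scalar explanation weight.
from collections import defaultdict

def soft_ensemble(samples):
    """samples: list of (label_probs: dict[str, float], explanation_weight: float).
    Returns the weighted average of the per-sample label distributions."""
    combined = defaultdict(float)
    total_weight = sum(w for _, w in samples) or 1.0
    for probs, weight in samples:
        for label, p in probs.items():
            combined[label] += weight * p / total_weight
    return dict(combined)

if __name__ == "__main__":
    samples = [
        ({"positive": 0.7, "negative": 0.3}, 0.9),  # well-supported explanation
        ({"positive": 0.4, "negative": 0.6}, 0.2),  # weak explanation, down-weighted
    ]
    print(soft_ensemble(samples))
```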
1 code implementation • 1 Jun 2023 • Wang-Chiew Tan, Jane Dwivedi-Yu, Yuliang Li, Lambert Mathias, Marzieh Saeidi, Jing Nathan Yan, Alon Y. Halevy
We describe a set of experiments on TimelineQA with several state-of-the-art QA models.
1 code implementation • 20 Dec 2022 • Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Despite using no attention, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4,096 tokens without approximation.
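The bidirectional idea can be sketched with a toy recurrence (a simplified stand-in, not the paper's gated state-space layer): scan the sequence forward and backward and keep both state streams at each position, so every token sees left and right context without attention. The decay value below is an illustrative assumption:

```python
# Minimal sketch of bidirectional state-space processing (illustrative only).

def ssm_scan(xs: list[float], decay: float = 0.9) -> list[float]:
    """Toy 1-D state-space recurrence: h_t = decay * h_{t-1} + x_t."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out

def bidirectional_states(xs: list[float]) -> list[tuple[float, float]]:
    """Pair each position's forward state with its backward state."""
    fwd = ssm_scan(xs)
    bwd = list(reversed(ssm_scan(list(reversed(xs)))))
    return list(zip(fwd, bwd))

if __name__ == "__main__":
    print(bidirectional_states([1.0, 0.0, 2.0, 1.0]))
```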