1 code implementation • 18 Oct 2023 • Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness
Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths.
1 code implementation • 20 Sep 2023 • Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming, Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
BTLM-3B-8K is available under an Apache 2. 0 license on Hugging Face: https://huggingface. co/cerebras/btlm-3b-8k-base.
no code implementations • 19 Sep 2023 • Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
This paper aims to understand the impacts of various data combinations (e. g., web text, wikipedia, github, books) on the training of large language models using SlimPajama.
no code implementations • 20 Oct 2020 • Daria Soboleva, Ondrej Skopek, Márius Šajgalík, Victor Cărbune, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar, Balint Miklos
We present a novel multi-modal unspoken punctuation prediction system for the English language which combines acoustic and text features.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1