Search Results for author: Sebastian Goodman

Found 13 papers, 7 papers with code

PaLI-3 Vision Language Models: Smaller, Faster, Stronger

1 code implementation • 13 Oct 2023 • Xi Chen, Xiao Wang, Lucas Beyer, Alexander Kolesnikov, Jialin Wu, Paul Voigtlaender, Basil Mustafa, Sebastian Goodman, Ibrahim Alabdulmohsin, Piotr Padlewski, Daniel Salz, Xi Xiong, Daniel Vlasic, Filip Pavetic, Keran Rong, Tianli Yu, Daniel Keysers, Xiaohua Zhai, Radu Soricut

This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger.

Ranked #2 on Temporal/Casual QA on NExT-QA (using extra training data)

Chart Question Answering Image Classification +4

118

Paper
Code

CausalLM is not optimal for in-context learning

1 code implementation • 14 Aug 2023 • Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut

Recent empirical evidence indicates that transformer based in-context learning performs better when using a prefix language model (prefixLM), in which in-context samples can all attend to each other, compared to causal language models (causalLM), which use auto-regressive attention that prohibits in-context samples to attend to future samples.

In-Context Learning Language Modelling

Paper
Code

PaLI-X: On Scaling up a Multilingual Vision and Language Model

2 code implementations • 29 May 2023 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture.

Ranked #1 on Fine-Grained Image Recognition on OVEN

Chart Question Answering document understanding +9

Paper
Code

PaLI: A Jointly-Scaled Multilingual Language-Image Model

1 code implementation • 14 Sep 2022 • Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages.

Ranked #1 on Zero-Shot Transfer Image Classification on ImageNet-S

Decoder Few-Shot Image Classification +6

1,772

Paper
Code

PreSTU: Pre-Training for Scene-Text Understanding

no code implementations • ICCV 2023 • Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut

The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability in their training objective.

Decoder Image Captioning +3

Paper
Add Code

Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

3 code implementations • 31 Mar 2022 • Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, Kathleen Kenealy, Jonathan H. Clark, Stephan Lee, Dan Garrette, James Lee-Thorp, Colin Raffel, Noam Shazeer, Marvin Ritter, Maarten Bosma, Alexandre Passos, Jeremy Maitin-Shepard, Noah Fiedel, Mark Omernick, Brennan Saeta, Ryan Sepassi, Alexander Spiridonov, Joshua Newlan, Andrea Gesmundo

Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves.

Decoder

2,529

Paper
Code

Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning

no code implementations • NeurIPS 2021 • Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

Despite recent advances in its theoretical understanding, there still remains a significant gap in the ability of existing PAC-Bayesian theories on meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited.

Few-Shot Learning

Paper
Add Code

TeaForN: Teacher-Forcing with N-grams

no code implementations • EMNLP 2020 • Sebastian Goodman, Nan Ding, Radu Soricut

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps.

Decoder Machine Translation +2

Paper
Add Code

Multi-Image Summarization: Textual Summary from a Set of Cohesive Images

no code implementations • 15 Jun 2020 • Nicholas Trieu, Sebastian Goodman, Pradyumna Narayana, Kazoo Sone, Radu Soricut

Multi-sentence summarization is a well studied problem in NLP, while generating image descriptions for a single image is a well studied problem in Computer Vision.

Descriptive Image Captioning +2

Paper
Add Code

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

48 code implementations • ICLR 2020 • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

Ranked #1 on Natural Language Inference on QNLI

Common Sense Reasoning Linguistic Acceptability +7

126,436

Paper
Code

Multi-stage Pretraining for Abstractive Summarization

no code implementations • 23 Sep 2019 • Sebastian Goodman, Zhenzhong Lan, Radu Soricut

Neural models for abstractive summarization tend to achieve the best performance in the presence of highly specialized, summarization specific modeling add-ons such as pointer-generator, coverage-modeling, and inferencetime heuristics.

Abstractive Text Summarization

Paper
Add Code

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

2 code implementations • ACL 2018 • Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut

We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.

Image Captioning

491

Paper
Code

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

no code implementations • 22 Dec 2016 • Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options.

Image Captioning Multi-Task Learning +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.