no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #3 on Text-to-Video Generation on MSR-VTT
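VideoPoet generates video the way a language model generates text: it operates over discrete tokens and predicts them autoregressively, conditioned on the tokens seen so far. The sketch below illustrates only that generic autoregressive loop; the vocabulary size, the random bigram logit table standing in for the model, and all values are made up for illustration and are not VideoPoet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16  # toy token vocabulary (hypothetical size)
# Stand-in "model": a random next-token logit table indexed by the
# previous token. A real system would use a learned transformer.
logits_table = rng.normal(size=(VOCAB, VOCAB))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate(prefix, n_steps):
    """Autoregressively extend a token prefix, one token at a time."""
    tokens = list(prefix)
    for _ in range(n_steps):
        # Condition on the most recent token and sample the next one.
        probs = softmax(logits_table[tokens[-1]])
        tokens.append(int(rng.choice(VOCAB, p=probs)))
    return tokens

out = generate([0], 8)  # 1 prefix token + 8 generated tokens
```

In the real system the discrete tokens come from video and audio tokenizers, and detokenizers map the generated sequence back to pixels and waveforms.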
no code implementations • 20 Dec 2022 • Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross
Detecting actions in untrimmed videos should not be limited to a small, closed set of classes.
1 code implementation • 12 May 2022 • David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny
While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.
no code implementations • 1 Apr 2022 • Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
To close this gap, we propose a new video-mining pipeline that transfers captions from image captioning datasets to video clips with no additional manual effort.

Ranked #6 on Zero-shot Text to Audio Retrieval on AudioCaps
no code implementations • 17 Jun 2021 • Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, Yi Liu
We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos.
no code implementations • 17 May 2019 • Bryan Seybold, Emily Fertig, Alex Alemi, Ian Fischer
Variational autoencoders learn unsupervised data representations, but these models frequently converge to minima that fail to preserve meaningful semantic information.
no code implementations • ECCV 2018 • Siyang Li, Bryan Seybold, Alexey Vorobyov, Xuejing Lei, C.-C. Jay Kuo
First, we propose a motion-based bilateral network to estimate the background based on the motion pattern of non-object regions.
Ranked #3 on Video Salient Object Detection on MCL (using extra training data)
no code implementations • CVPR 2018 • Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, Rahul Sukthankar
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework.
Ranked #27 on Temporal Action Localization on THUMOS’14
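TAL-Net adapts the two-stage Faster R-CNN recipe from 2-D boxes in images to 1-D segments in time: temporal proposals are generated, then scored and refined, with overlap measured by a temporal IoU. The helper below shows that 1-D overlap computation; the segment values in the usage note are invented for illustration.

```python
def temporal_iou(seg_a, seg_b):
    """Intersection-over-union between two temporal segments (start, end).

    This is the 1-D analogue of the box IoU used in Faster R-CNN-style
    detectors for matching proposals to ground-truth actions.
    """
    (s1, e1), (s2, e2) = seg_a, seg_b
    inter = max(0.0, min(e1, e2) - max(s1, s2))      # overlap length
    union = (e1 - s1) + (e2 - s2) - inter            # combined extent
    return inter / union if union > 0 else 0.0
```

For example, segments (0, 2) and (1, 3) overlap for 1 second out of a 3-second union, giving an IoU of 1/3, while disjoint segments score 0.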
no code implementations • CVPR 2018 • Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo
We propose a method for unsupervised video object segmentation by transferring the knowledge encapsulated in image-based instance embedding networks.
16 code implementations • 29 Sep 2016 • Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.
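The basic idea in this line of work is to treat an audio clip's log-mel spectrogram as an image and run it through a CNN: convolution and nonlinearity extract local time-frequency patterns, pooling summarizes them, and a linear head produces class scores. The numpy sketch below is a minimal forward pass under that recipe; the spectrogram size, single random filter, and 10-class head are illustrative assumptions, not the paper's architectures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy log-mel "spectrogram": 96 time frames x 64 mel bands.
# Random values stand in for real audio features (illustrative only).
spec = rng.normal(size=(96, 64))

kernel = rng.normal(size=(3, 3))  # one 3x3 conv filter (random weights)
W = rng.normal(size=(1, 10))      # linear head: 1 pooled feature -> 10 classes

def conv2d_valid(x, k):
    """Naive valid-mode 2-D convolution (single channel, single filter)."""
    H, Wd = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

feat = np.maximum(conv2d_valid(spec, kernel), 0.0)  # conv + ReLU
pooled = feat.mean()                                # global average pool
logits = np.array([pooled]) @ W                     # per-class scores
```

A real model stacks many such conv layers with learned weights (the paper evaluates standard image architectures adapted to spectrogram input), but the data flow is the same.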