no code implementations • 26 Feb 2024 • Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li
We augment the ground-truth solutions of our seed data and train a back-translation model to translate the augmented solutions back into new questions.
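The augment-then-back-translate pipeline described above can be sketched as follows. This is a minimal illustration only: `augment_solution`, `BackTranslator`, and `synthesize_pairs` are hypothetical stand-ins for the authors' actual components, which use trained language models.

```python
# Hypothetical sketch of question synthesis via solution back-translation.
# All names here are illustrative placeholders, not the authors' code.

def augment_solution(solution: str) -> list[str]:
    # Placeholder: in practice a model rewrites or perturbs the
    # ground-truth solution to produce new solution variants.
    return [solution + " (variant)"]

class BackTranslator:
    # Placeholder for a trained model that maps a solution back to a
    # question whose answer is that solution.
    def solution_to_question(self, solution: str) -> str:
        return f"Question recovered from: {solution}"

def synthesize_pairs(seed_data: list[dict]) -> list[dict]:
    # Augment each seed solution, then back-translate every variant
    # into a new question, yielding new (question, solution) pairs.
    translator = BackTranslator()
    new_pairs = []
    for example in seed_data:
        for variant in augment_solution(example["solution"]):
            question = translator.solution_to_question(variant)
            new_pairs.append({"question": question, "solution": variant})
    return new_pairs
```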
no code implementations • 22 Feb 2024 • Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, Mingjie Zhan, Hongsheng Li
Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista.
Ranked #1 on Multimodal Reasoning on MATH-V (using extra training data)
2 code implementations • 11 Jan 2024 • Zhaowei Li, Qi Xu, Dong Zhang, Hang Song, Yiqing Cai, Qi Qi, Ran Zhou, Junting Pan, Zefeng Li, Van Tu Vu, Zhida Huang, Tao Wang
Beyond capturing global information like other multi-modal models, our proposed model excels at tasks demanding a detailed understanding of local information within the input.
no code implementations • 29 Oct 2023 • Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan
Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught student models of various parameter scales from the OPT and BLOOM series in a multi-task setting.
no code implementations • 25 Jul 2023 • Alireza Shafizadeh, Hossein Shahbeik, Mohammad Hossein Nadian, Vijai Kumar Gupta, Abdul-Sattar Nizami, Su Shiung Lam, WanXi Peng, Junting Pan, Meisam Tabatabaei, Mortaza Aghbashlo
A database covering a variety of catalyst characteristics and reaction conditions is compiled from the literature.
no code implementations • 15 Jun 2023 • Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li
Video Question Answering (VideoQA) has been significantly advanced by the scaling of recent Large Language Models (LLMs).
Ranked #3 on Temporal/Causal QA on NExT-QA (using extra training data)
no code implementations • 24 May 2023 • Hossein Shahbeik, Alireza Shafizadeh, Mohammad Hossein Nadian, Dorsa Jeddi, Seyedali Mirjalili, Yadong Yang, Su Shiung Lam, Junting Pan, Meisam Tabatabaei, Mortaza Aghbashlo
The input features are constructed using an innovative approach to reflect the physics of the process.
1 code implementation • 22 May 2023 • Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei Huang, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, Limin Wang
Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.
1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li
Driven by large-data pre-training, the Segment Anything Model (SAM) has been demonstrated to be a powerful, promptable framework, revolutionizing segmentation models.
Ranked #1 on Personalized Segmentation on PerSeg
no code implementations • ICCV 2023 • Aojun Zhou, Yang Li, Zipeng Qin, Jianbo Liu, Junting Pan, Renrui Zhang, Rui Zhao, Peng Gao, Hongsheng Li
In this paper, we aim to reduce the complexity of large vision transformers pretrained with MAE, with the assistance of sparse training.
2 code implementations • 6 Dec 2022 • Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, Limin Wang, Yu Qiao
Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.
Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)
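The "learnable coordination" of two complementary representations can be illustrated with a toy gating scheme. This is a simplified sketch under stated assumptions, not InternVideo's actual mechanism: a single learnable scalar gate interpolates between features from the masked-modeling branch and the contrastive branch.

```python
import math

def sigmoid(x: float) -> float:
    # Squash a learnable logit into a gate value in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def combine(feat_masked: list[float],
            feat_contrastive: list[float],
            gate_logit: float) -> list[float]:
    # Illustrative learnable fusion: a scalar gate g weights the
    # masked-modeling features against the contrastive features.
    g = sigmoid(gate_logit)
    return [g * a + (1.0 - g) * b
            for a, b in zip(feat_masked, feat_contrastive)]
```

With `gate_logit = 0.0` the gate is 0.5 and the two branches are averaged; training would adjust the logit to favor whichever branch helps a given downstream task.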
2 code implementations • 17 Nov 2022 • Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao
In this report, we present our champion solutions to five tracks of the Ego4D challenge.
Ranked #1 on State Change Object Detection on Ego4D
1 code implementation • 27 Jun 2022 • Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li
This has led to a new research direction in parameter-efficient transfer learning.
Ranked #23 on Action Recognition on Something-Something V2 (using extra training data)
1 code implementation • 6 May 2022 • Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez
In this work, pushing further along this under-studied direction, we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency.
2 code implementations • 16 Jun 2020 • Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu
This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.
3 code implementations • CVPR 2021 • Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li
We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.
Ranked #2 on Action Recognition on AVA v2.1
2 code implementations • CVPR 2019 • Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes the novel task of video generation conditioned on a single semantic label map, which provides a good balance between flexibility and quality in the generation process.
no code implementations • 3 Mar 2019 • Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy
Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.
no code implementations • ECCV 2018 • Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang
We aim to tackle a novel task in action detection - Online Detection of Action Start (ODAS) in untrimmed, streaming videos.
3 code implementations • 4 Jan 2017 • Junting Pan, Cristian Canton Ferrer, Kevin McGuinness, Noel E. O'Connor, Jordi Torres, Elisa Sayrol, Xavier Giro-i-Nieto
We introduce SalGAN, a deep convolutional neural network for visual saliency prediction trained with adversarial examples.
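The SalGAN objective pairs a per-pixel content loss with an adversarial term from a discriminator. The sketch below shows that shape of generator loss; the `alpha` weight and the exact form of the adversarial term are illustrative assumptions, not the paper's precise formulation.

```python
import math

def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    # Binary cross-entropy for a single value, clamped for stability.
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def generator_loss(pred_map: list[float],
                   gt_map: list[float],
                   disc_score_on_fake: float,
                   alpha: float = 0.005) -> float:
    # Content term: mean per-pixel BCE between predicted and
    # ground-truth saliency maps.
    content = sum(bce(p, t) for p, t in zip(pred_map, gt_map)) / len(pred_map)
    # Adversarial term: the generator wants the discriminator to
    # score its saliency map as real (target = 1).
    adversarial = bce(disc_score_on_fake, 1.0)
    return alpha * content + adversarial
```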
1 code implementation • CVPR 2016 • Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel O'Connor, Xavier Giro-i-Nieto
The prediction of salient areas in images has been traditionally addressed with hand-crafted features based on neuroscience principles.
1 code implementation • 6 Jul 2015 • Junting Pan, Xavier Giró-i-Nieto
The prediction of salient areas in images has been traditionally addressed with hand-crafted features based on neuroscience principles.