Search Results for author: Assaf Arbelle

Found 18 papers, 11 papers with code

NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

no code implementations • 30 Mar 2024 • Eli Schwartz, Leshem Choshen, Joseph Shtok, Sivan Doveh, Leonid Karlinsky, Assaf Arbelle

Language models struggle with handling numerical data and performing arithmetic operations.

Language Modelling

Towards Multimodal In-Context Learning for Vision & Language Models

no code implementations • 19 Mar 2024 • Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky

Inspired by the emergence of Large Language Models (LLMs) that can truly understand human language, significant progress has been made in aligning other, non-language, modalities to be 'understandable' by an LLM, primarily via converting their samples into a sequence of embedded language-like tokens directly fed into the LLM (decoder) input stream.

In-Context Learning

Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs

no code implementations • 10 May 2023 • Roei Herzig, Alon Mendelson, Leonid Karlinsky, Assaf Arbelle, Rogerio Feris, Trevor Darrell, Amir Globerson

For the visual side, we incorporate a special "SG Component" in the image transformer trained to predict SG information, while for the textual side, we utilize SGs to generate fine-grained captions that highlight different compositional aspects of the scene.
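To make the textual side concrete, here is a minimal sketch of turning a scene graph (SG) into several fine-grained captions, one per attribute or relation. The `SceneGraph` class, the caption templates, and the function names are hypothetical illustrations, not the paper's actual caption generator, which may differ substantially.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical minimal SG: object -> list of attributes, plus
    # (subject, relation, object) triplets.
    attributes: dict[str, list[str]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)


def describe(obj: str, sg: SceneGraph) -> str:
    # Prefix an object name with its attributes, e.g. "brown dog".
    return " ".join(sg.attributes.get(obj, []) + [obj])


def captions_from_scene_graph(sg: SceneGraph) -> list[str]:
    captions = []
    # Attribute-focused captions: one per object that has attributes.
    for obj, attrs in sg.attributes.items():
        if attrs:
            captions.append(f"a {describe(obj, sg)}")
    # Relation-focused captions: one per triplet.
    for subj, rel, obj in sg.relations:
        captions.append(f"a {describe(subj, sg)} {rel} a {describe(obj, sg)}")
    return captions


sg = SceneGraph(
    attributes={"dog": ["brown"], "sofa": ["red"]},
    relations=[("dog", "lying on", "sofa")],
)
for caption in captions_from_scene_graph(sg):
    print(caption)
```

Each caption isolates one compositional aspect of the scene (an attribute binding or a relation), which is the kind of fine-grained signal the excerpt describes.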

Ranked #24 on Visual Reasoning on Winoground

Scene Understanding Visual Reasoning

Teaching Structured Vision & Language Concepts to Vision & Language Models

1 code implementation • CVPR 2023 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Eli Schwartz, Roei Herzig, Raja Giryes, Rogerio Feris, Rameswar Panda, Shimon Ullman, Leonid Karlinsky

Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.

PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data

no code implementations • 8 Dec 2022 • Roei Herzig, Ofir Abramovich, Elad Ben-Avraham, Assaf Arbelle, Leonid Karlinsky, Ariel Shamir, Trevor Darrell, Amir Globerson

In this work, we propose an approach to leverage synthetic scene data for improving video understanding.

Action Recognition Video Understanding

MAEDAY: MAE for few and zero shot AnomalY-Detection

1 code implementation • 25 Nov 2022 • Eli Schwartz, Assaf Arbelle, Leonid Karlinsky, Sivan Harary, Florian Scheidegger, Sivan Doveh, Raja Giryes

We propose using Masked Auto-Encoder (MAE), a transformer model trained with self-supervision on image inpainting, for anomaly detection (AD).
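The underlying idea, scoring an image by how badly an inpainting model reconstructs masked regions, can be sketched as below. This is a minimal sketch under loud assumptions: a trivial neighbour-averaging inpainter stands in for the trained MAE, and the masking ratio, number of random masks, and squared-error score are illustrative choices, not MAEDAY's actual settings.

```python
import random

Image = list[list[float]]


def inpaint_neighbors(img: Image, mask: list[list[bool]]) -> Image:
    """Stand-in for a trained MAE (assumption): fill each masked pixel with the
    mean of its visible 4-neighbours, falling back to the mean of all visible pixels."""
    h, w = len(img), len(img[0])
    visible = [img[i][j] for i in range(h) for j in range(w) if not mask[i][j]]
    fallback = sum(visible) / len(visible) if visible else 0.0
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            vals = [img[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < h and 0 <= b < w and not mask[a][b]]
            out[i][j] = sum(vals) / len(vals) if vals else fallback
    return out


def anomaly_score(img: Image, mask_ratio: float = 0.5, trials: int = 20, seed: int = 0) -> float:
    """Mean squared inpainting error on masked pixels, averaged over random masks."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    total = 0.0
    for _ in range(trials):
        mask = [[rng.random() < mask_ratio for _ in range(w)] for _ in range(h)]
        rec = inpaint_neighbors(img, mask)
        errs = [(rec[i][j] - img[i][j]) ** 2
                for i in range(h) for j in range(w) if mask[i][j]]
        total += sum(errs) / len(errs) if errs else 0.0
    return total / trials


normal = [[1.0] * 8 for _ in range(8)]
defect = [row[:] for row in normal]
defect[4][4] = 5.0  # a single bright "defect" pixel
print(anomaly_score(normal), anomaly_score(defect))
```

A region the inpainter cannot predict from its visible context (here, the bright pixel) yields a large reconstruction error, so the defective image scores higher than the uniform one.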

Anomaly Detection Image Inpainting +4

CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning

1 code implementation • CVPR 2023 • James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira

Our experiments show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.

Continual Learning Novel Concepts

Teaching Structured Vision&Language Concepts to Vision&Language Models

1 code implementation • 21 Nov 2022 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky

Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

1 code implementation • CVPR 2023 • James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting.

FETA: Towards Specializing Foundation Models for Expert Task Applications

1 code implementation • 8 Sep 2022 • Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g. retrieval of technical illustrations from car manuals using language queries), for which the data is either unseen or belongs to a long-tail part of the data distribution of the huge datasets used for FM pre-training.

Ranked #1 on Image-to-Text Retrieval on FETA Car-Manuals

Domain Generalization Image Retrieval +6

Unsupervised Domain Generalization by Learning a Bridge Across Domains

1 code implementation • CVPR 2022 • Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky

The ability to generalize learned representations across significantly different visual domains, such as between real photos, clipart, paintings, and sketches, is a fundamental capacity of the human visual system.

Domain Generalization Self-Supervised Learning

CHARTER: heatmap-based multi-type chart data extraction

no code implementations • 28 Nov 2021 • Joseph Shtok, Sivan Harary, Ophir Azulai, Adi Raz Goldfarb, Assaf Arbelle, Leonid Karlinsky

The digital conversion of information stored in documents is a great source of knowledge.

Vocal Bursts Type Prediction

Detector-Free Weakly Supervised Grounding by Separation

1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.

Ranked #1 on Phrase Grounding on Visual Genome

Phrase Grounding

DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation

1 code implementation • 6 May 2020 • Mor Avi-Aharon, Assaf Arbelle, Tammy Riklin Raviv

Promising results are shown for the tasks of color transfer, image colorization and edges → photo, where the color distribution of the output image is controlled.
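A hard histogram (counting values into bins) has zero gradient almost everywhere, so it cannot be used directly in a loss. One common way to make it differentiable is to let each value spread a unit of mass over all bins with smooth kernel weights. The sketch below assumes Gaussian kernels on intensities in [0, 1]; the bin count, `sigma`, and the kernel itself are illustrative assumptions and may not match the layers DeepHist actually uses.

```python
import math


def soft_histogram(values: list[float], num_bins: int = 8, sigma: float = 0.05) -> list[float]:
    """Differentiable histogram of values in [0, 1] (Gaussian-kernel assumption).

    Each value contributes Gaussian weights to every bin centre, normalised so
    that it adds exactly one unit of mass; the result sums to 1.
    Note: very small sigma can underflow every weight to 0.
    """
    centres = [(k + 0.5) / num_bins for k in range(num_bins)]
    hist = [0.0] * num_bins
    for v in values:
        w = [math.exp(-((v - c) ** 2) / (2 * sigma ** 2)) for c in centres]
        z = sum(w)
        for k in range(num_bins):
            hist[k] += w[k] / z
    n = len(values)
    return [h / n for h in hist]


dark = soft_histogram([0.06] * 10)
print([round(h, 3) for h in dark])

# Nudging a value shifts mass smoothly between neighbouring bins,
# which is what makes the histogram usable inside a gradient-based loss.
a = soft_histogram([0.5])
b = soft_histogram([0.5001])
print(max(abs(x - y) for x, y in zip(a, b)))
```

A smaller `sigma` approaches the hard histogram but gives sharper, less informative gradients; a larger one smooths the distribution more.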

Colorization Image Colorization +2

Hue-Net: Intensity-based Image-to-Image Translation with Differentiable Histogram Loss Functions

no code implementations • 12 Dec 2019 • Mor Avi-Aharon, Assaf Arbelle, Tammy Riklin Raviv

To enforce color-free similarity between the source and the output images, we define a semantic-based loss by a differentiable approximation of the mutual information (MI) of these images.
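One way to obtain a differentiable MI estimate is to build a soft joint histogram of paired pixel intensities and plug it into the MI formula, MI = Σ p(x, y) log[p(x, y) / (p(x) p(y))]. The sketch below assumes Gaussian kernels over intensities in [0, 1]; the kernel, bin count, and `sigma` are illustrative assumptions, not necessarily Hue-Net's exact formulation.

```python
import math


def soft_joint_histogram(xs: list[float], ys: list[float],
                         num_bins: int = 8, sigma: float = 0.05) -> list[list[float]]:
    """Soft joint histogram of paired intensities (Gaussian-kernel assumption)."""
    centres = [(k + 0.5) / num_bins for k in range(num_bins)]

    def weights(v: float) -> list[float]:
        # Normalised kernel weights of one value over all bin centres.
        w = [math.exp(-((v - c) ** 2) / (2 * sigma ** 2)) for c in centres]
        z = sum(w)
        return [x / z for x in w]

    joint = [[0.0] * num_bins for _ in range(num_bins)]
    for x, y in zip(xs, ys):
        wx, wy = weights(x), weights(y)
        for i in range(num_bins):
            for j in range(num_bins):
                joint[i][j] += wx[i] * wy[j]
    n = len(xs)
    return [[p / n for p in row] for row in joint]


def mutual_information(xs: list[float], ys: list[float], num_bins: int = 8,
                       sigma: float = 0.05, eps: float = 1e-12) -> float:
    p = soft_joint_histogram(xs, ys, num_bins, sigma)
    px = [sum(row) for row in p]                                      # marginal of x
    py = [sum(p[i][j] for i in range(num_bins)) for j in range(num_bins)]  # marginal of y
    mi = 0.0
    for i in range(num_bins):
        for j in range(num_bins):
            if p[i][j] > eps:
                mi += p[i][j] * math.log(p[i][j] / (px[i] * py[j]))
    return mi


xs = [i / 99 for i in range(100)]
print(mutual_information(xs, xs))                    # identical intensities
print(mutual_information(xs, [1 - x for x in xs]))   # inverted intensities
print(mutual_information(xs, [0.5] * 100))           # uninformative constant image
```

MI rewards a consistent one-to-one correspondence between intensities rather than equal values, so an inverted image scores as high as an identical one here; this is why it can serve as a "color-free" similarity measure.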

Image-to-Image Translation Translation