no code implementations • 19 Mar 2024 • Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky
Inspired by the emergence of Large Language Models (LLMs) that can truly understand human language, significant progress has been made in aligning other, non-language modalities to be 'understandable' by an LLM. This is done primarily by converting their samples into sequences of embedded, language-like tokens that are fed directly into the LLM (decoder) input stream.
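The token-alignment idea can be pictured as a simple linear connector that maps encoder features into the LLM's token-embedding space. This is a minimal illustrative sketch, not the paper's architecture; all dimensions and variable names are assumptions.

```python
import numpy as np

# Hypothetical dimensions: a vision encoder emits 196 patch embeddings of
# size 768; the LLM expects input token embeddings of size 1024.
rng = np.random.default_rng(0)
vision_embeddings = rng.standard_normal((196, 768))   # one image's patch features
projection = rng.standard_normal((768, 1024)) * 0.02  # a (learned) linear connector

# Project the visual features into the LLM's embedding space so they can be
# concatenated with ordinary text-token embeddings in the decoder input stream.
visual_tokens = vision_embeddings @ projection        # shape: (196, 1024)

text_tokens = rng.standard_normal((12, 1024))         # embedded text prompt
decoder_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(decoder_input.shape)  # (208, 1024)
```

The LLM itself is unchanged; only the projection needs to be trained so that the visual tokens become "language-like" from the decoder's point of view.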
no code implementations • 26 Nov 2023 • Yonatan Sverdlov, Shimon Ullman
This challenge arises because previously learned weights tend to be adjusted to suit the objectives of new tasks, a phenomenon known as catastrophic forgetting.
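The phenomenon is easy to reproduce in a toy setting (this illustrates the problem only, not the paper's remedy): sequentially fitting two regression tasks by plain gradient descent overwrites the weights tuned for the first task.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_a = np.array([1.0, -2.0, 0.5])   # ground-truth weights for task A
w_b = np.array([-1.0, 1.0, 2.0])   # ground-truth weights for task B
y_a, y_b = X @ w_a, X @ w_b

def train(w, X, y, steps=500, lr=0.05):
    # Plain gradient descent on mean-squared error, with no protection
    # against forgetting.
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(X)
    return w

w = train(np.zeros(3), X, y_a)
err_a_before = np.mean((X @ w - y_a) ** 2)  # near zero after training on A
w = train(w, X, y_b)                        # now fit task B on the same weights
err_a_after = np.mean((X @ w - y_a) ** 2)   # task A performance has degraded
print(err_a_before < err_a_after)  # True
```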
1 code implementation • 4 Jun 2023 • Roy Abel, Shimon Ullman
For example, during multi-task learning, the same top-down network serves two roles: it propagates feedback signals for learning, and it provides top-down attention by guiding the bottom-up network to perform a selected task.
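The attention role can be pictured as a task embedding that gates bottom-up features channel by channel. This is a minimal illustrative sketch under assumed names and shapes, not the paper's actual top-down network (in particular, the learning-via-feedback role is not shown).

```python
import numpy as np

rng = np.random.default_rng(0)
W_up = rng.standard_normal((8, 4)) * 0.5  # bottom-up weights (illustrative)
task_embedding = {                        # hypothetical per-task top-down signals
    "task_a": rng.standard_normal(8),
    "task_b": rng.standard_normal(8),
}

def forward(x, task):
    hidden = np.log1p(np.exp(W_up @ x))                 # bottom-up features (softplus)
    gate = 1.0 / (1.0 + np.exp(-task_embedding[task]))  # top-down gating signal
    return hidden * gate                                # task-dependent attention

x = np.ones(4)
y_a = forward(x, "task_a")
y_b = forward(x, "task_b")
```

The same input produces different outputs depending on which task the top-down signal selects.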
1 code implementation • CVPR 2023 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Eli Schwartz, Roei Herzig, Raja Giryes, Rogerio Feris, Rameswar Panda, Shimon Ullman, Leonid Karlinsky
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.
1 code implementation • 21 Nov 2022 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.
no code implementations • 17 Oct 2021 • Guy Ben-Yosef, Liav Assif, Daniel Harari, Shimon Ullman
We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene.
1 code implementation • 12 May 2021 • Shimon Ullman, Liav Assif, Alona Strugatski, Ben-Zion Vatashsky, Hila Levy, Aviv Netanyahu, Adam Yaari
Scene understanding requires the extraction and representation of scene components together with their properties and inter-relations.
1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky
In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.
Ranked #1 on Phrase Grounding on Visual Genome
1 code implementation • 19 Apr 2021 • Guy Ben-Yosef, Gabriel Kreiman, Shimon Ullman
In human vision, objects and their parts can be recognized from purely spatial or purely temporal information, but the mechanisms that integrate space and time are poorly understood.
no code implementations • 9 Jun 2020 • Hanna Benoni, Daniel Harari, Shimon Ullman
Subjects were assigned to one of nine exposure conditions: 200, 500, 1000, 2000 ms with or without masking, as well as unlimited time.
no code implementations • 9 Feb 2020 • Hila Levi, Shimon Ullman
As the range of tasks performed by a general vision system expands, executing multiple tasks accurately and efficiently in a single network has become an important and still open problem.
no code implementations • 25 Sep 2019 • Hila Levi, Shimon Ullman
Recent approaches address this problem by channel-wise modulation of the feature maps along the shared backbone, using task-specific vectors that are manually or dynamically tuned.
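A minimal sketch of such channel-wise modulation (shapes, task names, and the purely multiplicative form are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((64, 14, 14))  # (channels, H, W) from a shared backbone

# One modulation vector per task, one scalar per channel.
task_vectors = {
    "segmentation": rng.standard_normal(64),
    "depth": rng.standard_normal(64),
}

def modulate(feature_maps, task):
    """Scale each channel by the task-specific vector (broadcast over H, W)."""
    gamma = task_vectors[task]
    return feature_maps * gamma[:, None, None]

seg_features = modulate(features, "segmentation")
print(seg_features.shape)  # (64, 14, 14)
```

The backbone is shared across tasks; only the cheap per-channel vectors differ, which is what makes the approach attractive for multi-task settings.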
no code implementations • ICLR 2019 • Uri Patish, Shimon Ullman
Notably, we show in this benchmark that fixing the distribution of the surrogate is key to consistently recovering locally optimal solutions, and that our surrogate objective leads to an algorithm that outperforms the other methods we tested on a number of measures.
no code implementations • 29 Nov 2018 • Hila Levi, Shimon Ullman
An image is not just a collection of objects, but rather a graph where each object is related to other objects through spatial and semantic relations.
1 code implementation • CVPR 2020 • Ben-Zion Vatashsky, Shimon Ullman
Methods for teaching machines to answer visual questions have made significant progress in recent years, but current methods still lack important human capabilities, including integrating new visual classes and concepts in a modular manner, providing explanations for the answers and handling new domains without explicit examples.
no code implementations • 25 Oct 2018 • Ben Zion Vatashsky, Shimon Ullman
An image-related question defines a specific visual task that is required in order to produce an appropriate answer.
no code implementations • 10 Apr 2018 • Daniel Harari, Joshua B. Tenenbaum, Shimon Ullman
Second, we use a human study to demonstrate the sensitivity of humans to joint attention, suggesting that detecting such a configuration in an image can be useful for understanding the image, including the goals of the agents and their joint activity, and can therefore contribute to image captioning and related tasks.
no code implementations • 10 Apr 2018 • Hadar Gorodissky, Daniel Harari, Shimon Ullman
The growing use of convolutional neural networks (CNNs) for a broad range of visual tasks, including tasks involving fine details, raises the problem of applying such networks to a large field of view, since the amount of computation increases significantly with the number of pixels.
no code implementations • 25 Feb 2018 • Uri Patish, Shimon Ullman
We study the task of finding good local optima in combinatorial optimization problems.
no code implementations • 26 Dec 2017 • Guy Ben-Yosef, Alon Yachin, Shimon Ullman
Understanding social interactions (such as 'hug' or 'fight') is a basic and important capacity of the human visual system, but a challenging and still open problem for modeling.
no code implementations • 29 Nov 2017 • Guy Ben-Yosef, Liav Assif, Shimon Ullman
We model the process of human full interpretation of object images, namely the ability to identify and localize all semantic features and parts that are recognized by human observers.
no code implementations • 29 Nov 2016 • Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
How accurate are humans in determining the gaze direction of others in lifelike scenes, when they can move their heads and eyes freely, and what are the sources of information for the underlying perceptual processes?
no code implementations • 30 Oct 2016 • Shimon Ullman, Nimrod Dorfman, Daniel Harari
Current artificial learning systems can recognize thousands of visual categories, or play Go at a champion's level, but cannot explain infant learning, in particular the ability to learn complex concepts without guidance, in a specific order.
no code implementations • 25 May 2016 • Amir Rosenfeld, Shimon Ullman
Classes in natural images tend to follow long-tailed distributions.
no code implementations • 27 Mar 2016 • Ita Lifshitz, Ethan Fetaya, Shimon Ullman
In this paper we consider the problem of human pose estimation from a single still image.
Ranked #37 on Pose Estimation on MPII Human Pose
no code implementations • EMNLP 2015 • Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, Shimon Ullman
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception.
no code implementations • 14 Mar 2016 • Amir Rosenfeld, Shimon Ullman
Convolutional neural networks have been shown to develop internal representations that correspond closely to semantically meaningful objects and parts, even though they are trained solely on class labels.
no code implementations • 17 Jan 2016 • Amir Rosenfeld, Shimon Ullman
Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation, object recognition and stronger feature representations.
no code implementations • 12 Nov 2015 • Amir Rosenfeld, Shimon Ullman
In this paper we demonstrate how recognition is improved by obtaining precise localization of the action-object and consequently extracting details of the object shape together with the actor-object interaction.
no code implementations • 4 Feb 2015 • Ethan Fetaya, Shimon Ullman
For many tasks and data types, there are natural transformations to which the data should be invariant or insensitive.
1 code implementation • 8 Dec 2014 • Tao Gao, Daniel Harari, Joshua Tenenbaum, Shimon Ullman
(1) Human accuracy in discriminating targets 8°–10° of visual angle apart is around 40% in a free-looking gaze task; (2) the ability to interpret the gaze of different lookers varies dramatically; (3) this variance can be captured by the computational model; (4) humans significantly outperform the current model.
no code implementations • 10 Jun 2014 • Ethan Fetaya, Ohad Shamir, Shimon Ullman
We consider the problem of learning from a similarity matrix (as in spectral clustering and low-dimensional embedding) when computing pairwise similarities is costly and only a limited number of entries can be observed.
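A standard baseline for this setting is the Nyström method, which approximates the full n×n similarity matrix from a small set of sampled "landmark" columns. The paper's own algorithm may differ; this sketch only illustrates the partial-observation regime, with illustrative sizes and an RBF similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 20                        # n points, m sampled landmark columns
points = rng.standard_normal((n, 5))

def similarity(a, b):
    """RBF similarity between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

landmarks = points[:m]                # pretend only these similarities are observed
C = similarity(points, landmarks)     # observed n x m block (cheap: n*m entries)
W = C[:m]                             # m x m block among the landmarks

# Nystrom reconstruction of the full similarity matrix from the observed block.
K_approx = C @ np.linalg.pinv(W) @ C.T
print(K_approx.shape)  # (200, 200)
```

Only n·m of the n² pairwise similarities are ever computed; the reconstruction is exact on the landmark block and approximate elsewhere.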
no code implementations • NeurIPS 2010 • Leonid Karlinsky, Michael Dinerstein, Shimon Ullman
The task is easy for humans but difficult for current approaches to object recognition, because action instances may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized.