no code implementations • 19 Mar 2024 • Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Amit Alfassy, Assaf Arbelle, Shimon Ullman, Leonid Karlinsky
Inspired by the emergence of Large Language Models (LLMs) that can truly understand human language, significant progress has been made in aligning other, non-language modalities to be 'understandable' by an LLM. This is done primarily by converting their samples into sequences of embedded, language-like tokens that are fed directly into the LLM (decoder) input stream.
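The token-alignment idea can be pictured as a simple linear connector that maps encoder features into the LLM's token-embedding space. This is a minimal illustrative sketch, not the paper's architecture; all dimensions and variable names are assumptions.

```python
import numpy as np

# Hypothetical dimensions: a vision encoder emits 196 patch embeddings of
# size 768; the LLM expects input token embeddings of size 1024.
rng = np.random.default_rng(0)
vision_embeddings = rng.standard_normal((196, 768))   # one image's patch features
projection = rng.standard_normal((768, 1024)) * 0.02  # a (learned) linear connector

# Project the visual features into the LLM's embedding space so they can be
# concatenated with ordinary text-token embeddings in the decoder input stream.
visual_tokens = vision_embeddings @ projection        # shape: (196, 1024)

text_tokens = rng.standard_normal((12, 1024))         # embedded text prompt
decoder_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(decoder_input.shape)  # (208, 1024)
```

The LLM itself is unchanged; only the projection needs to be trained so that the visual tokens become "language-like" from the decoder's point of view.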
no code implementations • 26 Nov 2023 • Yonatan Sverdlov, Shimon Ullman
This challenge arises because previously learned weights tend to be adjusted to suit the objectives of new tasks, a phenomenon known as catastrophic forgetting.
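The phenomenon is easy to reproduce in a toy setting (this illustrates the problem only, not the paper's remedy): sequentially fitting two regression tasks by plain gradient descent overwrites the weights tuned for the first task.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_a = np.array([1.0, -2.0, 0.5])   # ground-truth weights for task A
w_b = np.array([-1.0, 1.0, 2.0])   # ground-truth weights for task B
y_a, y_b = X @ w_a, X @ w_b

def train(w, X, y, steps=500, lr=0.05):
    # Plain gradient descent on mean-squared error, with no protection
    # against forgetting.
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(X)
    return w

w = train(np.zeros(3), X, y_a)
err_a_before = np.mean((X @ w - y_a) ** 2)  # near zero after training on A
w = train(w, X, y_b)                        # now fit task B on the same weights
err_a_after = np.mean((X @ w - y_a) ** 2)   # task A performance has degraded
print(err_a_before < err_a_after)  # True
```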
1 code implementation • 4 Jun 2023 • Roy Abel, Shimon Ullman
For example, during multi-task learning, the same top-down network serves two roles: it propagates feedback signals for learning, and it provides top-down attention by guiding the bottom-up network to perform a selected task.
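The attention role can be pictured as a task embedding that gates bottom-up features channel by channel. This is a minimal illustrative sketch under assumed names and shapes, not the paper's actual top-down network (in particular, the learning-via-feedback role is not shown).

```python
import numpy as np

rng = np.random.default_rng(0)
W_up = rng.standard_normal((8, 4)) * 0.5  # bottom-up weights (illustrative)
task_embedding = {                        # hypothetical per-task top-down signals
    "task_a": rng.standard_normal(8),
    "task_b": rng.standard_normal(8),
}

def forward(x, task):
    hidden = np.log1p(np.exp(W_up @ x))                 # bottom-up features (softplus)
    gate = 1.0 / (1.0 + np.exp(-task_embedding[task]))  # top-down gating signal
    return hidden * gate                                # task-dependent attention

x = np.ones(4)
y_a = forward(x, "task_a")
y_b = forward(x, "task_b")
```

The same input produces different outputs depending on which task the top-down signal selects.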
1 code implementation • CVPR 2023 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Eli Schwartz, Roei Herzig, Raja Giryes, Rogerio Feris, Rameswar Panda, Shimon Ullman, Leonid Karlinsky
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.
1 code implementation • 21 Nov 2022 • Sivan Doveh, Assaf Arbelle, Sivan Harary, Rameswar Panda, Roei Herzig, Eli Schwartz, Donghyun Kim, Raja Giryes, Rogerio Feris, Shimon Ullman, Leonid Karlinsky
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks.
no code implementations • 17 Oct 2021 • Guy Ben-Yosef, Liav Assif, Daniel Harari, Shimon Ullman
We describe a computational model of humans' ability to provide a detailed interpretation of components in a scene.
1 code implementation • 12 May 2021 • Shimon Ullman, Liav Assif, Alona Strugatski, Ben-Zion Vatashsky, Hila Levy, Aviv Netanyahu, Adam Yaari
Scene understanding requires the extraction and representation of scene components together with their properties and inter-relations.
1 code implementation • ICCV 2021 • Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky
In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector.
Ranked #1 on Phrase Grounding on Visual Genome
1 code implementation • 19 Apr 2021 • Guy Ben-Yosef, Gabriel Kreiman, Shimon Ullman
In human vision, objects and their parts can be recognized from purely spatial or purely temporal information, but the mechanisms that integrate space and time are poorly understood.
no code implementations • 9 Jun 2020 • Hanna Benoni, Daniel Harari, Shimon Ullman
Subjects were assigned to one of nine exposure conditions: 200, 500, 1000, 2000 ms with or without masking, as well as unlimited time.
no code implementations • 9 Feb 2020 • Hila Levi, Shimon Ullman
As the range of tasks performed by a general vision system expands, executing multiple tasks accurately and efficiently in a single network has become an important and still open problem.
no code implementations • 25 Sep 2019 • Hila Levi, Shimon Ullman
Recent approaches address this problem by channel-wise modulation of the feature maps along the shared backbone, using task-specific vectors that are manually or dynamically tuned.
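A minimal sketch of such channel-wise modulation (shapes, task names, and the purely multiplicative form are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((64, 14, 14))  # (channels, H, W) from a shared backbone

# One modulation vector per task, one scalar per channel.
task_vectors = {
    "segmentation": rng.standard_normal(64),
    "depth": rng.standard_normal(64),
}

def modulate(feature_maps, task):
    """Scale each channel by the task-specific vector (broadcast over H, W)."""
    gamma = task_vectors[task]
    return feature_maps * gamma[:, None, None]

seg_features = modulate(features, "segmentation")
print(seg_features.shape)  # (64, 14, 14)
```

The backbone is shared across tasks; only the cheap per-channel vectors differ, which is what makes the approach attractive for multi-task settings.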
no code implementations • ICLR 2019 • Uri Patish, Shimon Ullman
Notably, we show in this benchmark that fixing the distribution of the surrogate is key to consistently recovering locally optimal solutions, and that our surrogate objective leads to an algorithm that outperforms the other methods we tested on a number of measures.
no code implementations • 29 Nov 2018 • Hila Levi, Shimon Ullman
An image is not just a collection of objects, but rather a graph where each object is related to other objects through spatial and semantic relations.
1 code implementation • CVPR 2020 • Ben-Zion Vatashsky, Shimon Ullman
Methods for teaching machines to answer visual questions have made significant progress in recent years, but current methods still lack important human capabilities, including integrating new visual classes and concepts in a modular manner, providing explanations for the answers and handling new domains without explicit examples.
no code implementations • 25 Oct 2018 • Ben Zion Vatashsky, Shimon Ullman
An image-related question defines a specific visual task that is required in order to produce an appropriate answer.
no code implementations • 10 Apr 2018 • Daniel Harari, Joshua B. Tenenbaum, Shimon Ullman
Second, we use a human study to demonstrate the sensitivity of humans to joint attention, suggesting that detecting such a configuration in an image can be useful for understanding the image, including the goals of the agents and their joint activity, and can therefore contribute to image captioning and related tasks.
no code implementations • 10 Apr 2018 • Hadar Gorodissky, Daniel Harari, Shimon Ullman
The growing use of convolutional neural networks (CNNs) for a broad range of visual tasks, including tasks involving fine details, raises the problem of applying such networks to a large field of view, since the amount of computation increases significantly with the number of pixels.
no code implementations • 25 Feb 2018 • Uri Patish, Shimon Ullman
We study the task of finding good local optima in combinatorial optimization problems.
no code implementations • 26 Dec 2017 • Guy Ben-Yosef, Alon Yachin, Shimon Ullman
Understanding social interactions (such as 'hug' or 'fight') is a basic and important capacity of the human visual system, but a challenging and still open problem for modeling.
no code implementations • 29 Nov 2017 • Guy Ben-Yosef, Liav Assif, Shimon Ullman
We model the process of human full interpretation of object images, namely the ability to identify and localize all semantic features and parts that are recognized by human observers.
no code implementations • 29 Nov 2016 • Daniel Harari, Tao Gao, Nancy Kanwisher, Joshua Tenenbaum, Shimon Ullman
How accurate are humans in determining the gaze direction of others in lifelike scenes, when they can move their heads and eyes freely, and what are the sources of information for the underlying perceptual processes?
no code implementations • 30 Oct 2016 • Shimon Ullman, Nimrod Dorfman, Daniel Harari
Current artificial learning systems can recognize thousands of visual categories, or play Go at a champion's level, but cannot explain infant learning, in particular the ability to learn complex concepts without guidance, in a specific order.
no code implementations • 25 May 2016 • Amir Rosenfeld, Shimon Ullman
Classes in natural images tend to follow long-tailed distributions.
no code implementations • 27 Mar 2016 • Ita Lifshitz, Ethan Fetaya, Shimon Ullman
In this paper we consider the problem of human pose estimation from a single still image.
Ranked #37 on Pose Estimation on MPII Human Pose
no code implementations • EMNLP 2015 • Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, Shimon Ullman
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception.
no code implementations • 14 Mar 2016 • Amir Rosenfeld, Shimon Ullman
Convolutional neural networks have been shown to develop internal representations that correspond closely to semantically meaningful objects and parts, even though they are trained solely on class labels.
no code implementations • 17 Jan 2016 • Amir Rosenfeld, Shimon Ullman
Action recognition in still images has seen major improvement in recent years due to advances in human pose estimation, object recognition and stronger feature representations.
no code implementations • 12 Nov 2015 • Amir Rosenfeld, Shimon Ullman
In this paper we demonstrate how recognition is improved by obtaining precise localization of the action-object and consequently extracting details of the object shape together with the actor-object interaction.
no code implementations • 4 Feb 2015 • Ethan Fetaya, Shimon Ullman
For many tasks and data types, there are natural transformations to which the data should be invariant or insensitive.
1 code implementation • 8 Dec 2014 • Tao Gao, Daniel Harari, Joshua Tenenbaum, Shimon Ullman
(1) Human accuracy in discriminating targets 8°–10° of visual angle apart is around 40% in a free-looking gaze task; (2) the ability to interpret the gaze of different lookers varies dramatically; (3) this variance can be captured by the computational model; (4) humans significantly outperform the current model.
no code implementations • 10 Jun 2014 • Ethan Fetaya, Ohad Shamir, Shimon Ullman
We consider the problem of learning from a similarity matrix (as in spectral clustering and low-dimensional embedding) when computing pairwise similarities is costly and only a limited number of entries can be observed.
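A standard baseline for this setting is the Nyström method, which approximates the full n×n similarity matrix from a small set of sampled "landmark" columns. The paper's own algorithm may differ; this sketch only illustrates the partial-observation regime, with illustrative sizes and an RBF similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 20                        # n points, m sampled landmark columns
points = rng.standard_normal((n, 5))

def similarity(a, b):
    """RBF similarity between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

landmarks = points[:m]                # pretend only these similarities are observed
C = similarity(points, landmarks)     # observed n x m block (cheap: n*m entries)
W = C[:m]                             # m x m block among the landmarks

# Nystrom reconstruction of the full similarity matrix from the observed block.
K_approx = C @ np.linalg.pinv(W) @ C.T
print(K_approx.shape)  # (200, 200)
```

Only n·m of the n² pairwise similarities are ever computed; the reconstruction is exact on the landmark block and approximate elsewhere.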
no code implementations • NeurIPS 2010 • Leonid Karlinsky, Michael Dinerstein, Shimon Ullman
The task is easy for humans but difficult for current approaches to object recognition, because action instances may be similar in terms of body pose, and often require detailed examination of relations between participating objects and body parts in order to be recognized.