Search Results for author: Asma Ghandeharioun

Found 13 papers, 8 papers with code

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

1 code implementation • 11 Jan 2024 • Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva

We introduce a framework called Patchscopes and show how it can be used to answer a wide range of questions about an LLM's computation.

Interpretability Illusions in the Generalization of Simplified Models

no code implementations • 6 Dec 2023 • Dan Friedman, Andrew Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun

A common method to study deep learning systems is to use simplified model representations -- for example, using singular value decomposition to visualize the model's hidden states in a lower dimensional space.

Code Completion, Dimensionality Reduction, +1

Post Hoc Explanations of Language Models Can Improve Language Models

no code implementations • NeurIPS 2023 • Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju

Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks.

In-Context Learning

Mixed Effects Random Forests for Personalised Predictions of Clinical Depression Severity

no code implementations • 24 Jan 2023 • Robert A. Lewis, Asma Ghandeharioun, Szymon Fedor, Paola Pedrelli, Rosalind Picard, David Mischoulon

We suggest that this improved performance results from the ability of the mixed effects random forest to personalise model parameters to individuals in the dataset.

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

1 code implementation • NeurIPS 2023 • Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun

This finding raises questions about how past work relies on Causal Tracing to select which model layers to edit.

Denoising, knowledge editing

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals

1 code implementation • ICLR 2022 • Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, Rosalind W. Picard

Explaining deep learning model inferences is a promising venue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars.

counterfactual, Fairness, +2

Human-centric Dialog Training via Offline Reinforcement Learning

1 code implementation • EMNLP 2020 • Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard

We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL).

Language Modelling, Offline RL, +2

Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

no code implementations • ICLR 2020 • Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

This is a critical shortcoming for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment -- e.g., systems that learn from human interaction.

OpenAI Gym, Open-Domain Dialog, +3

Characterizing Sources of Uncertainty to Proxy Calibration and Disambiguate Annotator and Data Bias

1 code implementation • 20 Sep 2019 • Asma Ghandeharioun, Brian Eoff, Brendan Jou, Rosalind W. Picard

Supporting model interpretability for complex phenomena where annotators can legitimately disagree, such as emotion recognition, is a challenging machine learning task.

Emotion Recognition

Hierarchical Reinforcement Learning for Open-Domain Dialog

1 code implementation • 17 Sep 2019 • Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Rosalind Picard

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text.

Hierarchical Reinforcement Learning, Open-Domain Dialog, +2

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

1 code implementation • 30 Jun 2019 • Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment.

Open-Domain Dialog, Q-Learning, +2

Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

2 code implementations • NeurIPS 2019 • Asma Ghandeharioun, Judy Hanwen Shen, Natasha Jaques, Craig Ferguson, Noah Jones, Agata Lapedriza, Rosalind Picard

To investigate the strengths of this novel metric and interactive evaluation in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation on the utterance level.

Dialogue Evaluation, Knowledge Distillation

Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

no code implementations • 17 Apr 2017 • Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams

Professional-grade software applications are powerful but complicated -- expert users can achieve impressive results, but novices often struggle to complete even basic tasks.
