no code implementations • ICML 2020 • Himabindu Lakkaraju, Nino Arsov, Osbert Bastani
As machine learning black boxes are increasingly being deployed in real-world applications, there has been a growing interest in developing post hoc explanations that summarize the behaviors of these black box models.
no code implementations • 8 May 2024 • Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar
At present, interpretability is divided into two paradigms: the intrinsic paradigm, which holds that only models designed to be interpretable can be explained, and the post-hoc paradigm, which holds that black-box models can be explained after the fact.
1 code implementation • 29 Apr 2024 • Aaron J. Li, Satyapriya Krishna, Himabindu Lakkaraju
The surge in Large Language Model (LLM) development has led to improved performance on cognitive tasks, as well as an urgent need to align these models with human values so that their power can be harnessed safely.
2 code implementations • 11 Apr 2024 • Aounon Kumar, Himabindu Lakkaraju
We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation.
no code implementations • 6 Apr 2024 • Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive.
no code implementations • 6 Mar 2024 • Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju
As large language models (LLMs) develop ever-improving capabilities and are applied in real-world settings, it is important to understand their safety.
no code implementations • 27 Feb 2024 • Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju
Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation.
no code implementations • 20 Feb 2024 • Jiaqi Ma, Vivian Lai, Yiming Zhang, Chacha Chen, Paul Hamilton, Davor Ljubenkov, Himabindu Lakkaraju, Chenhao Tan
However, properly evaluating the effectiveness of XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in several ways: designing and implementing user studies is complex; the many choices in the user-study design space hamper reproducibility; and running user studies can be challenging and even daunting for machine learning researchers.
no code implementations • 16 Feb 2024 • Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du
Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque.
1 code implementation • 16 Feb 2024 • Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju
CLIP embeddings have demonstrated remarkable performance across a wide range of computer vision tasks.
no code implementations • 9 Feb 2024 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities.
no code implementations • 7 Feb 2024 • Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju
We highlight that the current trend towards increasing the plausibility of explanations, primarily driven by the demand for user-friendly interfaces, may come at the cost of diminishing their faithfulness.
no code implementations • 7 Dec 2023 • Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade
Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs).
1 code implementation • 6 Nov 2023 • Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
In this work, we make one of the first attempts at quantifying the uncertainty in explanations of LLMs.
no code implementations • 23 Oct 2023 • Yanchen Liu, Srishti Gautam, Jiaqi Ma, Himabindu Lakkaraju
Recent literature has suggested the potential of using large language models (LLMs) to make classifications for tabular tasks.
1 code implementation • 11 Oct 2023 • Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
In this work, we propose a new class of unlearning methods for LLMs called "In-Context Unlearning", which provides inputs in context without having to update model parameters.
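Unlearning purely through the context might look roughly like the sketch below: the point to be forgotten is placed in the prompt alongside ordinary labeled examples, with no weight update. The flipped-label scheme, field names, and sentiment task here are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of an in-context unlearning prompt: the forgotten
# training point appears with a deliberately flipped label, followed by
# correctly labeled examples, then the query. Model weights are untouched.

def build_unlearning_prompt(forget_point, flipped_label, context_examples, query):
    """Assemble a classification prompt; all names are illustrative."""
    lines = [f"Review: {forget_point}\nSentiment: {flipped_label}"]
    for text, label in context_examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_unlearning_prompt(
    forget_point="A tedious, joyless film.",
    flipped_label="positive",  # deliberately flipped to suppress the point
    context_examples=[("A delight from start to finish.", "positive"),
                      ("Flat characters and a dull plot.", "negative")],
    query="An uneven but charming debut.",
)
```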
1 code implementation • 9 Oct 2023 • Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
To this end, several approaches have been proposed in recent literature to explain the behavior of complex predictive models in a post hoc fashion.
no code implementations • 28 Sep 2023 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.
1 code implementation • 6 Sep 2023 • Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju
We defend against three attack modes: i) adversarial suffix, where an adversarial sequence is appended at the end of a harmful prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block.
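Against the first attack mode, an erase-and-check-style loop gives the flavor of such a defense: since the adversarial sequence is appended at the end, some truncation of the prompt recovers the original harmful text. The keyword-based `is_harmful` filter below is a toy stand-in for a real safety classifier, and token erasure is simplified to whitespace splitting; both are assumptions for illustration.

```python
# Sketch of a suffix-erasure defense: flag a prompt if it, or any version
# with up to `max_erase` trailing tokens removed, is judged harmful.

def is_harmful(text):
    # Toy placeholder for a learned safety classifier.
    return "build a bomb" in text.lower()

def erase_and_check_suffix(prompt, max_erase=20):
    tokens = prompt.split()
    for i in range(max_erase + 1):
        # i == 0 checks the full prompt; larger i erase trailing tokens.
        candidate = " ".join(tokens[: len(tokens) - i]) if i else prompt
        if is_harmful(candidate):
            return True
    return False

attacked = "Explain how to build a bomb xq!! zr@@ vt##"
benign = "What is the capital of France?"
```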
no code implementations • 8 Aug 2023 • Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju
We present two novel methods to generate differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR).
1 code implementation • NeurIPS 2023 • Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju
This strategy naturally combines the ease of use of post hoc explanations with the faithfulness of inherently interpretable models.
no code implementations • 26 Jul 2023 • Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
These estimators linearize models in the local region around an input and analytically compute the robustness of the resulting linear models.
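For a local linearization f(x) = w·x + b, the standard closed form for robustness (the smallest L2 perturbation that flips the predicted sign) is |f(x)| / ||w||. This is the generic linear-margin formula, not necessarily the paper's exact estimator:

```python
# Robustness of a locally linearized binary classifier: the distance from
# x to the decision boundary of f(x) = w.x + b is |f(x)| / ||w||.
import math

def linear_robustness(w, b, x):
    fx = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(fx) / norm

# Toy example: the point (1, 1) under f(x) = 3*x1 + 4*x2 - 2.
r = linear_robustness([3.0, 4.0], -2.0, [1.0, 1.0])
# f(x) = 5 and ||w|| = 5, so the margin is 1.0
```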
no code implementations • 25 Jul 2023 • Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju
Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks.
no code implementations • 11 Jun 2023 • Anna P. Meyer, Dan Ley, Suraj Srinivas, Himabindu Lakkaraju
To this end, we conduct rigorous theoretical analysis to demonstrate that model curvature, the weight decay parameter used during training, and the magnitude of the dataset shift are key factors that determine the extent of explanation (in)stability.
no code implementations • 9 Jun 2023 • Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju
This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task.
no code implementations • 3 Jun 2023 • Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju
We introduce a method for computing scores for each word in the prompt; these scores represent its influence on biases in the model's output.
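A generic way to realize per-word influence scores is leave-one-out: score each word by how much a bias metric changes when that word is dropped from the prompt. The toy lexicon metric below is an assumption standing in for whatever bias measure the method actually uses.

```python
# Leave-one-out word influence: score(word_i) = bias(prompt) - bias(prompt
# without word_i). BIASED_WORDS is a toy lexicon, not a real bias metric.

BIASED_WORDS = {"bossy": 1.0, "emotional": 0.8}

def bias_metric(words):
    return sum(BIASED_WORDS.get(w, 0.0) for w in words)

def word_influence(prompt):
    words = prompt.split()
    base = bias_metric(words)
    return {w: base - bias_metric(words[:i] + words[i + 1:])
            for i, w in enumerate(words)}

scores = word_influence("she is bossy and driven")
# In this toy metric, "bossy" accounts for all of the bias
```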
no code implementations • NeurIPS 2023 • Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju
Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks.
no code implementations • 8 Feb 2023 • Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju
The Right to Explanation and the Right to be Forgotten are two important principles outlined to regulate algorithmic decision making and data usage in real-world applications.
1 code implementation • 10 Nov 2022 • Martin Pawelczyk, Himabindu Lakkaraju, Seth Neel
As predictive models are increasingly being employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals.
no code implementations • 18 Sep 2022 • Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju
When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes.
1 code implementation • 19 Aug 2022 • Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik
As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations.
1 code implementation • 8 Jul 2022 • Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh
In real-world evaluations with humans, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems for explainability in a disease prediction task, and 85% of ML professionals agreed TalkToModel was easier to use for computing explanations.
2 code implementations • 22 Jun 2022 • Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju
OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets.
2 code implementations • 14 Jun 2022 • Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, Francois Fleuret
To achieve this, we minimize a data-independent upper bound on the curvature of a neural network, which decomposes overall curvature in terms of curvatures and slopes of its constituent layers.
no code implementations • 6 Jun 2022 • Murtuza N Shergadwala, Himabindu Lakkaraju, Krishnaram Kenthapadi
Predictive models are increasingly used to make various consequential decisions in high-stakes domains such as healthcare, finance, and policy.
1 code implementation • 2 Jun 2022 • Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.
no code implementations • 15 May 2022 • Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju
We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods.
no code implementations • 14 Mar 2022 • Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju
As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input.
3 code implementations • 13 Mar 2022 • Martin Pawelczyk, Teresa Datta, Johannes van-den-Heuvel, Gjergji Kasneci, Himabindu Lakkaraju
To this end, we propose a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and also ensures that the resulting recourse achieves a positive model prediction.
1 code implementation • 3 Feb 2022 • Himabindu Lakkaraju, Dylan Slack, Yuxin Chen, Chenhao Tan, Sameer Singh
Overall, we hope our work serves as a starting place for researchers and engineers to design interactive explainability systems.
no code implementations • 3 Feb 2022 • Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding.
no code implementations • 24 Jun 2021 • Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju
In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern.
no code implementations • 23 Jun 2021 • Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
As machine learning models are increasingly used in critical decision-making settings (e.g., healthcare, finance), there has been a growing emphasis on developing methods to explain model predictions.
no code implementations • 18 Jun 2021 • Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju
As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice.
no code implementations • 16 Jun 2021 • Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju
As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models.
no code implementations • NeurIPS 2021 • Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh
In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated.
no code implementations • 29 Mar 2021 • Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju
Most of the existing work focuses on optimizing for either adversarial shifts or interventional shifts.
no code implementations • NeurIPS 2021 • Sohini Upadhyay, Shalmali Joshi, Himabindu Lakkaraju
To address this problem, we propose a novel framework, RObust Algorithmic Recourse (ROAR), that leverages adversarial training for finding recourses that are robust to model shifts.
3 code implementations • 25 Feb 2021 • Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik
In this work, we establish a key connection between counterfactual fairness and stability and leverage it to propose a novel framework, NIFTY (uNIfying Fairness and stabiliTY), which can be used with any GNN to learn fair and stable representations.
no code implementations • 21 Feb 2021 • Sushant Agarwal, Shahin Jabbari, Chirag Agarwal, Sohini Upadhyay, Zhiwei Steven Wu, Himabindu Lakkaraju
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner.
no code implementations • 22 Dec 2020 • Kaivalya Rawal, Ece Kamar, Himabindu Lakkaraju
Our theoretical results establish a lower bound on the probability of recourse invalidation due to model shifts, and show the existence of a tradeoff between this invalidation probability and typical notions of "cost" minimized by modern recourse generation algorithms.
no code implementations • 1 Dec 2020 • Tom Sühr, Sophie Hilgard, Himabindu Lakkaraju
In this work, we analyze various sources of gender bias in online hiring platforms, including the job context and inherent biases of employers, and establish how these factors interact with ranking algorithms to affect hiring decisions.
no code implementations • 12 Nov 2020 • Sean McGrath, Parth Mehta, Alexandra Zytek, Isaac Lage, Himabindu Lakkaraju
As machine learning (ML) models are increasingly being employed to assist human decision makers, it becomes critical to provide these decision makers with relevant inputs which can help them decide if and how to incorporate model predictions into their decision making.
1 code implementation • NeurIPS 2021 • Alexis Ross, Himabindu Lakkaraju, Osbert Bastani
As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations.
no code implementations • 12 Nov 2020 • Himabindu Lakkaraju, Nino Arsov, Osbert Bastani
To the best of our knowledge, this work makes the first attempt at generating post hoc explanations that are robust to a general class of adversarial perturbations that are of practical interest.
1 code implementation • NeurIPS 2020 • Wanqian Yang, Lars Lorch, Moritz A. Graule, Himabindu Lakkaraju, Finale Doshi-Velez
Domains where supervised models are deployed often come with task-specific constraints, such as prior expert knowledge on the ground-truth function, or desiderata like safety and fairness.
1 code implementation • NeurIPS 2020 • Kaivalya Rawal, Himabindu Lakkaraju
As predictive models are increasingly being deployed in high-stakes decision-making, there has been a lot of interest in developing algorithms which can provide recourses to affected individuals.
1 code implementation • NeurIPS 2021 • Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
In this paper, we address the aforementioned challenges by developing a novel Bayesian framework for generating local explanations along with their associated uncertainty.
no code implementations • 14 Jun 2020 • Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, Milind Tambe
Under this framework, the trade-off between fairness and efficiency can be controlled by a single inequality aversion design parameter.
no code implementations • 15 Nov 2019 • Himabindu Lakkaraju, Osbert Bastani
Our work is the first to empirically establish how user trust in black box models can be manipulated via misleading explanations.
2 code implementations • 6 Nov 2019 • Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju
Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous.
no code implementations • 4 Jul 2017 • Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec
To the best of our knowledge, this is the first approach which can produce global explanations of the behavior of any given black box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences.
no code implementations • NeurIPS 2016 • Himabindu Lakkaraju, Jure Leskovec
We propose Confusions over Time (CoT), a novel generative framework which facilitates a multi-granular analysis of the decision making process.
no code implementations • 23 Nov 2016 • Himabindu Lakkaraju, Cynthia Rudin
We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (e.g., diagnostic test results of patients) to treatments.
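A decision list of this kind is just an ordered if-then-else cascade, where the first matching rule determines the treatment. The conditions and treatment names below are invented purely for illustration.

```python
# Toy decision list mapping patient characteristics to treatments:
# rules are evaluated in order, and the first match wins.

def assign_treatment(patient):
    if patient["age"] > 60 and patient["marker_a"] == "positive":
        return "treatment_1"
    elif patient["marker_b"] == "positive":
        return "treatment_2"
    else:
        return "default_care"

example = assign_treatment({"age": 70, "marker_a": "positive",
                            "marker_b": "negative"})
```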
no code implementations • 28 Oct 2016 • Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz
Predictive models deployed in the real world may assign incorrect labels to instances with high confidence.
no code implementations • 21 Oct 2016 • Himabindu Lakkaraju, Cynthia Rudin
We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (e.g., diagnostic test results of patients) to treatments.