no code implementations • 7 Feb 2024 • Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju
We highlight that the current trend toward increasing the plausibility of explanations, driven primarily by the demand for user-friendly interfaces, may come at the cost of diminished faithfulness.
1 code implementation • 6 Nov 2023 • Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
In this work, we make one of the first attempts at quantifying the uncertainty in explanations generated by LLMs.
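One minimal sketch of how such uncertainty could be operationalized (not necessarily the paper's own metric): sample several explanations for the same input and treat their disagreement as an uncertainty proxy. The `generate` callable and the similarity measure below are assumptions for illustration.

```python
import itertools
from difflib import SequenceMatcher
from typing import Callable

def explanation_uncertainty(
    generate: Callable[[str], str], prompt: str, n_samples: int = 5
) -> float:
    """Uncertainty proxy: sample several explanations and measure how much
    they disagree (1 - mean pairwise string similarity).

    `generate` is a caller-supplied function prompt -> explanation string,
    e.g. a temperature > 0 call to an LLM API (assumed, not specified here).
    """
    explanations = [generate(prompt) for _ in range(n_samples)]
    sims = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in itertools.combinations(explanations, 2)
    ]
    # High disagreement between sampled explanations -> high uncertainty.
    return 1.0 - sum(sims) / len(sims)
```

In practice a semantic similarity (e.g., embedding cosine) would likely replace the character-level `SequenceMatcher`, which is used here only to keep the sketch dependency-free.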
no code implementations • 3 Jun 2023 • Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju
We introduce a method that computes a score for each word in the prompt; each score represents that word's influence on biases in the model's output.
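A minimal leave-one-out sketch of this kind of word-level attribution, assuming a caller-supplied `bias_score` function (a hypothetical placeholder, not the paper's measurement): a word's score is how much the measured bias changes when that word is removed from the prompt.

```python
from typing import Callable

def word_influence_scores(
    prompt: str, bias_score: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Leave-one-out attribution: score each prompt word by the change in
    a bias measurement when that word is ablated.

    `bias_score` maps a prompt to a scalar quantifying bias in the model's
    output for that prompt (an assumption of this sketch).
    """
    words = prompt.split()
    base = bias_score(prompt)
    scores = []
    for i, word in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])
        # Positive score: removing the word lowers the bias measurement,
        # i.e., the word contributes to the bias.
        scores.append((word, base - bias_score(ablated)))
    return scores
```

Returning a list of (word, score) pairs rather than a dict keeps repeated words in the prompt from overwriting one another's scores.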