Search Results for author: Ryo Kamoi

Found 7 papers, 4 papers with code

Evaluating LLMs at Detecting Errors in LLM Responses

1 code implementation • 4 Apr 2024 • Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs.

Instruction Following

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data

no code implementations • 16 Nov 2023 • Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan

This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning and problem-solving capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables.

Math

Fair Abstractive Summarization of Diverse Perspectives

1 code implementation • 14 Nov 2023 • Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang

However, current work in summarization metrics and Large Language Models (LLMs) evaluation has not explored fair abstractive summarization.

Abstractive Text Summarization, Fairness

WiCE: Real-World Entailment for Claims in Wikipedia

1 code implementation • 2 Mar 2023 • Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett

Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation.

Fact Checking, Natural Language Inference (+3 more)

Shortcomings of Question Answering Based Factuality Frameworks for Error Localization

1 code implementation • 13 Oct 2022 • Ryo Kamoi, Tanya Goyal, Greg Durrett

Despite recent progress in abstractive summarization, models often generate summaries with factual errors.

Abstractive Text Summarization, Question Answering (+2 more)

Why is the Mahalanobis Distance Effective for Anomaly Detection?

no code implementations • 1 Mar 2020 • Ryo Kamoi, Kei Kobayashi

This suggests that the assumed reason the Mahalanobis confidence score works so well is mistaken, and that the score makes use of different information from ODIN, another popular OoD detection method based on prediction confidence.

Anomaly Detection, General Classification (+2 more)

Likelihood Assignment for Out-of-Distribution Inputs in Deep Generative Models is Sensitive to Prior Distribution Choice

no code implementations • 15 Nov 2019 • Ryo Kamoi, Kei Kobayashi

This paper focuses on the relationship between the choice of a prior distribution and the likelihoods assigned to out-of-distribution inputs.
