Search Results for author: Adian Liusie

Found 19 papers, 7 papers with code

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

no code implementations • 22 May 2024 • Guangzhi Sun, Potsawee Manakul, Adian Liusie, Kunat Pipatanakul, Chao Zhang, Phil Woodland, Mark Gales

Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information.

Benchmarking Hallucination +2

Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons

no code implementations • 9 May 2024 • Adian Liusie, Vatsal Raina, Yassir Fathullah, Mark Gales

When Gaussian experts are used one can derive simple closed-form solutions for the optimal candidate ranking, as well as expressions for selecting which comparisons should be made to maximize the probability of this ranking.

WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models

no code implementations • 28 Mar 2024 • Piotr Molenda, Adian Liusie, Mark J. F. Gales

Watermarking generative-AI systems, such as LLMs, has gained considerable interest, driven by their enhanced capabilities across a wide range of tasks.

nlg evaluation

Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models

no code implementations • 20 Mar 2024 • Adian Liusie, Yassir Fathullah, Mark J. F. Gales

Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities and versatility in NLP tasks; however, they sometimes fail to maintain crucial invariances for specific tasks.

Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment

no code implementations • 21 Feb 2024 • Vyas Raina, Adian Liusie, Mark Gales

Large Language Models (LLMs) are powerful zero-shot assessors and are increasingly used in real-world situations such as for written exams or benchmarking systems.

Adversarial Robustness Benchmarking

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

no code implementations • 4 Jan 2024 • Xiaoding Lu, Zongyi Liu, Adian Liusie, Vyas Raina, Vineet Mudupalli, Yuwen Zhang, William Beauchamp

In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT.

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

1 code implementation • 15 Nov 2023 • Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings.

Audio Classification Decoder +4

Assessing Distractors in Multiple-Choice Tests

no code implementations • 8 Nov 2023 • Vatsal Raina, Adian Liusie, Mark Gales

Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options.

Multiple-choice Reading Comprehension

Zero-shot Audio Topic Reranking using Large Language Models

no code implementations • 14 Sep 2023 • Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate M. Knill, Mark J. F. Gales

A key element for this process is highly rapid, flexible search to support large archives, which in MVSE is facilitated by representing video attributes by embeddings.

Information Retrieval Retrieval

Mitigating Word Bias in Zero-shot Prompt-based Classifiers

1 code implementation • 10 Sep 2023 • Adian Liusie, Potsawee Manakul, Mark J. F. Gales

To address this problem, it is possible to optimise classification thresholds on a labelled data set; however, this mitigates some of the advantages of prompt-based classifiers.

Zero-Shot Learning

LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models

1 code implementation • 15 Jul 2023 • Adian Liusie, Potsawee Manakul, Mark J. F. Gales

Current developments in large language models (LLMs) have enabled impressive zero-shot capabilities across various natural language tasks.

nlg evaluation Response Generation +1

Analyzing Multiple-Choice Reading and Listening Comprehension Tests

no code implementations • 3 Jul 2023 • Vatsal Raina, Adian Liusie, Mark Gales

Multiple-choice reading and listening comprehension tests are an important part of language assessment.

Multiple-choice Reading Comprehension +1

Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution

no code implementations • 22 Jun 2023 • Adian Liusie, Vatsal Raina, Andrew Mullooly, Kate Knill, Mark J. F. Gales

Multiple choice exams are widely used to assess candidates across a diverse range of domains and tasks.

Multiple-choice

CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models

1 code implementation • 8 Jun 2023 • Potsawee Manakul, Yassir Fathullah, Adian Liusie, Vyas Raina, Vatsal Raina, Mark Gales

In this paper, we consider the challenge of summarizing patients' medical progress notes in a limited data setting.

Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

no code implementations • 9 May 2023 • Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales

In these scenarios, where, for example, knowing the quality of a system's output to predict poor performance prevails over knowing the output itself, is it possible to bypass the autoregressive decoding?

Attribute Automatic Speech Recognition +4

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

3 code implementations • 15 Mar 2023 • Potsawee Manakul, Adian Liusie, Mark J. F. Gales

In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can be used to fact-check the responses of black-box models in a zero-resource fashion, i.e. without an external database.

Fact Checking Hallucination +1
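
A minimal sketch of the sampling-based idea in the SelfCheckGPT summary above: draw several extra responses from the same model and flag sentences that those samples do not back up. The listing does not say how support is scored, so the word-overlap rule, the function names and the `min_overlap` parameter below are illustrative assumptions, not the paper's scoring method.

```python
from typing import List, Sequence, Set


def _content_words(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens longer than three characters,
    # used here as a crude proxy for content words.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return {w for w in cleaned.split() if len(w) > 3}


def inconsistency_scores(
    sentences: Sequence[str],
    samples: Sequence[str],
    min_overlap: float = 0.5,  # assumed threshold, chosen for illustration
) -> List[float]:
    """For each sentence, return the fraction of samples that do not support it."""
    if not samples:
        raise ValueError("need at least one sampled response")
    sample_words = [_content_words(s) for s in samples]
    scores = []
    for sentence in sentences:
        words = _content_words(sentence)
        unsupported = 0
        for sw in sample_words:
            # A sentence with no content words is treated as trivially supported.
            overlap = len(words & sw) / len(words) if words else 1.0
            if overlap < min_overlap:
                unsupported += 1
        scores.append(unsupported / len(samples))
    return scores


# Toy data: sentences from the response being checked, plus three resampled responses.
sentences = ["Paris is the capital of France.", "It was founded in 1850 by Vikings."]
samples = [
    "The capital of France is Paris.",
    "Paris, the French capital, lies on the Seine.",
    "France's capital city is Paris.",
]
scores = inconsistency_scores(sentences, samples)
```

A score near 1 means almost no sample agrees with the sentence, which under this toy rule marks it as a likely hallucination. No external database is consulted, matching the zero-resource setting described in the summary.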

Rewarding Chatbots for Real-World Engagement with Millions of Users

no code implementations • 10 Mar 2023 • Robert Irvine, Douglas Boubert, Vyas Raina, Adian Liusie, Ziyi Zhu, Vineet Mudupalli, Aliaksei Korshuk, Zongyi Liu, Fritz Cremer, Valentin Assassi, Christie-Carol Beauchamp, Xiaoding Lu, Thomas Rialan, William Beauchamp

The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time.

Chatbot Language Modelling
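
The inference-time step in the summary above, where a reward model rejects low-scoring sampled responses, can be sketched as best-of-N selection. This is only an illustration under stated assumptions: `pick_best_response` and `toy_reward` are hypothetical names, and the hand-written scoring rule stands in for a reward model trained on the pseudo-labels from user interactions.

```python
from typing import Callable, Sequence


def pick_best_response(
    candidates: Sequence[str],
    reward_fn: Callable[[str], float],
) -> str:
    """Keep the candidate with the highest reward score; all others are rejected."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=reward_fn)


def toy_reward(response: str) -> float:
    # Hypothetical stand-in for a trained reward model: longer replies score
    # higher, with a bonus for ending on a question.
    bonus = 5.0 if response.rstrip().endswith("?") else 0.0
    return len(response.split()) + bonus


# Responses sampled from the chatbot model for a single user turn (toy data).
samples = ["ok", "I love hiking too. Where do you usually go?", "Nice."]
best = pick_best_response(samples, toy_reward)
```

With the toy scorer, the longer reply that ends in a question is kept and the two short replies are rejected.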

MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization

2 code implementations • 28 Jan 2023 • Potsawee Manakul, Adian Liusie, Mark J. F. Gales

In this work, we introduce an alternative scheme based on standard information-theoretic measures in which the information present in the source and summary is directly compared.

Hallucination Multiple-choice +1

World Knowledge in Multiple Choice Reading Comprehension

1 code implementation • 13 Nov 2022 • Adian Liusie, Vatsal Raina, Mark Gales

Two metrics are described: the expected number of options, which measures whether a passage-free system can identify the answer to a question using world knowledge; and the contextual mutual information, which measures the importance of context for a given question.

General Knowledge Multiple-choice +2
