Search Results for author: Suhas Kotha

Found 6 papers, 3 papers with code

Jailbreaking is Best Solved by Definition

no code implementations • 20 Mar 2024 • Taeyoun Kim, Suhas Kotha, Aditi Raghunathan

The rise of "jailbreak" attacks on language models has led to a flurry of defenses aimed at preventing the output of undesirable responses.


A Safe Harbor for AI Evaluation and Red Teaming

no code implementations • 7 Mar 2024 • Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Independent evaluation and red teaming are critical for identifying the risks posed by generative AI systems.


Repetition Improves Language Model Embeddings

2 code implementations • 23 Feb 2024 • Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan

In this work, we address an architectural limitation of autoregressive models: token embeddings cannot contain information from tokens that appear later in the input.

Language Modelling

525


Understanding Catastrophic Forgetting in Language Models via Implicit Inference

1 code implementation • 18 Sep 2023 • Suhas Kotha, Jacob Mitchell Springer, Aditi Raghunathan

We lack a systematic understanding of the effects of fine-tuning (via methods such as instruction-tuning or reinforcement learning from human feedback), particularly on tasks outside the narrow fine-tuning distribution.

In-Context Learning


Provably Bounding Neural Network Preimages

3 code implementations • NeurIPS 2023 • Suhas Kotha, Christopher Brix, Zico Kolter, Krishnamurthy Dvijotham, Huan Zhang

Most work on the formal verification of neural networks has focused on bounding the set of outputs that correspond to a given set of inputs (for example, bounded perturbations of a nominal input).

Adversarial Robustness

206


CELESTIAL: Classification Enabled via Labelless Embeddings with Self-supervised Telescope Image Analysis Learning

no code implementations • 20 Jan 2022 • Suhas Kotha, Anirudh Koul, Siddha Ganju, Meher Kasam

To solve this problem, we establish CELESTIAL, a self-supervised learning pipeline for effectively leveraging sparsely labeled satellite imagery.

Image Retrieval, Retrieval +2

