no code implementations • 5 May 2024 • Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour
Large language models (LLMs) have introduced many exciting new capabilities into software development.
no code implementations • 30 Apr 2024 • Yuvraj Virk, Premkumar Devanbu, Toufique Ahmed
There has been a considerable body of research into automated AI-based methods, using Large Language Models (LLMs), to generate summaries of code; there has also been quite a bit of work on ways to measure the performance of such summarization methods, with special attention paid to how closely these AI-generated summaries resemble a summary a human might have produced.
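One common way to measure how closely a generated summary resembles a human reference is lexical overlap. Below is an illustrative sketch of a simple token-overlap F1 score, a stand-in for standard metrics like BLEU or ROUGE; the function name and example summaries are assumptions for illustration, not the paper's exact method.

```python
# Hypothetical sketch: token-overlap F1 between an AI-generated summary
# and a human-written reference (illustrative, not the paper's metric).
from collections import Counter

def token_overlap_f1(candidate: str, reference: str) -> float:
    """F1 over multiset token overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy comparison of a generated summary against a human one.
score = token_overlap_f1(
    "returns the maximum value in the list",
    "return the largest value in a list",
)
print(round(score, 3))
```

Lexical metrics like this are cheap to compute but only approximate semantic similarity, which is part of why measuring summary quality remains an active research question.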
no code implementations • 23 Feb 2024 • Toufique Ahmed, Christian Bird, Premkumar Devanbu, Saikat Chakraborty
We find that performance for C# changes little from OSS to proprietary code, but degrades significantly for C++; we find that this difference is attributable to differences in identifiers.
no code implementations • 3 Feb 2024 • Claudio Spiess, David Gros, Kunal Suresh Pai, Michael Pradel, Md Rafiqul Islam Rabin, Amin Alipour, Susmit Jha, Prem Devanbu, Toufique Ahmed
Our contributions will lead to better-calibrated decision-making in the current use of code generated by language models, and offer a framework for future research to further improve calibration methods for generative models in Software Engineering.
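Calibration is typically quantified by comparing a model's stated confidence against its empirical correctness. The sketch below computes expected calibration error (ECE) over equal-width confidence bins; the bin count and toy data are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: expected calibration error (ECE) for confidence
# scores attached to generated code (data and bin count are illustrative).

def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |accuracy - mean confidence| per confidence bin."""
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Half-open bins; the last bin also includes confidence == 1.0.
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / total) * abs(acc - conf)
    return ece

# Toy example: per-sample confidence vs. whether a test suite judged
# the generated code correct (1) or incorrect (0).
confs = [0.9, 0.8, 0.7, 0.3, 0.2]
labels = [1, 1, 0, 0, 0]
print(round(expected_calibration_error(confs, labels), 3))
```

A well-calibrated model has a low ECE: when it says "90% confident," it should be right about 90% of the time.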
no code implementations • 20 Jun 2023 • Toufique Ahmed, Dian Yu, Chengxuan Huang, Cathy Wang, Prem Devanbu, Kenji Sagae
To understand the extent to which language models can learn some form of meaning, we investigate their ability to capture semantics of code beyond superficial frequency and co-occurrence.
no code implementations • 31 May 2023 • Toufique Ahmed, Premkumar Devanbu
Large Language models (LLMs) can be induced to solve non-trivial problems with "few-shot" prompts including illustrative problem-solution examples.
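Few-shot prompting amounts to concatenating a handful of problem-solution demonstrations before the new problem. The sketch below shows a minimal version of such prompt construction; the function name, delimiters, and examples are assumptions for illustration, not the paper's actual prompt format.

```python
# Hypothetical sketch of few-shot prompt construction: a few
# problem-solution demonstrations followed by the query problem.

def build_few_shot_prompt(examples, query):
    """Concatenate (problem, solution) demonstrations, then the new problem."""
    parts = []
    for problem, solution in examples:
        parts.append(f"Problem: {problem}\nSolution: {solution}\n")
    # The trailing "Solution:" invites the LLM to complete the answer.
    parts.append(f"Problem: {query}\nSolution:")
    return "\n".join(parts)

examples = [
    ("Reverse the string 'abc'.", "'abc'[::-1]"),
    ("Sum the list [1, 2, 3].", "sum([1, 2, 3])"),
]
prompt = build_few_shot_prompt(examples, "Find the max of [4, 7, 1].")
print(prompt)
```

The resulting string would then be sent to an LLM; the demonstrations steer the model toward the desired problem-solving format without any parameter updates.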
no code implementations • 13 Apr 2023 • Toufique Ahmed, Kunal Suresh Pai, Premkumar Devanbu, Earl T. Barr
This approach improves performance in several different settings suggested by prior work, including for two different Large Language Models.
no code implementations • 20 Mar 2023 • Kevin Jesse, Toufique Ahmed, Premkumar T. Devanbu, Emily Morgan
We explore the consequences of Codex-generated SStuBs and propose avoidance strategies, suggesting the possibility of reducing the production of known, verbatim SStuBs and increasing the possibility of producing known, verbatim fixes.
no code implementations • 10 Jan 2023 • Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann, Xuchao Zhang, Saravan Rajmohan
In this work, we conduct the first large-scale study evaluating the effectiveness of these models in helping engineers root-cause and mitigate production incidents.
1 code implementation • 4 Jan 2023 • Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen
While the automated summarisation of decompiled code can help Reverse Engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task.
no code implementations • 9 Jul 2022 • Toufique Ahmed, Premkumar Devanbu
Very large language models (LLMs), such as GPT-3 and Codex, have achieved state-of-the-art performance on several natural-language tasks, and also show great promise for code.
1 code implementation • 15 Jun 2022 • Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar Devanbu, Baishakhi Ray
Pre-trained generative language models (e.g., PLBART, CodeT5, SPT-Code) for source code have yielded strong results on several tasks in the past few years, including code generation and translation.
no code implementations • 2 Jun 2022 • Toufique Ahmed, Premkumar Devanbu
We compare several models and training approaches: same-project training; cross-project training; training a model specifically designed to be sample-efficient (and thus prima facie well suited to learning in a limited-sample, same-project setting); and a maximalist hybrid approach that first fine-tunes on many projects in many languages and then trains on the same project.
no code implementations • 3 Dec 2021 • Toufique Ahmed, Premkumar Devanbu
As a way around such data bottlenecks, we present evidence suggesting that human-written code in different languages (which performs the same function) is rather similar, and in particular preserves identifier naming patterns; we further present evidence suggesting that identifiers are a very important element of training data for software engineering tasks.
Ranked #5 on Type prediction on ManyTypes4TypeScript
1 code implementation • ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering 2021 • Kevin Jesse, Premkumar T. Devanbu, Toufique Ahmed
ML approaches use different inductive biases, ranging from simple token sequences to complex graph neural network (GNN) models capturing syntax and semantic relations.
no code implementations • 29 Apr 2021 • Toufique Ahmed, Noah Rose Ledesma, Premkumar Devanbu
Beginning programmers struggle with the complex grammar of modern programming languages like Java, and make many syntax errors.
1 code implementation • 7 Dec 2019 • Md. Khairul Islam, Toufique Ahmed, Rifat Shahriyar, Anindya Iqbal, Gias Uddin
In our empirical study of 146,612 code changes from three software projects, we find that (1) the new features introduced in PredCR, such as reviewer dimensions, are the most informative.