Search Results for author: Maliheh Izadi

Found 18 papers, 13 papers with code

A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

no code implementations • 23 May 2024 • Aral de Moor, Arie van Deursen, Maliheh Izadi

Transformer-based language models are highly effective for code completion, with much research dedicated to enhancing the content of these completions.

Code Completion

Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study

no code implementations • 22 Mar 2024 • Tim van Dam, Frank van der Heijden, Philippe de Bekker, Berend Nieuwschepen, Marc Otten, Maliheh Izadi

However, research on code completion models typically focuses on imperative languages such as Python and JavaScript, which results in a lack of representation for functional programming languages.

Code Completion, Language Modelling

An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets

1 code implementation • 22 Mar 2024 • Jonathan Katzy, Răzvan-Mihai Popescu, Arie van Deursen, Maliheh Izadi

Based on the findings of our study, which highlights the pervasive issue of license inconsistencies in large language models trained on code, our recommendation for both researchers and the community is to prioritize the development and adoption of best practices for dataset creation and management.

Language Modelling, Large Language Model

Language Models for Code Completion: A Practical Evaluation

1 code implementation • 25 Feb 2024 • Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, Arie van Deursen

InCoder outperformed the other models across all programming languages, highlighting the significance of training data and objectives.

Code Completion, valid

Traces of Memorisation in Large Language Models for Code

1 code implementation • 18 Dec 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

We find that large language models for code are vulnerable to data extraction attacks, like their natural language counterparts.

Code Completion

On the Impact of Language Selection for Training and Evaluating Programming Language Models

no code implementations • 25 Aug 2023 • Jonathan Katzy, Maliheh Izadi, Arie van Deursen

The recent advancements in Transformer-based Language Models have demonstrated significant potential in enhancing the multilingual capabilities of these models.

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

1 code implementation • 24 Apr 2023 • Tim van Dam, Maliheh Izadi, Arie van Deursen

For comments, we find that the models perform better in the presence of multi-line comments (again with small effect sizes).

Code Completion

The (ab)use of Open Source Code to Train Large Language Models

1 code implementation • 27 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi

In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering.

Memorization

STACC: Code Comment Classification using SentenceTransformers

1 code implementation • 25 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

Code comments are a key resource for information about software artefacts.

Classification

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge

no code implementations • 13 Feb 2023 • Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge.

Inference Attack, Language Modelling, +2

Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries

1 code implementation • 4 Jan 2023 • Ali Al-Kaswan, Toufique Ahmed, Maliheh Izadi, Anand Ashok Sawant, Premkumar Devanbu, Arie van Deursen

While the automated summarisation of decompiled code can help Reverse Engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task.

An Empirical Study on Data Leakage and Generalizability of Link Prediction Models for Issues and Commits

no code implementations • 1 Nov 2022 • Maliheh Izadi, Pooya Rostami Mazrae, Tom Mens, Arie van Deursen

However, these approaches primarily focused on improving prediction accuracy on randomly-split datasets, with limited attention given to the impact of data leakage and the generalizability of the predictive models.

Link Prediction, Transfer Learning

Semantically-enhanced Topic Recommendation System for Software Projects

1 code implementation • 31 May 2022 • Maliheh Izadi, Mahtab Nejati, Abbas Heydarnoori

Then, (2) we build two recommender systems; the first one operates only based on the list of original topics assigned to a repository and the relationships specified in our knowledge graph.

Recommendation Systems

On the Evaluation of NLP-based Models for Software Engineering

1 code implementation • 31 Mar 2022 • Maliheh Izadi, Matin Nili Ahmadabadi

NLP-based models have been increasingly incorporated to address SE problems.

CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers

1 code implementation • 31 Mar 2022 • Maliheh Izadi

Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset.

Management

CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences

1 code implementation • 14 Feb 2022 • Maliheh Izadi, Roberta Gismondi, Georgios Gousios

Both approaches have significant drawbacks: grammar-based autocompletion is restricted in dynamically-typed language environments, whereas NLP-based autocompleters struggle to understand the semantics of the programming language and the developer's code context.

Code Completion, Language Modelling, +1

Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data

1 code implementation • 5 Jul 2021 • Pooya Rostami Mazrae, Maliheh Izadi, Abbas Heydarnoori

The low performance gets even more severe when there is a lack of textual information in either commits or issues.

Improving Quality of a Post's Set of Answers in Stack Overflow

1 code implementation • 30 May 2020 • Mohammadrezar Tavakoli, Maliheh Izadi, Abbas Heydarnoori

Then, we developed an Eclipse plugin named SOPI and integrated the prediction model in the plugin to link these deficient posts to related developers and help them improve the answer set.

Community Question Answering
