no code implementations • 22 Feb 2024 • Jiliang Li, Yifan Zhang, Zachary Karas, Collin McMillan, Kevin Leach, Yu Huang
Furthermore, alignment between model and human foci in this setting does not seem to dictate the quality of the LLM-generated summaries.
1 code implementation • 21 Feb 2024 • Yifan Zhang, Jiliang Li, Zachary Karas, Aakash Bansal, Toby Jia-Jun Li, Collin McMillan, Kevin Leach, Yu Huang
Neural code summarization leverages deep learning models to automatically generate brief natural language summaries of code snippets.
1 code implementation • 5 Sep 2023 • Aakash Bansal, Chia-Yi Su, Collin McMillan
Source code summarization is the task of writing natural language descriptions of source code.
1 code implementation • 28 Aug 2023 • Chia-Yi Su, Collin McMillan
A code summary is a brief natural language description of source code.
1 code implementation • 14 Aug 2023 • Chia-Yi Su, Collin McMillan
We also propose to combine our loss with traditional CCE for each word, which streamlines the training process compared to baselines.
1 code implementation • 21 Jul 2023 • Aakash Bansal, Siyuan Jiang, Sakib Haque, Collin McMillan
For example, by taking the entire subroutine as input to a Transformer- or RNN-based encoder.
no code implementations • 16 May 2023 • Aakash Bansal, Bonita Sharif, Collin McMillan
The attention mechanism learns to connect features in source code to specific words to use when generating natural language descriptions.
1 code implementation • 15 May 2023 • Chia-Yi Su, Aakash Bansal, Vijayanta Jain, Sepideh Ghanavati, Collin McMillan
In contrast to many existing language models, we prioritize features for researchers, including an open and easily searchable training set, a held-out test set with different levels of deduplication from the training set, infrastructure for deduplicating new examples, and an implementation platform suitable for execution on equipment accessible on a relatively modest budget.
1 code implementation • 24 Jan 2022 • Zachary Eberhart, Collin McMillan
In source code search, a common information-seeking strategy involves providing a short initial query with a broad meaning, and then iteratively refining the query using terms gleaned from the results of subsequent searches.
1 code implementation • 22 Mar 2021 • Aakash Bansal, Sakib Haque, Collin McMillan
Source code summarization of a subroutine is the task of writing a short, natural language description of that subroutine.
no code implementations • 11 Jan 2021 • Aakash Bansal, Zachary Eberhart, Lingfei Wu, Collin McMillan
In this paper, we take initial steps toward bringing state-of-the-art neural QA technologies to Software Engineering applications by designing a context-based QA system for basic questions about subroutines.
1 code implementation • 10 Apr 2020 • Sakib Haque, Alexander LeClair, Lingfei Wu, Collin McMillan
In this paper, we present an approach that models the file context of subroutines (i.e., other subroutines in the same file) and uses an attention mechanism to find words and concepts to use in summaries.
Software Engineering
2 code implementations • 6 Apr 2020 • Alexander LeClair, Sakib Haque, Lingfei Wu, Collin McMillan
The first approaches to use structural information flattened the AST into a sequence.
7 code implementations • NAACL 2019 • Alexander LeClair, Collin McMillan
The main use for these descriptions is in software documentation, e.g., the one-sentence Java method descriptions in JavaDocs.
2 code implementations • 5 Feb 2019 • Alexander LeClair, Siyuan Jiang, Collin McMillan
In this paper, we present a neural model that combines words from code with code structure from an AST.
Software Engineering
no code implementations • 13 Jun 2018 • Andrew Wood, Paige Rodeghero, Ameer Armaly, Collin McMillan
This paper targets the problem of speech act detection in conversations about bug repair.
1 code implementation • 5 Jun 2018 • Alexander LeClair, Zachary Eberhart, Collin McMillan
Software Categorization is the task of organizing software into groups that broadly describe the behavior of the software, such as "editors" or "science."
1 code implementation • 30 Aug 2017 • Siyuan Jiang, Ameer Armaly, Collin McMillan
We trained an NMT algorithm using a corpus of diffs and human-written commit messages from the top 1k GitHub projects.