1 code implementation • EMNLP (Eval4NLP) 2020 • Jesper Brink Andersen, Mikkel Bak Bertelsen, Mikkel Hørby Schou, Manuel R. Ciosici, Ira Assent
The data set is expanded to contain semantic and syntactic tests and is multilingual (English, German, and Italian).
no code implementations • 30 Oct 2023 • Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel
In work contemporaneous with ours, Lin et al. (2023) demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large) complemented by OpenAI's massive LLMs to achieve outstanding results in ScienceWorld.
no code implementations • 31 Aug 2022 • Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows.
no code implementations • 25 Aug 2022 • Manuel R. Ciosici, Leon Derczynski
Training large neural language models on large datasets is resource- and time-intensive.
no code implementations • 4 Oct 2021 • Manuel R. Ciosici, Joe Cecil, Alex Hedges, Dong-Ho Lee, Marjorie Freedman, Ralph Weischedel
Our goal is to deliver a new task and leaderboard to stimulate research on question answering and pre-trained language models (PTLMs) to understand a significant instructional document, e.g., an introductory college textbook or a manual.
1 code implementation • EACL 2021 • Mads Toftrup, Søren Asger Sørensen, Manuel R. Ciosici, Ira Assent
Language Identification is the task of identifying a document's language.
Ranked #1 on Language Identification on OpenSubtitles
1 code implementation • NAACL 2021 • Manuel R. Ciosici, Joseph Cummings, Mitchell DeHaven, Alex Hedges, Yash Kankanampati, Dong-Ho Lee, Ralph Weischedel, Marjorie Freedman
We describe Machine-Aided Script Curator (MASC), a system for human-machine collaborative script authoring.
no code implementations • 7 May 2020 • Leon Strømberg-Derczynski, Manuel R. Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, Daniel Varab
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers.
1 code implementation • LREC 2020 • Manuel R. Ciosici, Ira Assent, Leon Derczynski
We present efficient implementations of Brown clustering and the alternative Exchange clustering as well as a number of methods to accelerate the computation of both hierarchical and flat clusters.
no code implementations • WS 2019 • William Baumgartner, Michael Bada, Sampo Pyysalo, Manuel R. Ciosici, Negacy Hailu, Harrison Pielke-Lombardo, Michael Regan, Lawrence Hunter
As part of the BioNLP Open Shared Tasks 2019, the CRAFT Shared Tasks 2019 provides a platform to gauge the state of the art for three fundamental language processing tasks (dependency parse construction, coreference resolution, and ontology concept identification) over full-text biomedical articles.
no code implementations • NAACL 2019 • Manuel R. Ciosici, Leon Derczynski, Ira Assent
We show that increases in Average Mutual Information, the clustering algorithms' optimization goal, are highly correlated with improvements in encoding of morphosyntactic information.
no code implementations • NAACL 2019 • Manuel R. Ciosici, Ira Assent
We present Abbreviation Explorer, a system that supports interactive exploration of abbreviations that are challenging for Unsupervised Abbreviation Disambiguation (UAD).
no code implementations • COLING 2018 • Manuel R. Ciosici, Ira Assent
Abbreviations and acronyms are a part of textual communication in most domains.
1 code implementation • 3 Aug 2016 • Manuel R. Ciosici
Because of its ability to produce high-quality, human-understandable clusters, Brown clustering has seen high uptake in the NLP research community, where it is used in the preprocessing and feature generation steps.