1 code implementation • 23 Sep 2021 • Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, Antoine Doucet
After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts.
no code implementations • LREC 2020 • Maud Ehrmann, Matteo Romanello, Simon Clematide, Phillip Benjamin Ströbel, Raphaël Barman
If this represents a huge step forward in terms of preservation and accessibility, the next fundamental challenge (and the real promise of digitization) is to exploit the contents of these digital assets, and therefore to adapt and develop appropriate language technologies to search and retrieve information from this 'Big Data of the Past'.
3 code implementations • 14 Feb 2020 • Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.
no code implementations • LREC 2016 • Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger, Jaakko Väyrynen
This paper reports on an approach and experiments to automatically build a cross-lingual multi-word entity resource.
no code implementations • LREC 2016 • Maud Ehrmann, Damien Nouvel, Sophie Rosset
Recognition of real-world entities is crucial for most NLP applications.
no code implementations • LREC 2014 • Júlia Pajzs, Ralf Steinberger, Maud Ehrmann, Mohamed Ebrahim, Leonida della Rocca, Stefano Bucci, Eszter Simon, Tamás Váradi
In this paper, we describe the effort of adding Hungarian text mining tools to EMM for news gathering; document categorisation; named entity recognition and classification for persons, organisations and locations; name lemmatisation; quotation recognition; and cross-lingual linking of related news clusters.
no code implementations • LREC 2014 • Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger
Multi-word entities, such as organisation names, are frequently written in many different ways.
no code implementations • LREC 2014 • Maud Ehrmann, Francesco Cecconi, Daniele Vannella, John Philip McCrae, Philipp Cimiano, Roberto Navigli
Recent years have witnessed a surge in the amount of semantic information published on the Web.
no code implementations • RANLP 2013 • Maud Ehrmann, Leonida della Rocca, Ralf Steinberger, Hristo Tanev
We present work on recognising acronyms of the form Long-Form (Short-Form), such as "International Monetary Fund (IMF)", in millions of news articles in twenty-two languages, as part of our more general effort to recognise entities and their variants in news text and to use them for the automatic analysis of the news, including the linking of related news across languages.