Search Results for author: Mélodie Boillet

Found 12 papers, 0 papers with code

Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library

no code implementations • 29 Apr 2024 • Solène Tarride, Yoann Schneider, Marie Generali-Lince, Mélodie Boillet, Bastien Abadie, Christopher Kermorvant

PyLaia is one of the most popular open-source software libraries for Automatic Text Recognition (ATR), delivering strong performance in terms of speed and accuracy.

Language Modelling

The Socface Project: Large-Scale Collection, Processing, and Analysis of a Century of French Censuses

no code implementations • 29 Apr 2024 • Mélodie Boillet, Solène Tarride, Yoann Schneider, Bastien Abadie, Lionel Kesztenbaum, Christopher Kermorvant

For this project, we developed a complete processing workflow: large-scale data collection from French departmental archives, collaborative annotation of documents, training of handwritten table text and structure recognition models, and mass processing of millions of images.

Table Recognition

Handwritten Text Recognition from Crowdsourced Annotations

no code implementations • International Workshop on Historical Document Imaging and Processing 2023 • Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère, Christopher Kermorvant

However, selecting training samples based on the degree of agreement between annotators introduces a bias in the training data and does not improve the results.

Ranked #1 on Handwritten Text Recognition on Belfort

Handwritten Text Recognition

Large Scale Genealogical Information Extraction From Handwritten Quebec Parish Records

no code implementations • 27 Apr 2023 • Solène Tarride, Martin Maarand, Mélodie Boillet, James McGrath, Eugénie Capel, Hélène Vézina, Christopher Kermorvant

Verification of the birth and death acts from this sample shows that 74% of them are considered complete and valid.

Handwritten Text Recognition, Line Detection (+3 more)

Key-value information extraction from full handwritten pages

no code implementations • 26 Apr 2023 • Solène Tarride, Mélodie Boillet, Christopher Kermorvant

We propose a Transformer-based approach for information extraction from digitized handwritten documents.

Handwriting Recognition, Named Entity Recognition (+1 more)

SIMARA: a database for key-value information extraction from full pages

no code implementations • 26 Apr 2023 • Solène Tarride, Mélodie Boillet, Jean-François Moufflet, Christopher Kermorvant

We propose a new database for information extraction from historical handwritten documents.

Ranked #1 on Key Information Extraction on SIMARA

Handwriting Recognition, Handwritten Text Recognition (+2 more)

Détection d'Objets dans les documents numérisés par réseaux de neurones profonds (Object Detection in Digitized Documents Using Deep Neural Networks)

no code implementations • 27 Jan 2023 • Mélodie Boillet

For this purpose, we propose confidence estimators based on different approaches to object detection.

Document Layout Analysis, Line Detection (+3 more)

Confidence Estimation for Object Detection in Document Images

no code implementations • 29 Aug 2022 • Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

In the active learning framework, the first three estimators show a significant improvement in performance for the detection of physical document pages and text lines compared to a random selection of images.

Active Learning, Descriptive (+3 more)

Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods

no code implementations • 23 Mar 2022 • Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

We present a study conducted using three state-of-the-art systems (Doc-UFCN, dhSegment and ARU-Net) and show that it is possible to build generic models, trained on a wide variety of historical document datasets, that can correctly segment diverse unseen pages.

Document Understanding, Line Detection (+1 more)

Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers

no code implementations • 17 Sep 2021 • Mélodie Boillet, Martin Maarand, Thierry Paquet, Christopher Kermorvant

However, segmenting complex documents into semantic regions is sometimes impossible when relying only on visual features, and recent models therefore embed both visual and textual information.

Position

Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

no code implementations • 28 Dec 2020 • Mélodie Boillet, Christopher Kermorvant, Thierry Paquet

In this paper, we introduce a fully convolutional network for the document layout analysis task.

Document Layout Analysis, Line Detection

HORAE: an annotated dataset of books of hours

no code implementations • 1 Dec 2020 • Mélodie Boillet, Marie-Laurence Bonhomme, Dominique Stutzmann, Christopher Kermorvant

In this paper, we introduce a new dataset of annotated pages from books of hours, a type of handwritten prayer book owned and used by wealthy lay people in the late Middle Ages.

Line Detection