
CREMMA HTR facility at Inria research centre

Location: École normale supérieure de Lyon

Date: 11th to 13th September 2024

Contact: Thibault Clérice (thibault.clerice@inria.fr)

Visit the TranscribeoQuest event webpage for more information.

Inria is the French national research institute for digital sciences. It aims to deliver world-class research and technological innovation, and to develop and support scientific and entrepreneurial projects. Inria focuses on several scientific themes, including core digital technology, digital security, artificial intelligence and quantum computing. For several years, Inria has played a leading role in open science in France, with a long-standing deposit mandate associated with the French national repository HAL, where 86% of Inria’s scholarly production is currently recorded. Inria is also one of the founding institutions of the international Software Heritage initiative, which harvests, documents and preserves all existing open-source software code.

The ALMAnaCH research group at the Inria research centre in Paris carries out research in Artificial Intelligence, bringing together Natural Language Processing and Computational Humanities. It hosts the CREMMA hardware and software facility, an instance of the eScriptorium/kraken software deployed on a powerful CPU/GPU platform with large-scale disk storage and a high-speed internet connection.

Drawing on many years of research experience, anchored in particular in close collaborations with French cultural heritage institutions such as the Archives Nationales and the Bibliothèque Nationale de France, as well as on its contribution to setting up the DARIAH research infrastructure, ALMAnaCH provides a unique scientific environment that brings together experts in large language models and in the digital modelling of human heritage objects.

Services currently offered by the infrastructure

The facility offers both the technical infrastructure and the expertise that scholars need to work on their own projects: creating training data, training segmentation and text recognition models, and openly disseminating both their ground truth and their recognition models. Scholars who come to ALMAnaCH will be supported in carrying out their research project on the CREMMA facility, and will also be put in touch with digital humanities specialists who will guide them on related issues such as data management, data modelling, licensing and online publishing. A typical project is built around an end-to-end workflow, from the management of primary documents to the publication of finalised digital editions resulting from an OCR/HTR phase. More precisely, the following topics will be covered with scholars visiting our facilities:

  • describing and organising primary document corpora using IIIF compatible servers;
  • organising and annotating training material in eScriptorium;
  • carrying out training campaigns and assessing them;
  • understanding the possible formats for maintaining ground truth and automatically extracted content (ALTO XML, PAGE XML, TEI);
  • documenting and signalling ground truth with HTR United;
  • publishing content with open-source platforms such as TEI Publisher.

What this TNA offers

CREMMA is offering a summer school structured as a transcribathon, giving participants hands-on experience in transcription. This event, provided by Biblissima+ with additional support from the ATRIUM project, offers participants the chance to engage with historical manuscripts while honing their transcription skills under the guidance of expert trainers. The scheme will provide five fully funded places for participants interested in developing their transcription skills.

More information about the event can be found on the TranscribeoQuest event webpage.
