Workflows

Workflows are often overlooked in the arts and humanities, yet they are essential to research. A workflow describes a specific research scenario as a clear, step-by-step guide that helps researchers integrate different tools and methods to achieve their goals.

ATRIUM develops and shares workflows that help researchers integrate digital methods into their work. These guides outline how to navigate the research data lifecycle from start to finish.

ATRIUM workflows are hosted and built using templates from the SSH Open Marketplace. Browse the ATRIUM Workflow catalogue below:

17 workflows

  • 3D Digital Representations of Heritage Assets

    This 3D workflow aims to provide a comprehensive, step-by-step methodology for the digitisation of built heritage, guiding projects from initial planning to the final publication of high-quality, interoperable 3D data models. Its primary goal is to ensure accurate, consistent, and context-rich digital documentation of monuments, buildings, and historical sites for purposes such as conservation, research, education, virtual exhibitions, and heritage management.

    In current practice, many digital documentation initiatives suffer from fragmentation: researchers often struggle to combine technical precision with the historical and architectural depth that built heritage demands. As a result, 3D models are often disconnected from their interpretative context. This workflow guides practitioners through a structured sequence that integrates archival research, field assessment, advanced survey techniques, and post-processing into a coherent and reusable approach.

    The workflow encompasses nine structured stages, from defining objectives and metadata to on-site inspection, digital survey, data processing, 3D modelling, validation, and final dissemination. Each phase is designed to build upon the previous one, ensuring a seamless transition from raw data to usable, shareable, and long-term preservable digital assets.

    The application of this workflow is particularly suited to multidisciplinary heritage projects where diverse stakeholders (architects, conservators, historians, and digital technicians) collaborate. It supports the production of interoperable models that comply with open standards and can be integrated into systems like GIS, BIM, or online repositories, enhancing accessibility, transparency, and reuse.

    More info
  • 3D-HBIM workflow

    The 3D-Heritage Building Information Modelling (HBIM) workflow provides a comprehensive approach for the 3D geometric and parametric modelling of built heritage, and subsequent enrichment with data related to asset contexts, environments, materials, and history. It contributes to structuring and enhancing the value of information for integrated, multidisciplinary, and cross-sectoral collaboration on built heritage preservation. It recommends the use of accessible open science practices, standards, services, and tools for streamlining the process of built heritage digitisation, modelling, and sharing.

    Requirements

    • Open science engagement (e.g., use of open standards and metadata schemas for improving data and models interoperability)
    • Accessible, open‑source services and tools

    Implementation

    Use of open standards and testing of free tools and services for the advanced digitisation of built heritage, integrating reality capture, scan-to-BIM, and multidisciplinary structured information aggregation. It includes data FAIRness (Findable, Accessible, Interoperable, Reusable) enhancement, geometric modelling and enrichment, and consolidated digital representation of built heritage.

    More info
  • Acquiring Images for Automatic Text Recognition

    For many humanities scholars, images of text sources form the basis of their academic work, whether photographed in an archive or discovered in the digital collections of memory institutions such as libraries or museums. Finding, creating and collecting images of textual material is often the first step in the research process.

    More info
  • Automatic Image Annotation Workflow

    Steps to fine-tune a pre-trained image recognition model for domain-specific applications

    This workflow outlines a process for fine-tuning a pre-trained image recognition model to enhance its ability to recognize specific object categories that are underrepresented or entirely absent in its original training dataset. The primary goal is to create a lightweight machine learning (ML) model capable of annotating images using terms from domain-specific controlled vocabularies. This facilitates more accurate and consistent image annotation in specialized contexts.

    The workflow serves the dual purpose of improving the model’s performance on domain-specific data and streamlining the image annotation process. By iteratively combining manual annotation, automated annotation using the fine-tuned model, and model re-training, the workflow supports efficient creation of high-quality annotations, even for large and complex datasets.

    Key methods implemented include:

    • Data preparation and harmonization, division into training and testing datasets.
    • Fine-tuning: Re-training a pre-trained ML model on the manually annotated dataset.
    • Automated annotation: Using the fine-tuned model to annotate unlabeled images.
    • Verification and iteration: Refining the model by verifying and correcting its outputs, followed by additional training rounds.

    The workflow follows an iterative cycle: starting with manual annotations, training the model, applying it to unlabeled data, validating its outputs, and re-training to achieve incremental improvements. This approach is applicable to any general-purpose image recognition model and enables domain-specific adaptation of image recognition models, making them effective tools for tasks like object classification and metadata generation in specialized fields.
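
    To make the fine-tuning step concrete, here is a minimal sketch in PyTorch/torchvision. The workflow does not prescribe a specific framework, and the backbone choice and label count below are illustrative assumptions only:

    ```python
    import torch
    from torch import nn
    from torchvision import models

    # Load a pre-trained backbone and replace its classification head with
    # one sized for the domain-specific vocabulary (12 labels is an
    # arbitrary illustrative number).
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 12)

    # Freeze the backbone at first, so only the new head is trained.
    for name, param in model.named_parameters():
        if not name.startswith("fc."):
            param.requires_grad = False

    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-3
    )
    loss_fn = nn.CrossEntropyLoss()

    def train_epoch(loader):
        """One pass over the manually annotated training set."""
        model.train()
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    ```

    After each round of automated annotation and manual verification, the corrected labels are added to the training set and another training pass is run, which is the iterative cycle described above.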

    Requirements

    • The resulting model is lightweight and versatile, i.e. it can be used in various places, applications, and workflows.
    • The annotated dataset can be used to further fine-tune the model, as the amount of annotated images grows. The ground-truth dataset is in a widely used format and structured according to good practice in the field.
    • Image metadata is enriched with terms from domain-specific controlled vocabularies enhancing their general FAIRness.
      • It is possible to find images according to their contents described using controlled vocabulary terms.
      • Interoperability is increased by using shared controlled vocabularies and formal mappings between various vocabularies.
      • Re-usability of the data is enhanced because the contents of images are formally annotated.

    Use-case

    In the case of the ATRIUM project sub-task Automatic Image Recognition, the goal is to annotate images from archaeological archives with terms from domain-specific controlled vocabularies, e.g. the types of artefacts or objects present in the photographs. This greatly improves the usability of archaeological image archives: image metadata is enriched with specific terms from controlled vocabularies, images with specific contents become easier to find, and the metadata description of photographs is simplified and automated.

    Two types of images are used in the process:

    1. photographs of (mostly) single finds (artefacts) photographed on (often) standardized backgrounds with a scale,
    2. archival (legacy) photographs with various contents, mostly photographs of fieldwork and archaeological objects (trenches, burials etc.)

    The intended outcome is two-fold: first, to process vast amounts of archival archaeological images and thus improve their metadata descriptions in the Archaeological Map of the Czech Republic (AMCR) repository and discovery service (part of the Archaeological Information System of the Czech Republic, AIS CR) and in the ARIADNE Knowledge Base and Portal; and second, to automate the metadata description of photographs submitted to the AMCR repository by metal detectorists through the AMCR-PAS portal.

    More info
  • Automatic Text Recognition Roadmap

    Automatic Text Recognition (ATR) uses Artificial Intelligence (AI), in particular machine learning (ML), to extract text from a scanned image. It encompasses two main techniques: Optical Character Recognition (OCR), extracting text from printed documents, and Handwritten Text Recognition (HTR), extracting text from manuscripts.
    This workflow presents the main steps of an ATR workflow and how to integrate it into your research project.

    More info
  • Automatic Text Recognition using Object Detection with eScriptorium

    Please note that this workflow is bound to change once the new version of eScriptorium is released.

    This workflow performs Automatic Text Recognition (ATR) with an object detection approach. It uses a YOLO object detection model for segmenting regions and the kraken library for line segmentation, and performs text recognition on the eScriptorium platform. Please note that some prior knowledge of these tools is expected, such as using a command-line interface or having a free Roboflow account.
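
    As a rough illustration of the region-detection step, the following sketch uses the ultralytics YOLO API; the weights file is a placeholder for a model trained on your own region annotations (e.g. exported from Roboflow):

    ```python
    from ultralytics import YOLO

    # Detect page regions with a YOLO model; "page_regions.pt" is a
    # placeholder for custom-trained weights.
    region_model = YOLO("page_regions.pt")
    results = region_model("page_001.jpg")

    for box in results[0].boxes:
        label = results[0].names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{label}: ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f})")

    # Line segmentation can then be run with kraken before importing the
    # results into eScriptorium, e.g.:
    #   kraken -i page_001.jpg lines.json segment -bl
    ```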

    More info
  • Collaborative Reusable Map Annotation

    Historical maps are valuable resources for humanities research, offering insights into past landscapes and patterns of human activity. Maps are more than images; they represent spatial and cultural relationships that may be anchored to the physical world, and often embed narratives of power, identity, and change, such as shifts in place names or symbolic representations. They vary widely in form, type, content, level of detail, and granularity of the information they present. At the same time they are material objects, cultural artifacts, with distinct physical characteristics and complex biographies of their own.

    Historical maps also come with challenges: they often lack modern geographic precision, may contain outdated or biased perspectives (which are nonetheless of historical significance), or show signs of physical wear, among other issues. Moreover, access to the information they contain is often limited, as searching through map collection catalogues typically relies on record metadata (i.e., basic information, such as title, creator, or date) and not the full content of the map.

    Map annotation provides a means to address these issues by enabling scholars to encode features, comment on the content, link place names to gazetteers or other authority files, transcribe text on maps, document historical or cartographic context, etc. Apart from supporting scholarly analysis, this process can transform static maps into searchable and machine-readable resources, by enriching the map metadata and allowing text search indices (Rainer et al. 2019).

    Developed within the ATRIUM project, this workflow supports collaborative annotation and geo-tagging of historical maps, transforming them into rich multilayer resources. It addresses both technical interoperability and data reusability, with outcomes that can enhance catalog records, improve search capabilities, feed into analytical pipelines, support data visualizations, or even enable the creation of critical editions of maps.

    While primarily tested in Recogito Studio, the workflow is designed as tool-agnostic, ensuring flexibility and reusability across different tools, projects and teams. Its steps are not strictly linear; annotation is an iterative process and teams may revisit earlier phases (e.g., revising vocabularies during annotation) or run tasks in parallel or in alternation (e.g., linking to vocabularies and gazetteers). This iterative design allows teams to adapt to evolving insights and technical constraints.

    Note: This workflow is intended for manual annotation.
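
    As a concrete illustration of what such an annotation looks like, here is a minimal W3C Web Annotation (the model used by Recogito Studio and similar tools), written out as a Python dict; the transcribed name, image URL, coordinates, and gazetteer URI are all placeholders:

    ```python
    import json

    # A minimal Web Annotation transcribing a place name on a map image
    # and linking it to a gazetteer entry. All URLs, IDs, and coordinates
    # below are placeholders, not real records.
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": [
            {"type": "TextualBody", "purpose": "transcribing", "value": "Vindobona"},
            {
                "type": "SpecificResource",
                "purpose": "identifying",
                "source": "https://pleiades.stoa.org/places/123456",  # placeholder ID
            },
        ],
        "target": {
            "source": "https://example.org/iiif/map-42/full/max/0/default.jpg",
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": "xywh=1200,800,150,60",  # rectangle on the map image
            },
        },
    }
    print(json.dumps(annotation, indent=2))
    ```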

    References:

    More info
  • IIIF Enablement and Visualization Workflow Using Altamira Viewer and Cantaloupe for Image Repositories

    Introduction

    The purpose of this workflow is to provide a way to create and serve files in IIIF-compatible (International Image Interoperability Framework) format, which:

    • Increase interoperability by standardizing access, as IIIF is open source and widely implemented

    • Improve delivery time by sending the end-user only the required parts of the image

    • Increase scalability, as IIIF is designed from the ground up to handle large volumes of data

    The main goal of this workflow is to streamline the creation of IIIF-enabled resources. The entire procedure should be as simple as possible. Moreover, saved data should be accessible at any time by any user with a valid resource link.
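
    For illustration, a IIIF Image API request follows a fixed {region}/{size}/{rotation}/{quality}.{format} URI pattern, which is what allows a viewer to fetch only the needed portion of an image; the hostname and identifier below are placeholders for a Cantaloupe endpoint:

    ```python
    # IIIF Image API URI pattern (hostname, prefix, and identifier are
    # placeholders; Cantaloupe serves Image API 3 under /iiif/3 by default).
    BASE = "https://iiif.example.org/iiif/3"
    identifier = "map-042.tif"

    full_image = f"{BASE}/{identifier}/full/max/0/default.jpg"           # whole image
    detail = f"{BASE}/{identifier}/1024,512,600,400/512,/0/default.jpg"  # crop, 512px wide
    info_json = f"{BASE}/{identifier}/info.json"                         # image metadata

    print(full_image, detail, info_json, sep="\n")
    ```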

    Workflow Goals

    • Enable users to save images (in a given subset of formats) that will be properly converted and saved to the IIIF server
    • Enable users to access saved images at will via an IIIF viewer

    Workflow implementation

    Full documentation available here

    More info
  • LLM-Powered Mapping of Keywords of a Research Article to Linked Data

    Suppose you have a research article written in a language other than English, together with a set of keywords for that article. How could you grasp the main topics of the article from the keywords? First, you would want to translate them, but this is not a trivial task, since words can be ambiguous and finding the exact sense requires an understanding of the whole article! Moreover, suppose you want to link your article to other resources, for example other articles that share the same topic. Now suppose you want to make your resource accessible so that people can find its keywords in different languages. This workflow provides a solution to all these scenarios, since it gives you a way to map keywords to linked data (Wikidata and DBPedia entities).
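
    A first step of such a mapping can be sketched with the public Wikidata search API; the keyword and language below are arbitrary examples, and the LLM-based disambiguation among the returned candidates is not shown:

    ```python
    import requests

    # Look up candidate Wikidata entities for a non-English keyword via
    # the wbsearchentities endpoint.
    def wikidata_candidates(keyword: str, lang: str = "de", limit: int = 5):
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbsearchentities",
                "search": keyword,
                "language": lang,   # language of the keyword
                "uselang": "en",    # language of returned labels
                "format": "json",
                "limit": limit,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return [
            (hit["id"], hit.get("label"), hit.get("description"))
            for hit in resp.json().get("search", [])
        ]

    # e.g. the German keyword "Siedlung" ("settlement")
    print(wikidata_candidates("Siedlung"))
    ```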

    More info
  • Making Data Reusable in the Context of Automatic Text Recognition

    The most common reason behind an export is to create backups of the data, because digital tools and servers are not 100% reliable. An export can be made even with unfinished transcriptions, while your project is still a work in progress.

    Where changing software is possible, you will want to move your data from the current tool to the one you are migrating to. Similarly, by exporting the transcription you can feed the data to another tool, which is one of the reuse options. Exporting can also be done to publish finished transcriptions, whether the whole corpus or simply a sample. Finally, if you want to transform your transcribed corpus, an export will be necessary.

    More info
  • Model Training for Automatic Text Recognition

    Automatic Text Recognition (ATR) describes the convergent usage of Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR).

    OCR is aimed at treating printed media, such as books. This technology has steadily improved over time, and systems nowadays can reach 99% accuracy in recognising text from printed documents. HTR, as the name suggests, is used for handwritten documents, improving their accessibility and allowing for efficient data extraction. Human handwriting poses a challenge even for modern technology, as manuscripts encompass a near-endless range of hands and styles. It is out of reach for the traditional pattern-matching approaches that are the crux of OCR.

    Evolving technologies now allow text recognition systems to do both HTR and OCR simultaneously. Modern models can recognise printed as well as handwritten text, and process segments in lines or entire paragraphs. These technologies are especially appealing to historians, librarians, archival staff, and scholars in general who want to convert, analyse, store, or maintain their documents in a novel way, adopting global digital trends.

    More info
  • Named entity recognition from transcribed spoken data

    This workflow is designed to systematically extract structured information from unstructured text. Its primary function is to process transcribed free-text descriptions of archaeological findings and identify relevant named entities. The workflow receives transcribed text from archaeological context sheets as its input and extracts entities related to the archaeology domain. It returns structured metadata that can be used for indexing an archaeological record. The principal objective of this workflow is to enrich archaeological records by converting unstructured descriptive text into structured, machine-readable metadata.
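
    The extraction step can be pictured with a generic spaCy sketch. The workflow would use a model adapted to archaeological vocabulary; the stock English pipeline and the sample sentence below are only stand-ins:

    ```python
    import spacy

    # Generic named entity recognition; "en_core_web_sm" is a stand-in
    # for a domain-adapted model.
    nlp = spacy.load("en_core_web_sm")

    text = ("Context 104 contained fragments of Samian ware recovered "
            "during the 1998 excavation at Fishbourne.")
    doc = nlp(text)

    # Emit entities as simple structured metadata for indexing.
    for ent in doc.ents:
        print({"text": ent.text, "label": ent.label_})
    ```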

    More info
  • Ontology driven Information Extraction from Text

    Description

    This workflow provides an automated pipeline for information extraction from textual data. It employs various machine-learning (ML) and rule-based methods for the extraction of entities, relations, and metadata information from published text. The entire process is ontology-driven, meaning that all the semantic definitions of the populated entity/relationship types are provided by an ontology specifically designed for the task at hand, which can function as the schema of a potential Knowledge Graph comprising the extracted and interrelated entities.

    Use case implementation

    The workflow is implemented for a specific use case scenario, that of extracting information regarding scholarly work from research publications. Specifically, as part of the ATRIUM Project, the implemented workflow makes it possible to employ Deep Learning (DL) methods for the automatic extraction of textual spans denoting research activities and steps thereof, as well as other expressions that describe either the intention (goal) of a specific activity or the way (method) it was carried out. These entities, along with additional information extracted from article metadata (author keywords, publication information) and semantic relationships among them, provide the building blocks for the construction of a knowledge graph that describes work processes. All the semantic definitions of the populated entity/relationship types are provided by the Scholarly Ontology (SO), a CIDOC-CRM compatible conceptual framework specifically designed for documenting scholarly work.

    Goals

    • Extract textual spans representing instances of the Scholarly Ontology (SO) classes: Activity (i.e. a scholarly process like an archaeological excavation, a social study, or steps thereof), Goal (i.e. a research objective of an activity, denoting why the latter was conducted), and Method (i.e. a procedure, plan, or technique employed by an activity, denoting how the latter was conducted).
    • Disambiguate / link the extracted entities to external reference resources (like Wikipedia) when possible.
    • Interrelate the extracted entities using properties and relationships provided by the SO, such as: employs(Activity,Method) and hasObjective(Activity,Goal).

    Technical requirements

    • Fine-tuned Deep Learning (DL) models using the spaCy NLP framework. For demonstration purposes, the implemented workflow uses pretrained Transformer models that are downloaded from the Hugging Face Library and are already fine-tuned for recognition of textual spans representing research activities along with their goals and research methods. However, other models could be used interchangeably as long as they are compatible with the spaCy pipeline (see the sketch after this list).
    • A framework for performing entity disambiguation and linking. The implemented workflow uses the Zshot framework for entity disambiguation of the extracted methods’ names, linking them with their corresponding Wikipedia entities. In addition, the ORCID API is used to link author names with their corresponding ORCID information (ID, email, affiliations, and full name) when possible, and the Wikipedia, Wikidata, and DBPedia APIs are queried to retrieve the methods’ descriptions, proper and alternate names (aliases), and corresponding URLs.
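
    A minimal loading sketch for the span-extraction step is given below. The model package name is hypothetical; any pipeline fine-tuned for the SO span types and compatible with spaCy could be substituted, as stated above:

    ```python
    import spacy

    # Hypothetical fine-tuned pipeline predicting SO span types.
    nlp = spacy.load("en_so_spans")

    doc = nlp("We surveyed the site using magnetometry to locate buried walls.")

    # Span predictions are typically stored in doc.spans under a key chosen
    # by the pipeline; "sc" is spaCy's default key for the span categorizer.
    for span in doc.spans["sc"] if "sc" in doc.spans else []:
        print(span.label_, "->", span.text)  # e.g. Method -> "magnetometry"
    ```
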
    More info
  • Performing Layout Analysis for Automatic Text Recognition

    Layout analysis is a computer vision task that allows us to identify the structure of a document and localize the lines of text. We generate a series of x and y coordinates corresponding to regions/lines on the image, which we associate with corresponding labels.

    Text line segmentation is mandatory in most ATR systems, and without good line segmentation, transcription performance will not be sufficient: lines can be missed, read in the wrong order, or even broken down into several lines.
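
    As a minimal illustration, the kraken library (the segmenter behind eScriptorium) exposes baseline layout analysis in Python; the filename is a placeholder, and the return structure differs between kraken versions:

    ```python
    from PIL import Image
    from kraken import blla

    # Detect baselines and layout regions on a page image.
    im = Image.open("page_001.jpg")
    seg = blla.segment(im)

    # Recent kraken versions return a Segmentation object; older ones a dict.
    lines = seg.lines if hasattr(seg, "lines") else seg["lines"]
    print(f"{len(lines)} text lines detected")
    ```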

    Synonyms: “segmentation”, “zoning”, “document analysis”, “optical layout analysis”.

    More info
  • Semi-automated extraction of information from textual documents in a domain-specific repository

    The workflow addresses the machine processing and enhancement of archaeological textual data (e.g., grey literature, reports, field forms) derived from image text recognition outputs or extracted from digital-born PDFs. The goal is to provide an integrated solution that facilitates the processing of legacy data and new uploads within the selected repository system. It aims to enhance searching and processing of documents while streamlining archival procedures by automating key steps in common archiving and annotation workflows.

    By applying this workflow, multiple downstream applications become possible, including:

    • automatic linking of related documents,
    • quality assessment,
    • data extraction,
    • interlinking with other documents,
    • and automated abstract generation.

    These capabilities contribute to the long-term preservation and accessibility of archaeological documentary archives. Furthermore, the workflow supports natural language processing (NLP) applications, enabling the creation of corpora for analysis using large language models (LLMs) and their training.

    The workflow emphasizes the sustainable integration of existing tools accessible via APIs as services. It leverages the outputs of these tools to meet specific user needs, particularly within data archiving and publishing workflows, ensuring adaptability and scalability in diverse use cases.

    The Archaeological Map of the Czech Republic (AMCR) repository serves as a demonstrator in the workflow.

    More info
  • Using GoTriple data for your SSH data science tasks

    This workflow explains how to use GoTriple data on Social Sciences and Humanities (SSH) for data analytics tasks.

    GoTriple is the discovery platform for the SSH: it currently indexes metadata for almost 19.5 million documents, over 25,000 projects, and about 22.5 million author profiles. The extreme diversity of the SSH is well represented in the platform: GoTriple’s documents cover 27 disciplines within the SSH, from History to Literature, from Management to Gender Studies.

    Moreover, GoTriple is inherently multilingual, indexing documents in over 20 languages and providing a user interface localised in 10 European languages. Finally, this diversity also applies to the nature of its documents, with a high representation of textual content (articles, theses, reports, books, …) but also a significant number of datasets, multimedia assets, and software artefacts, with their number expected to increase in the coming months.

    This workflow explains in practice, with code examples, how GoTriple data can be used by Digital Humanists for data analytics tasks.

    There are two ways to do so:

    • by using the data dumps created and published in Zenodo (last update May 18, 2025);
    • by using GoTriple APIs, to extract data from the online platform.

    What to use: If you just need a subset of GoTriple publications that have a link to their PDF, the data dumps are the quickest way to get started. They are also the suggested route if you don’t want to write code to get GoTriple data.

    On the other hand, use the APIs if: a) you need access to the most recent version of the data; b) you need to include publications without a full text; or c) you need access to a very specific subset of the data and want to apply custom filtering conditions.

    Workflow Code

    A Jupyter Notebook with ready-to-use examples is available and forms the basis of this workflow: gotriple_stats-with-zenodo.ipynb ( https://github.com/odoma-ch/gotriple-data-utils/blob/main/gotriple_stats-with-zenodo.ipynb ). This notebook provides hands-on examples of how to process GoTriple data dumps and integrate analytics workflows via Python or similar tooling. Familiarize yourself with its structure, required dependencies (e.g., pandas, requests), and typical processing pipelines (such as metadata loading, parsing, filtering, aggregation, and visualization). Also familiarize yourself with the GoTriple data model, which is fully described in the TRIPLE Project deliverable D2.5: https://zenodo.org/records/7359654 .
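
    As a hedged starting point (the dump filename and field names below are assumptions; check the dump itself and the D2.5 data model for the real structure), loading a dump with pandas looks like this:

    ```python
    import gzip
    import json
    import pandas as pd

    # Load a GoTriple dump assumed to be newline-delimited JSON; adjust
    # the filename and parsing to the actual Zenodo files.
    records = []
    with gzip.open("gotriple_dump.jsonl.gz", "rt", encoding="utf-8") as fh:
        for line in fh:
            records.append(json.loads(line))

    df = pd.json_normalize(records)
    print(df.columns.tolist())  # discover the available metadata fields

    # Example aggregation: documents per language (field name assumed).
    if "in_language" in df.columns:
        print(df["in_language"].explode().value_counts().head(10))
    ```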

    More info
  • Workflow for Annotating and Comparing IIIF resources using Altamira

    This workflow enables researchers and members of the public to visually annotate images from various providers that support the International Image Interoperability Framework (IIIF). It facilitates the comparison and integration of annotations across different image sources, allowing users to share their annotations for consultation, review, or collaborative editing.

    The solution is particularly valuable in cases where images of a single artifact type are dispersed across multiple providers, and where a streamlined method for sharing annotations is preferred.

    Fully compatible with all IIIF-compliant resources, the workflow integrates Altamira, a customized implementation of the Mirador viewer. Altamira has been specifically developed to support the preparation, management, preservation, sharing, and reuse of annotations in accordance with the Web Annotation Data Model.

    Altamira is available as a service at the Archaeology Data Service: link

    Implementation

    This workflow’s functionality relies on both existing IIIF resource sources and the integration of Altamira. Both are indispensable for its operation. Fortunately, these are already deployed across various platforms. If you intend to share your own IIIF resources or utilize a self-managed Altamira instance, please follow the steps detailed in IIIF Enablement and Visualization Workflow Using Altamira Viewer for Image Repositories.

    Prerequisite

    Altamira can be deployed independently by following IIIF Enablement and Visualization Workflow Using Altamira Viewer for Image Repositories, or users may utilize the version deployed as a service by the Archaeology Data Service.

    Detailed documentation

    Detailed documentation with images is available here!

    More info