
CAA 2025 – Reusable Digital Research Workflows for Archaeology
Notes contributed by Massimiliano Carloni and edited by Megan Black

© Prisma Cultura - Brayan Padilla
The field of archaeology has undergone a profound transformation in recent decades, thanks to the integration of digital technologies. These advancements have enhanced both the scope and precision of research in the Humanities and Heritage sectors, allowing scholars to apply complex methodologies to increasingly large and diverse datasets. More recently, the adoption of machine learning and artificial intelligence has not only improved the quantity of data researchers can process, but also the quality of insights derived from it. With this digital acceleration comes a new focus: workflows. These structured methods for managing and processing data are at the heart of how digital research operates today.
ATRIUM was delighted to explore the importance of workflows further at the 2025 Computer Applications and Quantitative Methods in Archaeology (CAA) conference, in Session 19, “Reusable Digital Research Workflows for Archaeology”, on 7 May 2025. The session brought together researchers working on digital workflows in archaeology, with contributions from both within and outside the project. It was jointly organised by DARIAH and ARIADNE and facilitated by Agiatis Benardou (DARIAH), Émilie Pagé-Perron (ARIADNE), Anne Baillot (DARIAH), and Julian Richards (ARIADNE).
Below, we share key insights from the nine presentations delivered during the event hosted in the main auditorium.

Megan Black (ATRIUM Project Manager) and Émilie Pagé-Perron (ADS) © Prisma Cultura - Brayan Padilla
Part 1: 8:30–10:30
Moderated by Émilie Pagé-Perron (UdY_ADS)
Paper 1: Leveraging AI for Enhanced Archaeological Data Extraction: Workflows for Textual and Image-Based Data
Petr Pajdla (Speaker), David Novák, Ronald Harasim, Dana Křivánková, Pavel Straňák, Kateryna Lutsai, Olga Lečbychová
Within ATRIUM, the Institutes of Archaeology of the Czech Academy of Sciences in Brno and Prague (ARUB and ARUP) are tackling the monumental task of processing over 1.5 million pages and 500,000 photographs from archaeological archives. Manual processing of such material is not only time-consuming but also prone to error, making it an ideal candidate for automation.
To support this, two workflows are under development: one for textual data and another for images. The image-based workflow was the focus of the presentation, which emphasised the need to define clear goals and use cases before selecting appropriate models, whether large language models or more traditional computer vision approaches. Images are first harmonised and aligned with controlled vocabularies before being annotated using platforms like CVAT. The team uses segmentation (rather than bounding boxes) to train models such as YOLO, paying close attention to data imbalances and validation splits. The iterative nature of the workflow allows continuous improvement through retraining.
Next steps include expanding the annotated dataset and developing better ground truth data, especially for images with different viewing angles.
View the full presentation on Zenodo.
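The attention to data imbalances and validation splits mentioned above can be illustrated with a small sketch. It is not the team's code: the class names, the 20% validation fraction, and the simplification of one dominant label per image are all assumptions made for illustration. The idea is to split class by class, so that rare categories still appear in the validation set.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split item indices into train/validation sets, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation example for every class with 2+ items.
        if len(indices) > 1:
            n_val = max(1, round(len(indices) * val_fraction))
        else:
            n_val = 0
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)

# Hypothetical, deliberately imbalanced labels (one per annotated image).
labels = ["pottery"] * 80 + ["trench"] * 15 + ["skeleton"] * 5
train_idx, val_idx = stratified_split(labels)
val_classes = {labels[i] for i in val_idx}
```

A purely random split of the same data could easily leave the rarest class out of validation altogether, which would hide poor performance on exactly the categories that are hardest to learn.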
Paper 2: Digital Archaeology Data Archives as a Source of Creative Inspiration
Rimvydas Lauzikas (Speaker), Indre Jovaisaite-Blazeviciene, Ingrida Kelpsiene, Andrius Suminas
How can archaeological archives adapt to broader audiences? This is the main focus of the TETRARCHs project, which explores how the different audiences of creative professionals, educators, and archaeologists engage with archaeological archives. While archaeologists prioritise factual accuracy for reconstruction, creatives are often more interested in interpretive, visual content that inspires storytelling, design, and artistic work.
Through interviews, eye-tracking experiments, and questionnaires, the project discovered that creatives are less interested in precise typological or textual data and more in evocative imagery and interpretive content. Rather than attempting to reformulate professional archives entirely, the suggestion was to create new interfaces tailored to the needs of creative users. Such interfaces could draw on the same datasets but frame them in more intuitive and visually accessible ways.
Paper 3: Advancing Reusable Digital Research Workflows for Built Heritage Conservation through Building Information Modelling
Kristis Alexandrou (Speaker), Maria João Correia, Valentina Vassallo, Georgios Artopoulos, António Santos Silva, Sílvia Pereira
ATRIUM partners the Cyprus Institute and the National Laboratory for Civil Engineering focused on advancing reusable digital research workflows for the conservation of built heritage through Building Information Modelling (BIM). This approach addresses the complex challenge of integrating diverse digital file formats and disciplinary languages across fields such as archaeology, architecture, conservation, urban studies, and the natural sciences.
The proposed 3D-HBIM workflow offers a structured method for producing detailed geometric and parametric models of heritage assets, which are then enriched with contextual, environmental, material, and historical data. Developed through case studies in Cyprus and Portugal, the workflow highlights the importance of cross-sector collaboration and open science practices. It includes best practices for planning, data exchange, and project management, and supports interoperability through adherence to international standards such as ISO 19650 and CIDOC CRM. By incorporating multidisciplinary expertise, from sensing technologies to environmental monitoring and historical analysis, the workflow facilitates conservation efforts, restoration planning, and sustainable data use. Despite its strengths, the workflow must contend with ongoing challenges of sustainability and scalability, particularly as it moves toward applications in City Information Modelling (CIM) and broader research infrastructures.
Paper 4: 3D-Based Workflows for Archaeology and Built Heritage: The Case Study of the UNESCO World Heritage Site of the Ayios Ioannis Lampadistis Monastery, Cyprus
Anastasia Tsagka (Speaker), Valentina Vassallo (Speaker), Sorin Hermon
The team at the Cyprus Institute focused on creating a reusable digital workflow for documenting the UNESCO-listed Ayios Ioannis Lampadistis Monastery. The monastery, known for its Byzantine architecture, historic frescoes, and ancient graffiti, remains an active religious site, which adds both cultural value and vulnerability due to ongoing visitor traffic.
The team proposed a collaborative digital workflow designed to be continuously updated, published online, and integrated into larger discovery portals and research infrastructures. This process began with defining project goals and determining relevant sources and software. It involved thorough documentation, data processing, and the development of an Asset Information Model (AIM), which was designed for export in a neutral format to support interoperability. The final model was made accessible through web-based platforms such as the INCEPTION Core Engine, with open-source alternatives like 3DHOP and Blender also under consideration to ensure broader accessibility. The project’s integration with infrastructures like ARIADNE highlights its potential for wider dissemination and scholarly engagement. Looking ahead, the aim is to develop a reusable, multidisciplinary methodology that can be applied to other heritage sites and contribute to initiatives like the European Collaborative Cloud for Cultural Heritage (ECCCH).
Paper 5: Creating a Knowledge Base of Research Methods from Archaeology Publications
Vayianos Pertsas (Speaker), Nikolaos Kapralos, Ioanna Ntountoudi
ATRIUM partners ATHENA and the Athens University of Economics and Business introduced a digital workflow for creating a structured knowledge base of research methods drawn from archaeological publications. Using machine learning and linked data, the team developed a knowledge graph to map how methods are applied across studies. This not only aids discovery but also allows users to query the graph with Cypher for deeper analysis of scholarly trends. By automating and structuring the extraction of methodological knowledge, the project offers a powerful tool for exploring trends and practices in archaeological research and sets the stage for a more searchable and interoperable scholarly landscape.
The workflow involved a modular digital architecture that combines fine-tuned deep learning models for entity extraction, a large language model (LLM)-based approach for disambiguation, and rule-based post-processing. The system processes sentences from article abstracts and main texts, identifying and labelling spans related to research methods. Using a pre-trained transformer model (RoBERTa base) and a transition-based parser, the pipeline extracts, normalises, and disambiguates method references by linking them to Wikipedia, Wikidata, and DBpedia. These annotated data are then converted into structured entries for a Neo4j knowledge base, following a consistent schema and indexing strategy to support efficient retrieval.
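As a rough illustration of the final step, the sketch below turns disambiguated method mentions into parameter rows for a graph upsert and pairs them with Cypher statements. The node labels, relationship type, property names, DOIs, and identifiers are all hypothetical placeholders, not the schema presented in the session, and the normalisation rule is far simpler than the pipeline described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MethodMention:
    doi: str          # publication the span was found in (placeholder values below)
    surface: str      # text span as it appears in the sentence
    wikidata_id: str  # identifier chosen by the disambiguation step (placeholder)

def normalise(surface: str) -> str:
    """Rule-based clean-up: collapse whitespace and lower-case the span."""
    return " ".join(surface.split()).lower()

# Hypothetical schema: (:Publication)-[:USES_METHOD]->(:Method)
CYPHER_UPSERT = (
    "MERGE (p:Publication {doi: $doi}) "
    "MERGE (m:Method {wikidata_id: $wikidata_id}) "
    "ON CREATE SET m.label = $label "
    "MERGE (p)-[:USES_METHOD]->(m)"
)

# Example read query over the same hypothetical schema: most-used methods.
CYPHER_TOP_METHODS = (
    "MATCH (p:Publication)-[:USES_METHOD]->(m:Method) "
    "RETURN m.label AS method, count(p) AS publications "
    "ORDER BY publications DESC"
)

def to_rows(mentions):
    """Deduplicate mentions into one parameter row per (publication, method)."""
    rows = {}
    for mention in mentions:
        key = (mention.doi, mention.wikidata_id)
        rows.setdefault(key, {
            "doi": mention.doi,
            "wikidata_id": mention.wikidata_id,
            "label": normalise(mention.surface),
        })
    return list(rows.values())

mentions = [
    MethodMention("10.0000/example.1", "Radiocarbon  dating", "Q-PLACEHOLDER-1"),
    MethodMention("10.0000/example.1", "radiocarbon dating", "Q-PLACEHOLDER-1"),
    MethodMention("10.0000/example.2", "Photogrammetry", "Q-PLACEHOLDER-2"),
]
rows = to_rows(mentions)
```

Each row could then be passed as parameters alongside `CYPHER_UPSERT` by whichever Neo4j client a project uses. Using `MERGE` rather than `CREATE` means that re-running the pipeline does not duplicate nodes or relationships.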
Part 2: 10:45–12:30
Moderated by Agiatis Benardou (DARIAH)

Paper 6: Toward a Standardised Workflow for the Documentation of Archaeological Research Projects
Rita Gautschy (Speaker), Noémi Villars-Amberg (Speaker)
How can we not only better document archaeological data but also make it more interoperable, reusable, and aligned with best practices in digital scholarship? Aiming to improve the transparency, accessibility, and long-term usability of archaeological research data, the team at the Swiss National Data and Service Centre for the Humanities presented efforts toward developing a standardised workflow for documenting archaeological research projects.
A key aspect of the approach is enhancing data discoverability by mapping diverse project-specific data models to established ontologies such as CIDOC-CRM, EDM, and Nomisma. The proposed “archaeology standard data model” offers a predefined structure that is flexible enough to be adapted by individual research projects to suit their needs. This includes the reuse and adaptation of controlled vocabularies for elements like chronologies and materials. To support this, a structured workflow is being developed for research data specialists. This involves assessing project-specific data, adapting standard models, customising import scripts, and guiding researchers through the documentation process.
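The mapping step can be pictured with a minimal sketch. The column names, the small material vocabulary, and the choice of CIDOC CRM terms below are illustrative assumptions rather than the "archaeology standard data model" itself. The point is only that a project-specific record is translated field by field, and that fields without a mapping are reported rather than silently dropped.

```python
# Illustrative mapping from hypothetical project columns to CIDOC CRM terms.
FIELD_MAP = {
    "inv_no": ("E22 Human-Made Object", "P1 is identified by"),
    "material": ("E22 Human-Made Object", "P45 consists of"),
    "object_type": ("E22 Human-Made Object", "P2 has type"),
}

# Hypothetical controlled vocabulary: local spellings -> preferred term.
MATERIAL_VOCAB = {
    "bronze": "bronze",
    "br.": "bronze",
    "ceramic": "ceramic",
    "pottery": "ceramic",
}

def map_record(record):
    """Return (mapped statements, unmapped field names) for one record."""
    mapped, unmapped = [], []
    for field_name, value in record.items():
        if field_name not in FIELD_MAP:
            unmapped.append(field_name)  # flag for the data specialist to review
            continue
        crm_class, crm_property = FIELD_MAP[field_name]
        if field_name == "material":
            # Reuse the controlled vocabulary; keep the raw value if unknown.
            value = MATERIAL_VOCAB.get(value.strip().lower(), value)
        mapped.append((crm_class, crm_property, value))
    return mapped, unmapped

record = {"inv_no": "A-17", "material": "Br.", "excavator_note": "found near wall"}
mapped, unmapped = map_record(record)
```

In a customised import script, the list of unmapped fields is what would prompt a conversation with the researchers: either the standard model is adapted, or the field is documented as project-specific.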
Paper 7: Towards a Collaborative Map Annotation Workflow: Annotating Ancient Places on Rigas’ Charta of Greece
Maria Ilvanidou (Speaker), Massimiliano Carloni (Speaker), Anna Aslanoglou, Vicky Dritsou
Historical maps are complex, multidimensional artefacts whose layered histories and narrative potential are often obscured. Annotation is proposed as a means to unlock these hidden layers of meaning and make the maps more analytically accessible.
ATRIUM teams from ATHENA and the Austrian Academy of Sciences are creating a workflow for manual map annotation, tested using Recogito Studio but intentionally designed to be tool-agnostic. Their case study is the 18th-century Charta of Greece, created by the Greek writer and revolutionary Rigas Velestinlis. This richly illustrated Enlightenment-era map merges historical and mythological geography as a political call to resist Ottoman rule.
The workflow focuses on manual, collaborative annotation of features such as historical events and ancient coins, which are linked to external sources and geotagged for deeper contextualisation.
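Because the workflow is meant to be tool-agnostic, one way to picture its output is as plain records that any tool could export. The sketch below is an assumption-laden illustration: the field names are invented, the URI is a placeholder, and the coordinates are only approximate. It converts such records to GeoJSON, a widely used exchange format that the presentation did not necessarily adopt.

```python
import json
from dataclasses import dataclass

@dataclass
class MapAnnotation:
    label: str       # transcribed name or short description
    kind: str        # e.g. "place", "event", "coin" (hypothetical categories)
    source_uri: str  # link to an external reference (placeholder below)
    lon: float
    lat: float

def to_feature(annotation):
    """Convert one annotation to a GeoJSON Feature (coordinates are lon, lat)."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [annotation.lon, annotation.lat],
        },
        "properties": {
            "label": annotation.label,
            "kind": annotation.kind,
            "source": annotation.source_uri,
        },
    }

def to_feature_collection(annotations):
    return {
        "type": "FeatureCollection",
        "features": [to_feature(a) for a in annotations],
    }

annotations = [
    MapAnnotation("Example ancient place", "place",
                  "https://example.org/places/placeholder", 23.73, 37.98),
]
collection = to_feature_collection(annotations)
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
```

Keeping the external link and the geotag in the same record is what allows an annotation made on the map image to be contextualised later in a gazetteer or GIS.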
Paper 8: The Infrastructure and Workflows of NFDI4Objects
Fabian Fricke (Speaker), Christin Keller (Speaker)
NFDI4Objects, part of Germany’s National Research Data Infrastructure (NFDI), is developing workflows that support researchers dealing with three million years of human and environmental history. Through tools like Cocoda, DANTE, and BARTOC, the initiative supports data mapping, building, and publishing, with a strong emphasis on interoperability and sustainability. A knowledge graph and triple store are already in use, offering early access to linked datasets.
At the heart of NFDI4Objects’ research data workflows is the Knowledge Graph, a graph database based on Neo4j. Currently, data ingestion is facilitated through LIDO and CIDOC-CRM, and the graph can be queried via SPARQL or Cypher APIs. The DANTE service serves as the central hub for terminologies and ontologies, while BARTOC provides a registry for these resources to increase their accessibility.
Paper 9: Go with the Flow – Workflows as a Recipe for Reproducible Results
Ceri Binding (Speaker), Douglas Tudhope
The ATRIUM team at the University of South Wales addressed the ongoing reproducibility crisis in academia, where many published studies fail to include the necessary detail for others to replicate their findings. Yet publicly funded work should be publicly available: not only the results but also the datasets and detailed methodologies. In response to this challenge, the presentation proposed a practical and accessible approach: framing digital research workflows as recipes.
Just like a recipe, a workflow can be broken down into ingredients (such as input data and configurations), equipment (including hardware, software, and services), methods (the step-by-step procedures), and results (outputs like processed data and reports). This model was illustrated through a vocabulary-driven Named Entity Recognition (NER) workflow built with the spaCy library. Key components of the workflow include text normalization, lemmatization, part-of-speech tagging, and output rendering in various formats like HTML. The workflow is designed to be modular, asynchronous, and adaptable, allowing researchers to reuse or substitute parts of the process based on their specific needs. By framing workflows in this structured, recipe-like format, the project aims to make complex computational methods more transparent, shareable, and ultimately reproducible, ensuring that publicly funded research remains accessible, verifiable, and open to future innovation.
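The recipe framing can be sketched in a few lines of code. This is not the presenters' implementation: to stay dependency-free, simple regular-expression matching stands in for spaCy, lemmatisation and part-of-speech tagging are omitted, and the vocabulary, entity types, and equipment list are invented for illustration.

```python
import re
from dataclasses import dataclass, field

@dataclass
class WorkflowRecipe:
    ingredients: dict  # input configuration, e.g. the vocabulary
    equipment: list    # software and services the recipe relies on
    methods: list      # ordered steps; each takes (value, ingredients)
    results: dict = field(default_factory=dict)

    def cook(self, text):
        """Run every step in order, recording each intermediate output."""
        value = text
        for step in self.methods:
            value = step(value, self.ingredients)
            self.results[step.__name__] = value
        return value

def normalise(text, _ingredients):
    # Text normalisation: lower-case and collapse whitespace.
    return " ".join(text.lower().split())

def tag_entities(text, ingredients):
    # Vocabulary-driven matching: whole-word lookup of each known term.
    found = []
    for term, entity_type in ingredients["vocabulary"].items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((match.start(), match.end(), term, entity_type))
    return sorted(found)

def render_html(entities, _ingredients):
    items = "".join(f"<li>{term} ({etype})</li>" for _, _, term, etype in entities)
    return f"<ul>{items}</ul>"

recipe = WorkflowRecipe(
    ingredients={"vocabulary": {"roman": "PERIOD", "villa": "MONUMENT", "coins": "OBJECT"}},
    equipment=["Python 3 standard library"],
    methods=[normalise, tag_entities, render_html],
)
html = recipe.cook("A Roman  villa with a hypocaust and Roman coins.")
terms = [term for _, _, term, _ in recipe.results["tag_entities"]]
```

Because each method is an ordinary function with the same signature, a step can be swapped out (for example, replacing `render_html` with a CSV writer) without touching the rest of the recipe, which is the kind of substitution the modular design is meant to allow.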
Reflection
The ATRIUM session brought to the forefront the rich diversity and innovation within workflow-based research in archaeology and cultural heritage. Across the two parts of the session, contributors demonstrated how workflows, ranging from AI-driven image annotation and Named Entity Recognition, to 3D modelling, historical map annotation, and ontology-based data integration, are not merely technical procedures but foundational elements of research methodology. Workflows serve as both methodological roadmaps and tools for ensuring transparency, reproducibility, and cross-disciplinary collaboration.
What emerged clearly from the session is that workflows are much more than step-by-step guides. They function as frameworks that enable interoperability, support creative adaptation, and foster sustained research engagement beyond the original project boundaries. Workflows help researchers make sense of increasingly complex and heterogeneous data environments, from massive image archives to intricate built heritage documentation.
A key insight from the session is the critical role that infrastructures play in sustaining and promoting workflow-based research. By bridging multiple research infrastructures and engaging both internal and external collaborators, ATRIUM exemplifies how infrastructures can catalyse meaningful knowledge exchange. In particular, the SSH Open Marketplace is a crucial platform for making workflows visible, shareable, and adaptable, ensuring that valuable methodological knowledge is not lost but passed on and improved.
However, there remain systemic challenges. Workflows are often not formally recognised as scholarly outputs, despite their substantial intellectual and practical contributions. Academics are typically not rewarded for publishing workflows, which creates a disincentive for making this knowledge openly available. Research infrastructures have a vital role to play here – not just in hosting and disseminating workflows, but in advocating for their recognition as first-class research outputs.
Ultimately, while tools, data, and services are essential, they are only truly effective when embedded within clearly articulated workflows. Without such structured guidance, researchers may struggle to understand how to integrate resources into their own work. The ATRIUM session demonstrated that well-documented, open, and reusable workflows are not only technically beneficial but central to advancing digital archaeology and cultural heritage research in an open, transparent, and collaborative way.

