Service & Software Catalogue

The ATRIUM project promotes a comprehensive set of services & tools designed to support all stages of digitally enabled, data-driven research in the Arts and Humanities. This curated catalogue features state-of-the-art resources for creating, processing, analyzing, preserving, and reusing digital data across various disciplines, from archaeology to linguistics.

Explore the services and tools – whether you’re managing textual corpora, analyzing geospatial data, transcribing spoken field notes, or engaging in other research activities.

Discover the ATRIUM Catalogue on SSH Open Marketplace

Our list of tools and services is hosted on the SSH Open Marketplace, a comprehensive discovery portal that collects and contextualizes resources for Arts and Humanities research communities. By clicking on the titles of the tools and services, you will be taken directly to their entries on the SSH Open Marketplace, where you can explore detailed information, training materials, datasets, publications, and workflows associated with each tool. The platform not only helps you find what you need but also connects you with related publications and case studies, providing valuable context and demonstrating practical applications that can improve your research.

You can also explore the full list of tools and services in the ATRIUM catalogue on the SSH Open Marketplace by entering the keyword “ATRIUM catalogue” into the search box on the homepage. This search will return the complete list of tools and services, as well as workflows created using them.

49 software & services

  • 3DHOP: 3D Heritage Online Presenter

    3DHOP (3D Heritage Online Presenter) is an open-source framework for the creation of interactive Web presentations of high-resolution 3D models, oriented to the Cultural Heritage field. 3DHOP’s target audience ranges from museum curators with some IT experience to experienced Web designers who want to embed 3D content in their creations, and from students in the CH field to small companies developing web applications for museums and CH institutions.

    3DHOP allows the creation of interactive visualizations of 3D models directly inside a standard web page, just by adding some HTML and JavaScript components to the web page source code. The 3D scene and the user interaction can be easily configured using a simple “declarative programming” approach. Thanks to multi-resolution 3D model management, 3DHOP can handle high-resolution 3D models (hundreds of millions of triangles/points) with ease, even on low-bandwidth connections. 3DHOP does not need a specialized server or server-side computation: just some space on a web server. It works directly inside modern web browsers; no plug-ins or additional components are necessary.

    More info
  • 3M - Mapping Memory Manager

    Mapping Memory Manager (3M) is a free online application developed by the ICS-FORTH team (Heraklion, Greece). 3M allows the matching of data in XML format with ontologies, in particular CIDOC CRM and its extensions, in order to bring these data to the Semantic Web. This mapping enables data to be exported in standard Semantic Web formats such as Turtle, RDFS and N-Triples. To achieve this goal, 3M uses two main languages:

    • XPath, for exploring an XML tree to retrieve, select and return data;
    • X3ML, for using triple templates to structure the XML input data into an output file in Turtle, RDFS or N-Triples format.
    More info
  • AMCR Digital Archive

    The Digital Archive of the Archaeological Map of the Czech Republic (AMCR) is a web application designed for viewing information about archaeological investigations, sites and finds, operated by the Archaeological Institutes of the CAS in Prague and Brno. The archives of these institutions contain documentation of archaeological fieldwork on the territory of the Czech Republic from 1919 to the present day, and they continue to enrich their collections. The AMCR database and related documents form the largest collection of archaeological data concerning the Czech Republic and are therefore an important part of our cultural heritage.

    More info
  • ARCHE A Resource Centre for the HumanitiEs

    ARCHE (A Resource Centre for the HumanitiEs) is a service that offers stable and persistent hosting as well as the dissemination of digital research data and resources for the Austrian humanities community. ARCHE welcomes data from all humanities fields.

    If you have resources, would like to share them, and want to make sure that the data will be around in the future, contact us. We offer both archiving and online availability for your resources. If needed, we will assist you in converting data and metadata into required formats.

    ARCHE’s primary mission is to provide easy and sustainable access to digital research data and resources for researchers in the humanities. To this end, it takes care of the long-term preservation of digital data and promotes the use of open access and open data policies. Have a look at our Collection Policy to find out about the scope of ARCHE.

    To support researchers in preparing their data for long-term preservation, ARCHE offers a set of policies and standards as well as consulting.

    As the successor of the Language Resources Portal (LRP), ARCHE is one of Austria’s connection points to the European network of CLARIN Centres.

    ARCHE has embraced the FAIR Data Principles providing Findable, Accessible, Interoperable and Reusable data and metadata.

    More info
  • ARIADNE Data Management Plan - Tool and Template

    ARIADNEplus offers an online Data Management Plan Tool which supports archaeological data managers and archaeologists who want, or are obliged, to make a data management plan for producing and sharing data that comply with the FAIR data principles (see https://www.go-fair.org/fair-principles/).

    A complication is that the requirements for a DMP, although similar in intention, vary across research funders and organisations. Therefore, ARIADNEplus currently provides the tool with three DMP templates:

    • the Protocol for Archaeological Data Management, which is based on the widely recognised Science Europe guidance for research data management and the principle of “comply or explain”;
    • the ARIADNEplus DMP Researcher Template for Archaeological Datasets, developed from the Horizon 2020 requirements and tailored to community standards and practices;
    • the Horizon Europe Template for Archaeological Datasets, which includes links to information and suggestions for answering the questions of the template; the related guide can also be consulted for the above protocol and template: https://training.ariadne-infrastructure.eu/dmp-guidance/

    Provider of the DMP Tool and Templates: PIN VastLab and Data Archiving and Networked Services (DANS)

    More info
  • ARIADNE EpHEMERA

    The Online 3D Database System for Endangered architectural and archaeological Heritage in the south Eastern Mediterranean area (EpHEMERA) is intended to serve as an infrastructure where it is possible to:

    • visualize online, through a standard web browser, 3D architectural and archaeological models classified according to a specific type of risk;
    • query the database system and retrieve metadata attached to each single virtual object;
    • extract geometric and morphological information.

    More info
  • ARIADNE Portal

    The ARIADNE Portal provides the main point of access for searching and browsing datasets and new services for processing and publishing archaeological datasets online.

    The Portal brings together existing archaeological research datasets from ARIADNE partners so that researchers can browse and access the various distributed datasets for use in their projects. The portal also provides a point of access for the new services developed by the project.

    The datasets that are registered in the ARIADNE catalogue are held by its partners and have been created through research, in excavations, in fieldwork, laboratory and other projects. In recent years archaeologists have been making increasing use of sophisticated digital equipment and techniques. During the course of a research project large volumes of data are created and collected, and become part of the research archive. ARIADNE aims to make these archives available through its portal for researchers to consult when starting new research.

    More info
  • ARIADNE Visual Media Service

    The Visual Media Service provides easy publication and presentation of complex visual media assets on the web. It is an automatic service that allows users to upload visual media files to a server and transform them into an efficient web format, making them ready for web-based visualization.

    More info
  • ARIADNE Vocabulary Matching Tool

    The Vocabulary Matching Tool allows users to align Linked Data vocabulary concepts with Getty Art & Architecture Thesaurus concepts. The tool is a browser based application that presents concepts from chosen source and target vocabularies side by side, exposing additional contextual evidence to allow the user to make an informed choice when deciding on potential mappings (expressed as SKOS mapping relationships). The tool is intended for vocabularies already expressed in RDF/SKOS and can work directly with the data – querying external SPARQL endpoints rather than storing any local copies of complete vocabularies. The set of mappings developed can be saved locally, reloaded and exported to a number of different output formats.
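
    To illustrate what a SKOS mapping relationship looks like as data, here is a minimal sketch using the rdflib Python library; the concept URIs are hypothetical placeholders, not real vocabulary entries, and the tool’s own export formats may differ:

        # Minimal sketch of a SKOS mapping between two vocabulary concepts,
        # built with rdflib. The concept URIs are hypothetical examples;
        # only the AAT namespace pattern follows Getty's published scheme.
        from rdflib import Graph, URIRef
        from rdflib.namespace import SKOS

        g = Graph()
        g.bind("skos", SKOS)

        local_concept = URIRef("http://example.org/vocab/hillfort")  # hypothetical source concept
        aat_concept = URIRef("http://vocab.getty.edu/aat/300000000")  # placeholder AAT-style identifier

        # SKOS defines mapping properties of varying strength:
        # closeMatch, exactMatch, broadMatch, narrowMatch, relatedMatch.
        g.add((local_concept, SKOS.exactMatch, aat_concept))

        print(g.serialize(format="turtle"))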

    More info
  • ARIADNEplus Linked Open Data (GraphDB)

    GraphDB provides researchers with access to the ARIADNEplus Knowledge Base, including all the partner data as a Linked Open Data set modelled according to the ARIADNE ontology. With GraphDB, researchers can explore the knowledge base with the available web GUI or programmatically with SPARQL queries.
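
    As a sketch of programmatic access, the following Python snippet sends a generic SPARQL query to a GraphDB repository over HTTP; the endpoint URL is a placeholder, and the real ARIADNEplus endpoint address and graph structure should be taken from the service documentation:

        # Hypothetical sketch: querying a GraphDB SPARQL endpoint with Python.
        # The endpoint URL below is a placeholder, not the real ARIADNEplus address.
        import requests

        ENDPOINT = "https://example.org/repositories/ariadneplus"  # placeholder

        query = """
        SELECT ?s ?p ?o
        WHERE { ?s ?p ?o }
        LIMIT 10
        """

        # GraphDB, like most SPARQL endpoints, accepts queries over HTTP
        # and can return results as SPARQL JSON.
        resp = requests.get(
            ENDPOINT,
            params={"query": query},
            headers={"Accept": "application/sparql-results+json"},
            timeout=30,
        )
        resp.raise_for_status()
        for binding in resp.json()["results"]["bindings"]:
            print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])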

    More info
  • Annotorious

    Annotorious is a JavaScript library for image annotation: a Web-based tool built specifically for integration with existing Web pages and portal environments.

    More info
  • DARIAH-CAMPUS

    Looking for learning resources? DARIAH-CAMPUS is a discovery framework and hosting platform for learning resources.

    More info
  • DARIAH-DE Geo-Browser

    The DARIAH-DE Geo-Browser (or GeoBrowser) allows the comparative visualisation of several queries and displays data in relation to geographic space and corresponding points or sequences in time. Researchers can thus analyse the space-time relations of data and collections of source material and simultaneously establish correlations between them.

    Support for this tool is available through the helpdesk: support@clariah.de

    This resource is supported by Text+. In case of questions you may get in touch with the Text+ helpdesk at textplus-support@gwdg.de .

    More info
  • Datasheet Editor

    Datasheet Editor is a user-friendly tool designed to facilitate the preparation and organization of data for visualization within the DARIAH-DE Geo-Browser. With seamless integration from various sources like Excel or CSV files, users can easily import and manipulate data, extract geocoordinates from place names, and securely store datasets in the DARIAH-DE Storage. Whether for research, academia, or personal projects, this editor streamlines the process of sharing spatial data with the world, offering a convenient solution for data visualization and collaboration.

    More info
  • Dendrochronology Entity Recognizer

    The Dendrochronology entity recognizer is available for English, Dutch and Swedish.

    The Dendrochronology entity recognizer identifies terms and phrases for analysing archaeological text. The pipeline delivers named entities for archaeological elements, wood material, samples, and dates. It also annotates phrases with different weights based on the number of entity types they distinctly contain. The named entities, apart from Date, are linked to concept labels of the Getty AAT thesaurus. The pipeline is part of the ARIADNE infrastructure that integrates archaeological research data across Europe.

    More info
  • Digital Libraries Federation - Federacja Bibliotek Cyfrowych

    Online collections of Polish cultural and scientific institutions with more than 8.4 million objects.

    Objectives and content: The Polish Digital Libraries Federation is a database which aggregates data about the online collections of Polish memory institutions. Currently the database contains information about over 4 million records from over a hundred Polish online services. Each object is described with a metadata record (extended Dublin Core) and a link to the website on which the object is accessible. The database offers flexible searching and filtering possibilities and can be accessed via an API. The Polish Digital Libraries Federation is the main Polish metadata aggregator for Europeana.

    Targeted public: Researchers and the general public interested in cultural heritage collections from Polish libraries, museums and archives which are available online.

    More info
  • Digital Repository of Scientific Institutes

    The Digital Repository of Scientific Institutes (RCIN), established in 2010 through collaborative efforts, serves as a comprehensive platform for digitizing, archiving, and disseminating scientific resources. With a mission to ensure enduring access to valuable scientific data, literature, and objects, RCIN operates under a consortium agreement, meticulously curating thematic collections aligned with partner institutions’ expertise.

    More info
  • GATE (General Architecture for Text Engineering)

    GATE (General Architecture for Text Engineering) is a Java language software framework and developer-focused user interface for building text analysis pipelines that can be embedded within other software systems.

    It includes (1) a Java API for implementing natural language processing components, (2) an extensive library of components written against this API that implement low level NLP algorithms and/or wrap existing third party NLP tools to act as building blocks for larger NLP applications, and (3) a pipeline abstraction and pattern-matching language (JAPE) to combine these building blocks into higher level applications tailored to specific tasks.

    The completed pipeline can then be embedded in other user-facing software systems or APIs.

    More info
  • GROBID

    GROBID (or Grobid, but not GroBid nor GroBiD) means GeneRation Of BIbliographic Data.

    GROBID is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured XML/TEI encoded documents with a particular focus on technical and scientific publications.
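
    A minimal sketch of calling a GROBID service from Python, assuming an instance is running locally on its usual default port (8070) with the standard processFulltextDocument endpoint enabled:

        # Minimal sketch: sending a PDF to a locally running GROBID service.
        # Assumes GROBID is up on localhost:8070 (its usual default).
        import requests

        with open("paper.pdf", "rb") as pdf:
            resp = requests.post(
                "http://localhost:8070/api/processFulltextDocument",
                files={"input": pdf},
                timeout=120,
            )
        resp.raise_for_status()

        # GROBID returns the restructured document as XML/TEI.
        with open("paper.tei.xml", "w", encoding="utf-8") as out:
            out.write(resp.text)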

    More info
  • Gate Cloud

    https://cloud.gate.ac.uk is a public REST API providing access to a wide variety of GATE text analysis pipelines and other services for use by third party software. A simple web interface is available to test individual pipelines on small samples of text, and API documentation is provided for use by application integrators. A tool is also available to run GATE Cloud services against text stored in Google Sheets.

    Available pipelines cover a variety of tool types, levels of abstraction and target domains, ranging from simple low-level text analysis such as part-of-speech tagging, through general named entity taggers for a variety of languages and domain-specific entity recognition services for archaeology & dendrochronology, to various classifiers that assist with the analysis of disinformation and hate speech, and an OCR service to extract text from images. A basic quota sufficient to analyse ~1200 texts per day is available to anyone, with higher quotas available on request to research users.
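
    The sketch below shows roughly what calling such a pipeline from Python might look like; the service path and parameters here are illustrative placeholders, and the authoritative routes and formats are given in the GATE Cloud API documentation:

        # Illustrative sketch only: the endpoint path below is a placeholder,
        # not a documented GATE Cloud route. Consult https://cloud.gate.ac.uk
        # for the real service URLs, parameters and response formats.
        import requests

        SERVICE_URL = "https://cloud.gate.ac.uk/process-document/example-pipeline"  # hypothetical

        resp = requests.post(
            SERVICE_URL,
            data="The excavation at the hillfort yielded oak samples.".encode("utf-8"),
            headers={"Content-Type": "text/plain", "Accept": "application/json"},
            timeout=60,
        )
        resp.raise_for_status()
        print(resp.json())  # annotations returned by the pipeline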

    More info
  • GoTriple Discovery Platform

    The GoTriple platform is an innovative multilingual and multicultural discovery solution for the social sciences and humanities (SSH). It provides a single access point that allows researchers to explore, find, access and reuse materials such as literature, data, projects and researcher profiles at a European scale. Conceived as an entry point to the EOSC, it is one of the dedicated services of OPERAS, the Research Infrastructure supporting Open Scholarly Communication in the SSH in the European Research Area.

    More info
  • ILSP Neural NLP Toolkit

    The ILSP Neural NLP toolkit for Greek currently integrates modules for segmentation, POS tagging, lemmatization, dependency parsing, chunking, named entity recognition and text classification. The toolkit is based on language resources including web crawled corpora, word embeddings, large lexica, and corpora manually annotated at different levels of linguistic analysis.

    You can use this service to process any TXT document in UTF-8 encoding and obtain the output in either JSON or CSV format.

    According to the processing steps followed by the service, the output includes annotations on the level of sentence (Sentence ID, Sentence), token (Token ID, Token, Token Start, Token End), lemma, part of speech (Upos and Features according to the Universal PoS & Features Tagset, Xpos according to the XPoS ILSP Tagset), dependency relation (Relation and Head ID according to the Universal Dependency Relations Tagset), named entity (Named Entity, Named Entity Type according to the ILSP Neural NER Tagset, Named Entity Score), chunks (Chunk and Chunk Type according to the ILSP Neural Toolkit Chunk Tagset) and the thematic category into which the whole text is classified (Text Classification according to the ILSP Neural Text Classification Tagset). See the Output Annotation Schema field below for more information on the respective tagsets.

    For technical reasons, the file submitted for processing may be split into smaller files, and so will the output.

    This service is seamlessly integrated into the CLARIN:EL infrastructure’s workflows, providing users with convenient and easy access as a web service. You can use this service by clicking the “Use” button on this page or by accessing it through the CLARIN:EL processing services page.

    Citations: Prokopidis, Prokopis (2021, July 18). ILSP Neural NLP Toolkit. Version 3.0.1. [Software (Tool/Service)]. CLARIN:EL. http://hdl.handle.net/11500/CLARIN-EL-0000-0000-67B2-3 ; Prokopidis, Prokopis (2021, July 18). ILSP Neural NLP Toolkit. [Software (Tool/Service)]. CLARIN:EL. http://hdl.handle.net/11500/CLARIN-EL-0000-0000-67D6-B

    More info
  • Journals of the Polish Academy of Sciences

    The PAS Journals page presents the full contents of journals published or co-published by committees and institutes of PAS and the Bureau of Science Promotion at PAS. The journals and books are published under various Creative Commons licenses; thus, authors retain the copyright to their work.

    More info
  • LINDAT Translation

    Neural Machine Translation for Czech, English, Ukrainian, French and other languages, available as a web interface, a web service and a mobile application under the name “Charles Translator”.
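
    A sketch of calling the translation web service from Python; the route and parameter names are assumptions based on the public v2 API and should be verified against the LINDAT documentation:

        # Sketch of calling the LINDAT translation web service from Python.
        # The route and parameter names below are assumptions; verify them
        # against the LINDAT documentation before relying on them.
        import requests

        resp = requests.post(
            "https://lindat.mff.cuni.cz/services/translation/api/v2/languages/",
            params={"src": "en", "tgt": "cs"},   # source and target languages
            data={"input_text": "Hello, world!"},
            timeout=60,
        )
        resp.raise_for_status()
        print(resp.text)  # the translated text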

    More info
  • Language Resource Switchboard

    A web application that suggests language analysis tools for specific data sets. It provides access to tools for Sentence level analysis (Constituency Parsing - Dependency Parsing - Shallow Parsing), Word level analysis (Lemmatization - Morphological Analysis - Named Entity Recognition - Part-Of-Speech Tagging), Semantic analysis (Coreference Resolution - Sentiment Analysis - Text Summarization), Digital Humanities analysis (Distant Reading - Named Entity Linking - Stylometry - Topic modelling) and Speech Recognition. The Language Resource Switchboard ( https://switchboard.clarin.eu ) will automatically provide a list of available tools, based on the language and format of the input. The Switchboard can also be invoked from the Virtual Language Observatory ( https://vlo.clarin.eu ) and B2DROP (see Suggested compatible services below).

    More info
  • Language and Speech Tools

    Language and Speech Tools is a portal that provides access to various Text & Speech processing software services developed by the Centre of Language and Speech Technology or the Humanities Lab of Radboud University Nijmegen.

    More info
  • Marian NMT

    Marian is an efficient Neural Machine Translation framework written in pure C++ with minimal dependencies.

    Named in honour of Marian Rejewski, a Polish mathematician and cryptologist.

    Main features:

    • Efficient pure C++ implementation
    • Fast multi-GPU training and GPU/CPU translation
    • State-of-the-art NMT architectures: deep RNN and transformer
    • Permissive open source license (MIT)

    If you use this, please cite:

    Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch (2018). Marian: Fast Neural Machine Translation in C++ http://www.aclweb.org/anthology/P18-4020
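
    As a rough sketch, a trained Marian model can be driven from Python via the marian-decoder command-line tool; the model and vocabulary file names below are placeholders for artifacts you have trained or downloaded, and the exact flags are documented at marian-nmt.github.io:

        # Rough sketch: invoking marian-decoder on a trained model from Python.
        # model.npz and the vocabulary files are placeholder names; check the
        # Marian documentation for the exact flags of your version.
        import subprocess

        result = subprocess.run(
            [
                "marian-decoder",
                "-m", "model.npz",                      # trained model (placeholder)
                "-v", "vocab.src.yml", "vocab.tgt.yml",  # source/target vocabularies
            ],
            input="Dzień dobry!\n",   # one sentence per input line on stdin
            capture_output=True,
            text=True,
            check=True,
        )
        print(result.stdout)  # one translated sentence per line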

    More info
  • Multilingual NLP for Archaeological Reports Ariadne Infrastructure

    Seven separate multilingual Named Entity Recognition (NER) pipelines for the text mining of English, Dutch and Swedish archaeological reports.

    The pipelines run on the GATE (gate.ac.uk) platform and match a range of entities of archaeological interest such as Physical Objects, Materials, Structure Elements, Dates, etc.

    More info
  • NameTag

    NameTag is an open-source tool for named entity recognition (NER). NameTag identifies proper names in text and classifies them into predefined categories, such as names of persons, locations, organizations, etc. NameTag 2 recognizes nested entities (embedded entities) of arbitrary depth.

    For Czech, it recognizes a complex hierarchy of categories. The English model, which is trained on CoNLL-2003 NER annotations (Sang and De Meulder 2003), distinguishes the following four NER classes: person, organisation, location and miscellaneous.

    The trained model for Czech is available through LINDAT: Czech Models (CNEC) for NameTag. A user manual is also available.
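
    A sketch of calling the NameTag web service hosted at LINDAT from Python; the route and parameter names are assumptions based on the public REST API and should be checked against the NameTag documentation:

        # Sketch of calling the NameTag web service at LINDAT. The route and
        # parameters are assumptions; verify against the NameTag REST docs.
        import requests

        resp = requests.get(
            "https://lindat.mff.cuni.cz/services/nametag/api/recognize",
            params={"data": "Václav Havel byl prezidentem České republiky."},
            timeout=60,
        )
        resp.raise_for_status()
        print(resp.json()["result"])  # text with named-entity markup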

    More info
  • NeMO NeDiMAH Methods Ontology

    The NeDiMAH Methods Ontology (NeMO) is a comprehensive ontological model of scholarly practice in the arts and humanities, the development of which is undertaken through the ESF Research Network NeDiMAH.

    NeMO is a CIDOC CRM-compliant ontology which explicitly addresses the interplay of factors of agency (actors and goals), process (activities and methods) and resources (information resources, tools, concepts) manifest in the scholarly process. It builds on the results of extensive empirical studies and modelling of scholarly practices performed by the Digital Curation Unit in the DARIAH and EHRI projects.

    NeMO incorporates existing relevant taxonomies of scholarly methods and tools, such as TaDIRAH, the arts-humanities.net and Oxford taxonomies of ICT methods, DHCommons, CCC-IULA-UPF and DiRT, through appropriate mappings of the concepts defined therein onto a semantic backbone of NeMO concepts. It thus enables combining documentary elements on scholarly practices of different perspectives and using different vocabularies.

    The development of NeMO is an on-going project that greatly benefits from the discussions in related workshops and meetings and the generous intellectual contributions of participants. Interested individuals are invited to continue participating, and also to contribute comments through this site.

    More info
  • Nexus

    Nexus is a C++/JavaScript library for the creation and visualization of batched multiresolution 3D models and point clouds.

    More info
  • OCR4all

    Optical Character Recognition (and more) for everyone

    OCR4all combines various open-source solutions to provide a fully automated workflow for automatic text recognition of historical printed (OCR) and handwritten (HTR) material. At pretty much any stage of the workflow, the user can interact with the results in order to minimize consequential errors and optimize the end result. Thanks to its comprehensible and intuitive handling, OCR4all explicitly addresses the needs of non-technical users.

    More info
  • OPERAS Metrics

    Metrics empowers Open Access publishers, university presses, and libraries by providing usage and impact metrics for published content. Designed with the nuanced needs of Social Sciences and Humanities in mind, this service consolidates diverse data sources into a single, transparent interface.

    OPERAS Metrics collects usage and impact metrics related to published Open Access books and allows their access, display and analysis from a single access point. Metrics are not only displayed for the publisher’s website but are also aggregated with those of other sites where a book is available.

    More info
  • OpenLIME

    OpenLIME (Open Layered IMage Explorer) is an open-source JavaScript library for the efficient display of scalable high-resolution relightable images.

    OpenLIME natively supports BRDF and RTI datasets, and can be easily extended to other multi-channel raster datasets, such as hyperspectral imaging or other reflectance models. Input data can be combined in a multi-layer visualization system using opacity, blending modes, and interactive lenses.

    More info
  • PERO OCR

    The pero-ocr package provides a full OCR pipeline including text paragraph detection, text line detection, text transcription, and text refinement using a language model. The package can be used as a command line application or as a Python package which provides a document processing class and a class representing document page content.
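
    A sketch of the Python-package usage, modelled on the package’s documented example; the module paths, class names and config file below are assumptions to verify against the current pero-ocr release:

        # Sketch of using the pero-ocr Python package, modelled on its
        # documented usage; verify module paths and signatures against the
        # current release before use.
        import configparser
        import cv2
        from pero_ocr.document_ocr.layout import PageLayout
        from pero_ocr.document_ocr.page_parser import PageParser

        # A pero-ocr config file points at downloaded detection/recognition models.
        config = configparser.ConfigParser()
        config.read("config.ini")        # placeholder path
        parser = PageParser(config)

        image = cv2.imread("page.jpg")   # placeholder scan
        layout = PageLayout(id="page_1", page_size=(image.shape[0], image.shape[1]))
        layout = parser.process_page(image, layout)

        # Export the recognised text lines as PAGE XML.
        layout.to_pagexml("page.xml")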

    More info
  • Recogito

    Recogito is an online platform for collaborative document annotation.

    Recogito provides a personal workspace where you can upload, collect and organize your source materials - texts and images - and collaborate in their annotation and interpretation. Recogito enables you to make your work more visible on the Web more easily, and to expose the results of your research as Open Data.

    More info
  • Research Spotlight

    Research Spotlight (RS) provides automated workflows that allow for the population of the Scholarly Ontology’s core entities and relations. To do so, RS provides distant supervision techniques for training machine learning models, interconnects with various APIs to harvest (linked) data and information from the web, and uses pretrained ML models along with lexico-semantic rules to extract information from the text of research articles, associate it with information from article metadata and other digital repositories, and publish the inferred knowledge as linked data. Simply put, Research Spotlight transforms the text of a research article into queryable knowledge graphs based on the semantics provided by the Scholarly Ontology.

    RS employs a modular architecture that allows for flexible expansion and upgrading of its various components. It is written in Python and makes use of various libraries, such as SpaCy for parsing and syntactic analysis of text, Beautiful Soup for parsing the HTML/XML structure of web pages, and scikit-learn for implementing machine learning methodologies to extract entities and relations from text.
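
    As a generic illustration of the kind of entity-extraction step such a pipeline builds on (plain SpaCy usage, not RS’s own code):

        # Generic illustration of an entity-extraction step of the kind RS
        # builds on; this is ordinary spaCy usage, not Research Spotlight code.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # small English model, installed separately
        doc = nlp("The corpus was annotated at the University of Athens in 2019.")

        for ent in doc.ents:
            print(ent.text, ent.label_)     # e.g. "the University of Athens" ORG, "2019" DATE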

    More info
  • SSH Open Marketplace

    The Social Sciences and Humanities Open Marketplace, built as part of the Social Sciences and Humanities Open Cloud project (SSHOC), is a discovery portal which pools and contextualises resources for Social Sciences and Humanities research communities: tools, services, training materials, datasets, publications and workflows. The Marketplace highlights and showcases solutions and research practices for every step of the SSH research data life cycle.

    More info
  • SSH Vocabulary Commons

    The SSH Vocabulary Commons brings together experts and managers from the SSH research infrastructures CESSDA, CLARIN, DARIAH and E-RIHS who agree to share their expertise and work towards common recommendations for, firstly, using and managing vocabularies used in SSH and, secondly, operating, sharing and managing vocabulary services useful for the broad SSH community.

    The SSH Vocabulary Commons work is specifically directed towards:

    • Common recommendations for creating, managing and using vocabularies in SSH research and resource management that will make the SSH infrastructures better interoperable and more efficient;

    • Aligning current vocabulary management practices with EOSC and FAIR principles, making vocabularies first-class citizens that are easy to find, share and access;

    • Improving facilities for multilingual vocabularies;

    • Promoting the sharing of vocabularies, management procedures and software between different SSH domains and organizations by recommending specific Knowledge Organization/Representation language formats (e.g. SKOS) and how to apply them (e.g. vocabulary metadata, versioning);

    • Providing easier cross-domain integration of (meta)data and semantic interoperability by supporting and providing procedures for vocabulary matching;

    • Maintaining contacts with vocabulary software development teams and promoting SSH infrastructure interests;

    • Facilitating vocabulary recommendation, supporting researchers in finding relevant vocabularies for their research;

    • Providing a default vocabulary hosting and publishing service for orphaned vocabularies.

    More info
  • TEITOK

    TEITOK is a web-based platform for viewing, searching, and collaboratively creating and editing corpora in the TEI/XML format.

    Documents can contain both rich textual mark-up and linguistic annotation. Additionally, TEITOK can visualise alignment with facsimile images, audio and video.

    For visitors, the system provides a graphical user interface in which the annotated document can be visualized in a number of different ways. For administrators of the corpus, TEITOK uses the same interface to allow easy editing of the underlying XML document, meaning administrators can correct their corpus while they are consulting it.

    More info
  • Tesseract OCR

    Tesseract is a free raw OCR engine originally developed by HP Labs and maintained by Google. It works with the Leptonica Image Processing Library, and is capable of reading a variety of image formats. It can convert images to text in over 100 languages.
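
    A minimal sketch using the pytesseract Python wrapper, assuming the Tesseract engine and the relevant language data are installed:

        # Minimal sketch using the pytesseract wrapper around the Tesseract
        # engine; assumes Tesseract and its language data are installed.
        from PIL import Image
        import pytesseract

        text = pytesseract.image_to_string(Image.open("scan.png"), lang="eng")
        print(text)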

    More info
  • The ARIADNEplus Lab VRE

    The ARIADNE Lab provides archaeologists and scholars with a virtual research lab and a set of tools to aggregate the data of the ARIADNE infrastructure, make this data interoperable with their own external data, and analyse and manipulate the data to answer specific research questions in archaeology or related disciplines.

    More info
  • Transformations: A DARIAH Journal

    Transformations: A DARIAH Journal is a multilingual journal created in 2024 by the European research infrastructure DARIAH ERIC.

    This journal is an ongoing publication with thematic issues in Digital Humanities, humanities, social sciences, and the arts. The journal is particularly interested in the use of digital tools, methods, and resources in a reproducible approach. It welcomes scientific contributions on collections of data, workflows and software analysis.

    More info
  • UDPipe: tool for lemmatization, morphological analysis, POS tagging and dependency parsing in multiple languages

    UDPipe is a software tool and service that analyzes (plain) text in about 100 different natural languages up to the level of dependency syntax. Users specify the desired functions (tokenization, segmentation, morphological analysis, lemmatization, POS tagging, dependency parsing), the output format, and the input text or file(s). The resulting analysis can be used to index and search documents by lemmas instead of multiple word forms, to extract syntactic dependencies with POS information and obtain relations between words or lemmas, or to get grammatical information for every word in the text. While in many cases the results can be used directly (statistical analysis of parts of speech, lemmas and words), in many other applications the results of UDPipe serve as intermediate input to more sophisticated analysis, such as information extraction, knowledge representation, or term extraction. The UDPipe software can be trained on any language for which a treebank is available in the CoNLL-U format, such as all the Universal Dependencies corpora.
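
    A sketch of calling the UDPipe web service hosted at LINDAT from Python; the route and parameter names follow the public REST API but should be verified against its documentation:

        # Sketch of calling the UDPipe web service at LINDAT; verify the route
        # and parameters against the UDPipe REST API documentation.
        import requests

        resp = requests.post(
            "https://lindat.mff.cuni.cz/services/udpipe/api/process",
            data={
                "model": "english",   # pick any available model
                "tokenizer": "",      # presence of the parameter enables the step
                "tagger": "",
                "parser": "",
                "data": "UDPipe analyses plain text up to the dependency level.",
            },
            timeout=60,
        )
        resp.raise_for_status()
        print(resp.json()["result"])  # CoNLL-U output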

    More info
  • Virtual Language Observatory

    A facet browser for fast navigation and searching in large amounts of metadata. This portal enables the discovery of language data and tools provided by over 40 CLARIN centres, other language resource providers and Europeana. The VLO ( https://vlo.clarin.eu ) also provides access to the Virtual Collection Registry ( https://www.clarin.eu/content/virtual-collections ) metadata and can be used as a starting point to process language data with the Language Resource Switchboard ( https://switchboard.clarin.eu ).

    More info
  • Virtual Transcription Laboratory

    A web service for creating text transcriptions of digitized material. It enables importing data from various sources and can perform automatic transcription (OCR) that can then be reviewed by the users. Results are available in various formats and can be imported back into the digital library. VTL also has a crowdsourcing element: it lets you work on a transcription as a group, and even so-called “transcribathons” can be organized.

    More info
  • Vocabs service

    The Vocabs services are a suite of services that accompany the user through the whole process of vocabulary creation, publication, and maintenance. They comprise Vocabs browse, which allows users to explore and consult the controlled vocabularies published on the platform, and Vocabs editor, which allows users to collaboratively create and edit controlled vocabularies based on the SKOS data model. The suite also includes an API for programmatic access to the vocabularies.

    More info
  • X3ML Engine

    The X3ML Engine realizes the transformation of source records into the target format. The engine takes as input the source data (currently in the form of an XML document), the description of the mappings in the X3ML mapping definition file, and the URI generation policy file, and is responsible for transforming the source document into a valid RDF document which corresponds to the input XML file, with respect to the given mappings and policy.

    The X3ML Engine can be used either programmatically or directly from the console.
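
    A hypothetical sketch of a console invocation wrapped in Python; the jar name and flag spellings are placeholders to check against the engine’s README:

        # Hypothetical sketch of a console invocation of the X3ML Engine from
        # Python; the jar name and flags are placeholders, not documented options.
        import subprocess

        subprocess.run(
            [
                "java", "-jar", "x3ml-engine.jar",   # placeholder jar name
                "--input", "source.xml",             # source records
                "--x3ml", "mapping.x3ml",            # mapping definition file
                "--policy", "generator-policy.xml",  # URI generation policy
                "--output", "output.rdf",            # resulting RDF document
            ],
            check=True,
        )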

    More info
  • eScriptorium

    eScriptorium aims to provide researchers in the humanities with an integrated set of tools to transcribe, annotate, translate and publish historical documents. The eScriptorium app itself is at the centre of this set. It is a work in progress, but will implement at least automatic transcription through kraken, indexing for complex search and filtering, annotation, and simple forms of collaborative working such as sharing and versioning.

    More info