

Location: Sheffield, United Kingdom
Contact: Jo Wright

The Natural Language Processing (NLP) group at USFD is one of the largest and most successful research groups in text mining and natural language processing in the EU. USFD develops and maintains the world-leading open-source GATE text mining and natural language processing infrastructure and supports its vibrant user community. The GATE NLP services already serve users from CLARIN (through CLARIN UK), the SoBigData++ research infrastructure, and the European Language Grid (ELG). The GATE infrastructure has hundreds of users, and the community grows annually through summer schools, online training materials, and take-up via CLARIN, SoBigData++, and ELG. USFD has experience of hosting TNA visits: it manages TNA access for the SoBigData++ infrastructure and hosts SoBigData++ TNA visits annually.

Services currently offered by the infrastructure:

The GATE/CLARIN UK infrastructure already provides general-purpose NLP services, such as named entity recognition and part-of-speech tagging, as well as services specifically tailored to the domain of archaeology. These include English, Dutch, and Swedish Archeology and Dendrochronology Named Entity Recognisers. The range of archaeology-specific services will grow during the project itself (see WP4). In the past five years alone, GATE has underpinned over 15 European projects. We also offer access to state-of-the-art HPC and Hadoop computer clusters for running text mining experiments, as well as access to numerous linguistic resources and datasets that can be used to train and adapt language models and carry out machine learning experiments.


USFD will offer access to tools, models, and resources such as datasets for the use and adaptation of language technologies. It will also provide mentors to guide the application and adaptation of these tools and resources to archaeological data in research visitors’ use cases.

The GATE/CLARIN UK research infrastructure, represented by USFD, offers to host up to 12 person-weeks a year in the area of use and adaptation of language technologies, such as language analysis (including named entity recognition and terminology extraction), machine translation, and multilingual adaptation of NLP models and tools. Partners from USFD will collaborate to guide the application of these methods to archaeological data in research visitors’ use cases and will be able to provide multilingual datasets for NLP. Mentors will be available to explain the issues involved and to guide visitors as they work on their own research problems in these areas. Researchers hosted by the RI will receive workspace and access to its computing resources, to all data and software in the RI’s repository, and to the web services that the RI offers.