OCR-D

Project objectives

The OCR-D project has been gathering and developing Open Source software for automatic text recognition (OCR) of historical printed documents, with the mid-term goal of creating high-quality full texts for all the works catalogued in the VD (Verzeichnis der deutschsprachigen Drucke, Catalogue of German-language prints). Phase I (2016-2018) of OCR-D was concerned with evaluating the state of the art in OCR and developing a functional model for OCR workflows. In Phase II (2018-2020), eight module projects developed solutions for missing or underdeveloped components of the OCR workflow. In the current Phase III (2021-2024), alongside the coordination project, three module projects continue to work on OCR workflow components and four implementation projects devise actionable deployment scenarios for libraries and archives.

Project term

2016 - 2024

Project participants

Berlin-Brandenburgische Akademie der Wissenschaften
Herzog-August-Bibliothek Wolfenbüttel
Staats- und Universitätsbibliothek Göttingen
Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen mbH

Third-party funding

DFG

Contact

Reinhard Altenhöner
phone: +49 30 266 431 400
reinhard.altenhoener@sbb.spk-berlin.de

Read more about the project

OCR-D