Language-independent access to news - AI-assisted search in newspapers

Project object

The SBB holds a large digital corpus of newspapers in various languages, particularly from the late 19th to the early 21st century. It is currently developing a concept for a newspaper competence centre that will be responsible for these materials across all holding departments within the SBB. Newspapers are important sources not only for historical and contemporary history research; they are also in high demand as specialised sources for research in the social and natural sciences. At present, however, digitally available historical or current newspapers can only be searched either on the commercial platforms of one or more publishers (platform dependency) or one newspaper at a time. In this project, by contrast, newspapers will be searchable across publishers and providers, regardless of their publishing platform and in accordance with the licensing conditions.

The project aims to improve access to newspapers, and potentially other full-text content from the SBB, by implementing AI-based search functions. These functions rely on AI models that convert text into mathematical vectors; extensive preparatory work on such models has already been carried out at the SBB. The project also aims to provide users with several innovative access options. The following steps are planned:

1. Semantic search: AI models will be implemented to find content based solely on its semantic relevance, regardless of the exact wording of the search terms.

2. Multilingual search: AI models will be implemented to find content regardless of language.

3. License-independent search: summaries compliant with text and data mining (TDM) rules will be generated for all content, so that users can also gain insight into copyright-protected materials.
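The idea behind steps 1 and 2 can be illustrated with a minimal sketch. It is not the SBB's implementation: the headlines, the three-dimensional vectors and the names `TOY_INDEX`, `QUERY_VECTOR` and `search` are invented for illustration. In a real system the vectors would come from a multilingual embedding model, which the project description does not specify. The sketch only shows how texts can be ranked by the similarity of their vectors rather than by shared words or a shared language.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical toy index. The vectors are hand-made stand-ins for what an
# embedding model would produce; texts about similar topics get similar vectors.
TOY_INDEX = [
    ("Erdbeben erschüttert die Hauptstadt", [0.9, 0.1, 0.0]),
    ("Earthquake damages the old harbour", [0.8, 0.2, 0.1]),
    ("Le parlement adopte le budget", [0.1, 0.9, 0.1]),
    ("Local team wins the cup final", [0.0, 0.1, 0.9]),
]


def search(query_vector, index, top_k=2):
    """Rank indexed texts by cosine similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Assumed vector for a query such as "seismic tremor": it shares no words with
# any headline, but it points in the same direction as the earthquake reports.
QUERY_VECTOR = [1.0, 0.0, 0.0]

results = search(QUERY_VECTOR, TOY_INDEX)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With these toy values, the German and the English earthquake headlines rank ahead of the budget and sports headlines, although the query shares no wording with either of them.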

The project builds on data from SBB newspaper archives (incl. licensed newspapers) dating back to the late 19th century. The new search function offers an international and language-independent view of world events, opening up new perspectives and bringing them together at the same time. The resulting prototype will be immediately available to users as a new service on an SBB website.

Project term

December 2024 - July 2026

Third-party funding

Federal Government Commissioner for Culture and the Media (BKM)

Contact

Matthias Kaun
East Asia
Head of department
phone: +49 30 266 430 000
matthias.kaun@sbb.spk-berlin.de