Research project B2/233/P2/BelgicaWeb (Research action B2)
Within Pillar 2 "heritage science", the Royal Library of Belgium (KBR) proposes BelgicaWeb: a project to open up Belgium’s born-digital heritage, that is, information that only exists online, and make it FAIR (findable, accessible, interoperable and reusable). The project aims to develop a multilingual, user-friendly access platform and an API enabling data-level access. BelgicaWeb builds on the knowledge gained from KBR’s other Belgica initiatives (see: BelgicaPress and BelgicaPeriodicals) and will contribute to opening up KBR’s growing born-digital collections for research and analysis.
To ensure the scientific exploitation and social valorisation of this born-digital heritage, BelgicaWeb will: 1) investigate how to sustainably provide access to these collections; 2) develop the necessary data infrastructure; 3) enrich the (meta)data via linked data, Natural Language Processing or other digital methods; 4) analyse the relevant legal frameworks; and 5) promote Belgium’s born-digital heritage.
BelgicaWeb is innovative in that it will develop an integrated access platform optimised for both archived websites and social media, whereas existing tools for replaying archived content are mostly developed with websites in mind. Offering data-level access to born-digital collections via an API is also an approach that national libraries do not often take. The project would therefore have a major impact on scientific knowledge and on heritage and collection management, since new born-digital collections will be created and preserved during the project. Moreover, it will enable KBR to build further in-house expertise in offering born-digital collections as data via an API and in developing and maintaining an access portal. Providing access to archived born-digital heritage also has a strong impact on civil society, as it supports citizens’ right to information and offers insight into the online behaviour of citizens at large.
The BelgicaWeb project brings together partners with a range of expertise. CRIDS at UNamur will provide expertise on the relevant legal issues; IDLab, GhentCDH and MICT at UGent will work on data enrichment, user engagement and evaluation, and outreach to the research community respectively. KBR will serve as project coordinator and will work on the development of the access platform and API and on data enrichment. The partners already have experience working together on web and social media archiving in the PROMISE and BESOCIAL projects, which is an additional strength for this project.
Within the project, a reference group of experts will iteratively provide input on the selection, the development of the API and access platform, data enrichment, quality control and usability. Collections of born-digital data will be created using tools for archiving web and social media content, and this data will be indexed. The API and access platform will be developed in two phases: a pilot phase with an evaluation, followed by a final phase. The data will also be enriched using digital methods such as linked data or Natural Language Processing. The legal component of the project comprises an analysis of the legal frameworks regarding data exchange, text and data mining, data protection and privacy rights, freedom of expression and the impact of the proposed European regulation on artificial intelligence.