This project investigates and implements methods for detecting catalogue entries within printed catalogues. Printed catalogues are straightforward to digitise and convert into machine-readable text, but dividing that text into individual catalogue entries requires converting the visual signifiers of entry boundaries (gaps on the printed page, large or upper-case headings, catalogue references) into machine-readable information.
The data is XML derived from the 13-volume Catalogue of books printed in the 15th century now at the British Museum. The project was undertaken in support of Rossitza Atanassova's AHRC-RLUK Professional Practice Fellowship.
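To make the idea of turning visual signifiers into entry boundaries concrete, here is a minimal sketch in Python. It is not the method used in this repository: it works on plain text lines rather than the project's XML, and the names `split_entries`, `is_heading` and `REFERENCE_RE`, as well as the reference pattern itself, are assumptions made purely for illustration. A line is treated as the start of a new entry when it follows a gap (a blank line) and is either an upper-case heading or begins with something that looks like a catalogue reference.

```python
import re

# Hypothetical reference pattern such as "IB. 123". This is an assumption
# for illustration, not a pattern taken from the project's code or data.
REFERENCE_RE = re.compile(r"^[A-Z]{1,3}\.\s?\d+")


def is_heading(line: str) -> bool:
    """Return True if the line has at least three letters, all upper-case."""
    letters = [c for c in line if c.isalpha()]
    return len(letters) >= 3 and all(c.isupper() for c in letters)


def split_entries(lines):
    """Group lines into entries using gap, heading and reference signals."""
    entries = []
    current = []
    previous_blank = True  # treat the start of the text as following a gap
    for line in lines:
        stripped = line.strip()
        if not stripped:
            previous_blank = True
            continue
        starts_entry = previous_blank and (
            is_heading(stripped) or bool(REFERENCE_RE.match(stripped))
        )
        if starts_entry and current:
            # Close the entry collected so far before starting a new one.
            entries.append(current)
            current = []
        current.append(stripped)
        previous_blank = False
    if current:
        entries.append(current)
    return entries


sample = [
    "DONATUS",
    "Ars minor. Printed at Mainz.",
    "",
    "IB. 123 Second item with a reference.",
    "Continuation line.",
    "",
    "a lower-case paragraph after a gap",
]
entries = split_entries(sample)
```

In this sample the gap before the final lower-case line does not start a new entry, because a gap alone is not treated as sufficient evidence; real catalogue pages would likely need the signals tuned to the layout of each volume.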
This project is the British Library-maintained version of code produced in 2022/2023 by Isaac Dunford as part of a Digital Humanities Internship funded by the School of Humanities at the University of Southampton. Isaac's original code is at https://github.com/Southampton-Digital-Humanities/2023_Catalogue-Entry-Detection.
Isaac describes the work in his post on the British Library Digital Scholarship blog.
All data is provided by the British Library: text data under CC0 1.0 Universal Public Domain; images under CC-BY 4.0 International. Code is released under the MIT License.