Catalogue Entry Detection

This project investigates and implements methods for detecting catalogue entries within printed catalogues. Printed catalogues are easy enough to digitise and convert into machine-readable data, but dividing that data by catalogue entry requires converting the visual signifiers of divisions between entries (gaps in the printed page, large or upper-case headers, catalogue references) into machine-readable information.
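As a rough illustration of turning those visual signifiers into machine-readable boundaries, the sketch below groups transcribed lines into entries. It is not the project's actual method: the function names, the blank-line stand-in for a gap on the page, and the reference pattern (`REFERENCE_RE`) are all assumptions made for the example.

```python
import re

# Hypothetical reference pattern such as "IB. 1234"; the catalogue's real
# references may look different.
REFERENCE_RE = re.compile(r"\b[A-Z]{1,3}\.\s?\d+[a-z]?\b")


def is_upper_case_header(line: str, min_letters: int = 4) -> bool:
    """Return True if the line has enough letters and all of them are upper case."""
    letters = [ch for ch in line if ch.isalpha()]
    return len(letters) >= min_letters and all(ch.isupper() for ch in letters)


def split_entries(lines: list[str]) -> list[list[str]]:
    """Group lines into entries.

    A new entry starts at an upper-case header, or at the first line after a
    blank line (used here as a stand-in for a gap on the printed page).
    """
    entries: list[list[str]] = []
    current: list[str] = []
    gap_seen = False
    for line in lines:
        text = line.strip()
        if not text:
            gap_seen = True
            continue
        if current and (gap_seen or is_upper_case_header(text)):
            entries.append(current)
            current = []
        current.append(text)
        gap_seen = False
    if current:
        entries.append(current)
    return entries


def find_references(entry: list[str]) -> list[str]:
    """Return every substring of the entry that matches the assumed reference pattern."""
    return [m.group(0) for line in entry for m in REFERENCE_RE.finditer(line)]
```

In practice each signal is noisy on its own (running heads are also upper case, and gaps can fall inside an entry), so a real detector would likely need to combine and tune them against the source volumes.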

The data used is XML-formatted data derived from the 13-volume Catalogue of books printed in the 15th century now at the British Museum. The project was undertaken in support of Rossitza Atanassova's AHRC-RLUK Professional Practice Fellowship.
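The XML schema is not described here, so the following is only a sketch of how line text might be pulled out of such a file before any entry detection. The `<page>` and `<line>` element names and the `read_lines` helper are hypothetical; the real files may use a different structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are assumptions for illustration.
SAMPLE_XML = """<page number="1">
  <line>DE CIVITATE DEI</line>
  <line>Folio. 300 leaves.</line>
  <line>   </line>
</page>"""


def read_lines(xml_text: str, line_tag: str = "line") -> list[str]:
    """Return the stripped text of every element named line_tag, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(line_tag)]
```

Whitespace-only lines come back as empty strings, which keeps a trace of gaps in the source for later processing.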

This project is the version, maintained by the British Library, of code produced in 2022/2023 by Isaac Dunford as part of a Digital Humanities Internship funded by the School of Humanities at the University of Southampton. Isaac's original code is at https://github.com/Southampton-Digital-Humanities/2023_Catalogue-Entry-Detection.

Isaac describes the work in his post on the British Library Digital Scholarship blog.

License

All data is provided by the British Library: text data under CC0 1.0 Universal (Public Domain Dedication); images under CC BY 4.0 International. The code is released under the MIT License.

britishlibrary/Incunabula-Catalogue-Entry-Detection
