Create data for missing textbooks #94

Open
lmullen opened this issue Aug 29, 2024 · 3 comments

lmullen (Owner) commented Aug 29, 2024

No description provided.

lmullen (Owner, Author) commented Aug 29, 2024

Data should be in two tables.

BOOK METADATA:
bibliographicid (ignore)
year
title
vols
subjects (can be an array)
psmid (make something up)
author (can have more than one, separated by semicolons; e.g., Abbott, Benjamin Vaughan; Barringer, Victor Clay)
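A minimal Python sketch of one BOOK METADATA row. It assumes `vols` is an integer count and that multiple authors arrive as one semicolon-separated string, as in the author example. `BookMetadata` and `split_authors` are hypothetical names, not part of the project, and `bibliographicid` is left out because it is to be ignored.

```python
from dataclasses import dataclass, field


@dataclass
class BookMetadata:
    # One row of the BOOK METADATA table; field names follow the list above.
    # Assumption: vols is an integer count of volumes.
    psmid: str
    year: int
    title: str
    vols: int
    subjects: list[str] = field(default_factory=list)  # can hold several subjects
    author: list[str] = field(default_factory=list)    # can hold several authors


def split_authors(raw: str) -> list[str]:
    """Split 'Last, First; Last, First' into names (assumes ';' separates authors)."""
    return [name.strip() for name in raw.split(";") if name.strip()]
```

For example, `split_authors("Abbott, Benjamin Vaughan; Barringer, Victor Clay")` returns `["Abbott, Benjamin Vaughan", "Barringer, Victor Clay"]`. Splitting only on semicolons keeps the "Last, First" commas inside each name intact.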

PAGE OCR DATA (one row per page):
pageid (format: page 1 = 00010)
psmid (make it up; this identifies the volume)
ocrtext
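The only pageid example given is page 1 = 00010. One possible reading, assumed here and not confirmed in the thread, is a four-digit zero-padded page number followed by a trailing 0. Under that assumption, hypothetical helpers could look like this:

```python
def make_pageid(page: int) -> str:
    # Assumption: four-digit zero-padded page number plus a trailing "0",
    # so page 1 -> "00010". Only the page 1 case is confirmed by the issue.
    if not 1 <= page <= 9999:
        raise ValueError("page must be between 1 and 9999")
    return f"{page:04d}0"


def parse_pageid(pageid: str) -> int:
    # Inverse of make_pageid under the same assumption: drop the trailing digit.
    return int(pageid[:-1])
```

Under this reading, `make_pageid(12)` returns `"00120"`, and because every id has the same width, sorting pageids as strings also sorts them in page order.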

lmullen (Owner, Author) commented Aug 29, 2024

For psmids use something like this:

momlextra00001
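A small sketch for generating made-up psmids in that pattern, assuming a fixed `momlextra` prefix followed by a five-digit zero-padded counter. `make_psmid` is a hypothetical helper.

```python
def make_psmid(n: int, prefix: str = "momlextra") -> str:
    # Assumption: prefix + five-digit zero-padded counter, matching momlextra00001.
    return f"{prefix}{n:05d}"


# Hypothetical usage: one psmid per missing volume, numbered from 1.
psmids = [make_psmid(i) for i in range(1, 4)]
```

Here `psmids` is `["momlextra00001", "momlextra00002", "momlextra00003"]`.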

kfunk074 (Collaborator) commented Sep 5, 2024

Metadata is here: side_corpus.csv

Page OCR data is stored here: https://drive.google.com/drive/folders/1vzENEoxKK74cAI_m5qpVLLBFl9moKW3P?usp=sharing

Each folder is labeled with its psmid, and each page is a separate text file in the folder.
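A minimal sketch of turning that folder layout into the one-row-per-page OCR table, assuming the folders have been downloaded locally under one root directory, that page files use a `.txt` extension, and that each file stem is already the pageid. `load_page_rows` and `write_page_csv` are hypothetical names.

```python
import csv
from pathlib import Path


def load_page_rows(root: Path) -> list[dict[str, str]]:
    """Build PAGE OCR DATA rows from root/<psmid>/<pageid>.txt.

    Assumptions: each subfolder name is a psmid, each page is a .txt file,
    and the file stem is used as the pageid.
    """
    rows = []
    for volume_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Zero-padded file names sort in page order as plain strings.
        for page_file in sorted(volume_dir.glob("*.txt")):
            rows.append({
                "pageid": page_file.stem,
                "psmid": volume_dir.name,
                "ocrtext": page_file.read_text(encoding="utf-8"),
            })
    return rows


def write_page_csv(rows: list[dict[str, str]], out_path: Path) -> None:
    # One row per page with the three columns listed in the issue;
    # the csv module quotes OCR text that contains commas or newlines.
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["pageid", "psmid", "ocrtext"])
        writer.writeheader()
        writer.writerows(rows)
```

Reading the files from disk rather than from the shared Drive link keeps the sketch to the standard library; the download step itself is not shown.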
