Paleontologists throughout the 20th century used field notebooks to keep detailed logs of their expeditions. Previous work at the museum has given us these notes as scanned images and as very imperfect text transcriptions. The text has never been analyzed for potentially relevant pieces of information that could lead to new understanding of past expeditions. "These data are very frequently requested by researchers from around the world, but their imperfect nature makes them less useful than they could be."
The SUNYdigs team decided that the best solution was to gamify the "Dig Up the Past" challenge: build a platform where users transcribe one small piece of a page at a time, so no single contributor is overwhelmed.
Here's a [link](https://www.youtube.com/watch?v=-Ht4qiDRZE8) to the TED talk on massive-scale online collaboration by Luis von Ahn, founder of reCAPTCHA and CEO of Duolingo.

As a first step, the system performs text-based image segmentation on every scanned page.
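As an illustration, here is a minimal sketch of how word-level segmentation might work using OpenCV. The `segment_words` helper, the dilation kernel size, and the noise threshold are assumptions for illustration, not the project's actual code:

```python
import cv2
import numpy as np
from pathlib import Path

def segment_words(page_path: str, out_dir: str) -> list[str]:
    """Split a scanned page into word-sized crops via thresholding,
    horizontal dilation (to merge letters into words), and contours."""
    img = cv2.imread(page_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu threshold, inverted so ink becomes white foreground
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # A wide kernel merges adjacent letters into word-level blobs
    kernel = np.ones((3, 15), np.uint8)
    dilated = cv2.dilate(binary, kernel, iterations=1)
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    word_paths = []
    for i, c in enumerate(contours):
        x, y, w, h = cv2.boundingRect(c)
        if w * h < 100:  # skip specks of noise
            continue
        crop = img[y:y + h, x:x + w]
        out_path = str(Path(out_dir) / f"word_{i}.png")
        cv2.imwrite(out_path, crop)
        word_paths.append(out_path)
    return word_paths
```

The wide horizontal kernel is the key design choice here: dilating sideways fuses individual letters into single connected blobs so each contour roughly corresponds to one word rather than one character.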
For every scanned page in the [journal](https://github.com/amnh/HacktheDinos/tree/master/challenges/Dig-Up-The-Past), a folder with the same title is created containing all of the segmented word images. A text file, also sharing the page's title, is created alongside the original scanned image; it stores each word's metadata as a JSON object:
`{"img_page": path/to/parent_image, "year": 1899, "word": [path/to/word_image_file, "!@#$%", number_of_votes], "author": "brown"}`
This metadata remains queued until a particular transcription has gathered five upvotes; the transcription is then saved to the database and archived.
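A minimal sketch of that five-upvote rule, assuming hypothetical `db` and `archive` interfaces:

```python
VOTES_REQUIRED = 5

def record_upvote(record: dict, db, archive) -> None:
    """Increment a transcription's vote count; once it reaches the
    threshold, persist it to the database and archive it."""
    path, transcription, votes = record["word"]
    votes += 1
    record["word"] = [path, transcription, votes]
    if votes >= VOTES_REQUIRED:
        db.save(record)        # hypothetical database interface
        archive.store(record)  # hypothetical archive interface
```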
Planned improvements:

* A 'Suggest' feature to help laypeople transcribing documents guess difficult paleontology terms.
* Higher-resolution images would allow words to be extracted from scans more accurately.
* The text-based image segmentation is currently optimized for sparsely populated, well-formatted journal pages.

Git issues have been created for problems needing immediate attention. Feel free to contribute. :)