synthetic dataset #2

Open
pj771 opened this issue Jun 14, 2020 · 2 comments

Comments

pj771 commented Jun 14, 2020

Hi, I have some questions about the dataset, specifically about how the synthetic dataset was generated.
What parameters were used to create the synthetic dataset with the corresponding repository (i.e., kikones34/handwritten-document-synthesizer)?
I am using the following command:

./synthesize -num-pages=270 -words -distort-bboxes

This creates about 60k synthetic handwritten word images, whereas the paper mentions that 5.6 million word images were created.

Also, was the default corpus provided in kikones34/handwritten-document-synthesizer used to create the synthetic word images, or was it changed from the default?

Contributor

leitro commented Jun 14, 2020

Hi! You are right, more synthetic words can be obtained by changing "-num-pages" to a bigger number.
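
As a rough sanity check on the figures quoted in this thread, the sketch below estimates how large "-num-pages" would need to be to reach 5.6 million words. It assumes the number of word images grows linearly with the page count, which nothing in this thread confirms, and it uses the approximate "about 60k" figure reported for the 270-page run. The function name `pages_for_target` is made up for illustration and is not part of the synthesizer.

```python
def pages_for_target(target_words: int, pages_run: int, words_from_run: int) -> int:
    """Estimate the page count needed to reach target_words.

    Assumption: word yield scales linearly with the number of pages,
    extrapolating from one observed run (pages_run -> words_from_run).
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(target_words * pages_run) // words_from_run)


# Figures taken from this thread (both approximate).
pages_run = 270             # -num-pages value used in the command above
words_from_run = 60_000     # "about 60k" word images from that run
target_words = 5_600_000    # "5.6 million" word images mentioned in the paper

print(pages_for_target(target_words, pages_run, words_from_run))
```

Under those assumptions the estimate comes out at roughly 25,200 pages. The real figure depends on the corpus and on how many words the synthesizer actually places on each page.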

About the corpus, I downloaded ebooks from here.

Cheers:-)

Author

pj771 commented Jun 14, 2020

Thanks. Just to confirm, how many synthetic word images were created for the synthetic word dataset? Was it 60k or 5.6 million?

Edit: Also, did you download all the books, or only a specific list?
