synthetic dataset #2

pj771 · 2020-06-14T02:44:33Z

Hi, I have some questions about the dataset, specifically about how the synthetic dataset was generated.
Which parameters were used to create the synthetic dataset with the corresponding repository (i.e., kikones34/handwritten-document-synthesizer)?
I am using the following command:

./synthesize -num-pages=270 -words -distort-bboxes

which creates about 60k synthetic handwritten word images (the paper mentions that 5.6 million word images were created).

Also, was the default corpus provided in kikones34/handwritten-document-synthesizer used to create the synthetic word images, or was it changed from the default?

leitro · 2020-06-14T18:00:34Z

Hi! You are right, more synthetic words can be obtained by changing "-num-pages" to a bigger number.
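As a rough estimate only: if the number of words per page stays roughly constant as "-num-pages" grows (an assumption, not something confirmed in this thread), the page count needed for a target can be extrapolated from the run reported above. The helper name `pages_needed` is hypothetical; the figures are the approximate ones quoted in the question.

```python
def pages_needed(observed_words: int, observed_pages: int, target_words: int) -> int:
    """Extrapolate how many pages are needed to reach target_words.

    Assumes words per page is constant (linear scaling), which is an
    assumption and may not hold for a different corpus or settings.
    """
    if observed_words <= 0 or observed_pages <= 0:
        raise ValueError("observed counts must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_words * observed_pages // observed_words)


# Figures from the question: about 60k words from 270 pages, 5.6 million target.
print(pages_needed(60_000, 270, 5_600_000))  # prints 25200
```

Since "about 60k" is itself approximate, the result is only a ballpark value for "-num-pages".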

About the corpus, I downloaded ebooks from here.

Cheers:-)

pj771 · 2020-06-14T20:20:44Z

Thanks. Just to confirm, how many synthetic word images were created for the synthetic word dataset? Was it 60k or 5.6 million?

Edit: Also, did you download all the books, or only a specific list?
