What should the vocab file be set to when running on new documents and questions #43

murphp15 · 2018-06-01T13:39:26Z

I want to boot up a demo of this using the server.py file in this repo.
However, I'm not sure what I should set the vocab file to.
In the run_on_user_documents.py file, I see you set it to all the words in the questions the client asks. However, if I don't know the words in the questions and documents up front, how should I handle this?

chrisc36 · 2018-06-04T20:16:28Z

What I did for the demo was to pre-compute a file of the top-n words that occur in the TriviaQA corpus and use that.

You can set it to None, in which case it will just use all the words found in the word vector file. I think the code needs to be tweaked a little to do that, though, since there are an absurd number of words in the GloVe word vectors and there is a limit to how large constant tensors can be in the graph.
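
To illustrate why a restricted vocab keeps the embedding matrix small, here is a minimal sketch that reads a GloVe-style text file and keeps only the vectors for words in a given vocabulary. It is not this repo's loader: the function name `load_vectors_for_vocab` is hypothetical, and it assumes one entry per line in the form `word v1 v2 ... vd`, separated by single spaces, with no spaces inside the word itself.

```python
def load_vectors_for_vocab(vec_path, vocab):
    """Return {word: [floats]} for the words in `vocab` found in `vec_path`.

    Assumes a GloVe-style text file: one "word v1 v2 ... vd" entry per line.
    """
    wanted = set(vocab)
    vectors = {}
    with open(vec_path, encoding="utf-8") as f:
        for line in f:
            # Split off the word; everything after the first space is the vector.
            word, _, rest = line.rstrip("\n").partition(" ")
            if word in wanted:
                vectors[word] = [float(x) for x in rest.split()]
    return vectors
```

Words in the vocabulary that have no entry in the vector file are simply absent from the result, so the caller still has to decide how to handle unknown words.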

murphp15 · 2018-06-05T09:05:52Z

What script did you run to pre-compute this vocab file?

chrisc36 · 2018-06-10T20:35:47Z

I don't have a script I can share at the moment, but it's a pretty simple task to iterate through the pre-tokenized files and count how often each word occurs.
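
A minimal sketch of that counting step follows. It is not the script used for the demo. It assumes the pre-tokenized files are UTF-8 text with whitespace-separated tokens and a `.txt` extension, and that the vocab file is one word per line; the function name `build_vocab`, the `top_n` default, and the file layout are all hypothetical and may need adjusting to whatever format the code actually expects.

```python
from collections import Counter
from pathlib import Path


def build_vocab(token_dir, out_path, top_n=100_000):
    """Count tokens in every *.txt file under `token_dir` and write the
    `top_n` most frequent ones to `out_path`, one word per line."""
    counts = Counter()
    for path in sorted(Path(token_dir).rglob("*.txt")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Files are assumed to be tokenized already, so a plain
                # whitespace split recovers the tokens.
                counts.update(line.split())

    vocab = [word for word, _ in counts.most_common(top_n)]
    with open(out_path, "w", encoding="utf-8") as f:
        for word in vocab:
            f.write(word + "\n")
    return vocab
```

Write the output somewhere outside `token_dir` (or with a different extension) so that a later run does not count the vocab file itself.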
