datasets

https://research.fb.com/downloads/babi/ - Text understanding questions, goal-oriented dialogs, context questions, movie questions, simple questions
http://datasets.maluuba.com - NewsQA: Q&A pairs for CNN news (120k); Frames: goal-oriented dialogs
https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=revue# - Question duplicates (400k)
https://www.gutenberg.org/ - books
https://cloud.google.com/blog/big-data/2016/12/google-bigquery-public-datasets-now-include-stack-overflow-q-a - Stackoverflow data
http://commoncrawl.org/ - Web crawl corpus; see also http://webdatacommons.org/ (structured data extracted from Common Crawl)
https://www.yelp.com/dataset_challenge - Yelp review dataset
https://github.com/yury-chernushenko/word2vec-api - Word2Vec
http://www.msmarco.org/ - Q/A on 5W1H
https://rajpurkar.github.io/SQuAD-explorer/ - Q/A
https://medium.com/startup-grind/fueling-the-ai-gold-rush-7ae438505bc2#.wdqtofqpx - Different datasets

Question Answering

(MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2016 [paper] [data]
(NewsQA) NewsQA: A Machine Comprehension Dataset, 2016 [paper] [data]
(SQuAD) SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016 [paper] [data]
(GraphQuestions) On Generating Characteristic-rich Question Sets for QA Evaluation, 2016 [paper] [data]
(Story Cloze) A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016 [paper] [data]
(Children's Book Test) The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations, 2015 [paper] [data]
(SimpleQuestions) Large-scale Simple Question Answering with Memory Networks, 2015 [paper] [data]
(WikiQA) WikiQA: A Challenge Dataset for Open-Domain Question Answering, 2015 [paper] [data]
(CNN-DailyMail) Teaching Machines to Read and Comprehend, 2015 [paper] [code to generate] [data]
(QuizBowl) A Neural Network for Factoid Question Answering over Paragraphs, 2014 [paper] [data]
(MCTest) MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text, 2013 [paper] [data]
(QASent) What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, 2007 [paper] [data]

Dialogue Systems

(Ubuntu Dialogue Corpus) The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, 2015 [paper] [data]

Goal-Oriented Dialogue Systems

(Frames) Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems, 2016 [paper] [data]
(DSTC 2 & 3) Dialog State Tracking Challenge 2 & 3, 2013 [paper] [data]

papers

http://www2016.net/proceedings/proceedings/p1373.pdf - Mining User Intentions from Medical Queries

links

conferences

ACL
EMNLP
ICLR
NIPS

Computer vision

CVPR

libraries

https://github.com/facebookresearch/fastText - embeddings

implementations

http://alias-i.com/lingpipe/demos/tutorial/cluster/read-me.html
