- https://research.fb.com/downloads/babi/ - Text understanding questions, Goal-oriented Dialogs, Context questions, Movie questions, Simple Questions
- http://datasets.maluuba.com - Q&A pairs for CNN news (120k); Frames: Goal-oriented Dialogs
- https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=revue# - Question duplicates (400k)
- https://www.gutenberg.org/ - books
- https://cloud.google.com/blog/big-data/2016/12/google-bigquery-public-datasets-now-include-stack-overflow-q-a - Stack Overflow data
- http://commoncrawl.org/ - Web crawl archive; see also http://webdatacommons.org/ (structured data extracted from Common Crawl)
- https://www.yelp.com/dataset_challenge - Yelp review dataset
- https://github.com/yury-chernushenko/word2vec-api - Word2Vec
- http://www.msmarco.org/ - Q/A on 5W1H
- https://rajpurkar.github.io/SQuAD-explorer/ - Q/A
- https://medium.com/startup-grind/fueling-the-ai-gold-rush-7ae438505bc2#.wdqtofqpx - Different datasets
- (MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2016 [paper] [data]
- (NewsQA) NewsQA: A Machine Comprehension Dataset, 2016 [paper] [data]
- (SQuAD) SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016 [paper] [data]
- (GraphQuestions) On Generating Characteristic-rich Question Sets for QA Evaluation, 2016 [paper] [data]
- (Story Cloze) A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016 [paper] [data]
- (Children's Book Test) The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations, 2015 [paper] [data]
- (SimpleQuestions) Large-scale Simple Question Answering with Memory Networks, 2015 [paper] [data]
- (WikiQA) WikiQA: A Challenge Dataset for Open-Domain Question Answering, 2015 [paper] [data]
- (CNN-DailyMail) Teaching Machines to Read and Comprehend, 2015 [paper] [code to generate] [data]
- (QuizBowl) A Neural Network for Factoid Question Answering over Paragraphs, 2014 [paper] [data]
- (MCTest) MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text, 2013 [paper] [data]
- (QASent) What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, 2007 [paper] [data]
- (Ubuntu Dialogue Corpus) The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, 2015 [paper] [data]
- (Frames) Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems, 2016 [paper] [data]
- (DSTC 2 & 3) Dialog State Tracking Challenge 2 & 3, 2013 [paper] [data]
#papers
- http://www2016.net/proceedings/proceedings/p1373.pdf - Mining User Intentions from Medical Queries
#links
- https://www.heise.de/tr/artikel/KI-mit-emotionaler-Intelligenz-3613884.html
- http://citeomatic.semanticscholar.org/
#conferences
- ACL
- EMNLP
- ICLR
- NIPS
- CVPR
- https://github.com/facebookresearch/fastText - embeddings