text8
The first 100,000,000 bytes of plain text from Wikipedia. Intended for testing only; see wiki-english-*
for full Wikipedia datasets.
| Attribute | Value |
|---|---|
| File size | 32 MB |
| Number of rows | 1701 |
Read more:
Example
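import gensim.downloader as api
from gensim.models.word2vec import Word2Vec

# Download the corpus on first use (it is cached locally afterwards).
data = api.load("text8")
# Train a Word2Vec model on the corpus with default parameters.
model = Word2Vec(data)
# Query through model.wv; model.most_similar was removed in gensim 4.x.
model.wv.most_similar("human", topn=3)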
"""
Output (exact values vary between training runs):
[('humans', 0.6429149508476257), ('animal', 0.6419760584831238), ('biological', 0.6034130454063416)]
"""
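
The "Number of rows" figure in the table implies that the corpus is consumed as a sequence of rows, not as one long string. This card does not say how those rows are formed, so the following is only a minimal sketch: it assumes each row is a list of whitespace-separated tokens capped at a fixed length. The helper `chunk_tokens` and the cap of 10 tokens are hypothetical and chosen for illustration; they are not part of gensim.

```python
from typing import Iterable, Iterator, List


def chunk_tokens(tokens: Iterable[str], row_length: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `row_length` tokens."""
    row: List[str] = []
    for token in tokens:
        row.append(token)
        if len(row) == row_length:
            yield row
            row = []
    if row:
        # Emit the final row, which may be shorter than row_length.
        yield row


# Toy stand-in for the corpus (assumption): 25 whitespace-separated tokens.
text = " ".join(f"w{i}" for i in range(25))

# row_length=10 is an illustrative value, not taken from this card.
rows = list(chunk_tokens(text.split(), row_length=10))

print(len(rows))               # 3
print([len(r) for r in rows])  # [10, 10, 5]
```

Each row in this shape is a list of string tokens, which is the kind of iterable of token lists that `Word2Vec` accepts as training input.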