text8

chaitaliSaini released this 14 Oct 12:04
· 74 commits to master since this release

First 100,000,000 bytes of plain text from Wikipedia. Intended for testing purposes; see wiki-english-* for the full Wikipedia datasets.

attribute        value
File size        32 MB
Number of rows   1701
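
One way to read the row count is as a long token stream split into fixed-size chunks. The sketch below is illustrative only: it assumes rows are formed by grouping whitespace-separated tokens into chunks of at most a fixed length. The helper name `iter_rows`, the chunk size, and the sample string are hypothetical and not taken from the dataset loader.

```python
from typing import Iterable, Iterator, List


def iter_rows(tokens: Iterable[str], max_row_length: int) -> Iterator[List[str]]:
    """Group a token stream into rows of at most max_row_length tokens."""
    row: List[str] = []
    for token in tokens:
        row.append(token)
        if len(row) == max_row_length:
            yield row
            row = []
    if row:
        # Emit the final row, which may be shorter than the others.
        yield row


# Hypothetical sample text; the real corpus is far larger.
sample = "anarchism originated as a term of abuse first used".split()
rows = list(iter_rows(sample, max_row_length=4))
print(len(rows))   # 3
print(rows[-1])    # ['used']
```

Under this assumption, each row is a list of tokens, which is the input shape Word2Vec expects for a sentence.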

Read more:

Example

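import gensim.downloader as api
from gensim.models.word2vec import Word2Vec

# Download the corpus on first use, then load it as an iterable of token lists
data = api.load("text8")
# Train a Word2Vec model with default parameters
model = Word2Vec(data)
# Query via the KeyedVectors in model.wv (model.most_similar was removed in gensim 4.0)
model.wv.most_similar("human", topn=3)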

"""
Output:

[('humans', 0.6429149508476257), ('animal', 0.6419760584831238), ('biological', 0.6034130454063416)]
"""
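
The scores in the output above are cosine similarities between word vectors. A minimal sketch of that kind of ranking, using hand-made 2-D vectors instead of a trained model, is shown below. The vectors and the helper names `cosine` and `most_similar_toy` are hypothetical and are not gensim's implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_toy(
    vectors: Dict[str, List[float]], word: str, topn: int
) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # the query word itself is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up vectors, chosen only so the ranking is easy to verify by hand.
toy_vectors = {
    "human": [1.0, 0.0],
    "humans": [1.0, 1.0],
    "animal": [0.0, 1.0],
    "car": [-1.0, 0.0],
}

result = most_similar_toy(toy_vectors, "human", topn=2)
print([word for word, _ in result])  # ['humans', 'animal']
```

A trained model works with much higher-dimensional vectors, and its scores depend on the training run, so the numbers in the example output may differ from what you see.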