word2vec-ruscorpora-300

menshikh-iv released this 18 Dec 08:56 · 12 commits to master since this release · 9b43cbd

Word2vec Continuous Skipgram vectors trained on the full Russian National Corpus (about 250M words).

Related issue #3.

| attribute | value |
|---|---|
| File size | 199 MB |
| Number of vectors | 184973 |
| Preprocessing | The corpus (used for training) was lemmatized and tagged with Universal PoS |
| Window size | 10 |
| Dimension | 300 |
| License | https://creativecommons.org/licenses/by/4.0/deed.en |
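
Because the training corpus was lemmatized and PoS-tagged, the vocabulary keys shown in the example output on this page take the form `lemma_TAG` (for instance `кот_NOUN`). The sketch below shows one way to build such a key from a lemma and a tag. The helper name `make_key` is hypothetical, and lowercasing the lemma is an assumption based on the lowercase keys in the example output. Lemmatization itself is not covered here and must be done separately.

```python
def make_key(lemma: str, pos: str) -> str:
    """Build a vocabulary key in the lemma_TAG form seen in the example output.

    Hypothetical helper: lowercasing the lemma and uppercasing the tag
    are assumptions inferred from keys such as "кот_NOUN".
    """
    return "{}_{}".format(lemma.strip().lower(), pos.strip().upper())


# The input must already be a lemma (dictionary form), not an inflected form.
query = make_key("Кот", "noun")
print(query)
```

The resulting string can then be passed to `model.most_similar(...)` once the model is loaded.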

Read more:

Example

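import gensim.downloader as api

# Downloads the vectors on first use, caches them locally, and loads them.
model = api.load("word2vec-ruscorpora-300")

# Keys combine a lemma with its Universal PoS tag, e.g. "кот_NOUN".
# most_similar returns (key, cosine similarity) pairs, highest first.
for word, similarity in model.most_similar(u"кот_NOUN"):
    print(u"{}: {:.3f}".format(word, similarity))
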
"""
output:

кошка_NOUN: 0.757
котенок_NOUN: 0.668
пес_NOUN: 0.563
мяукать_VERB: 0.562
тобик_NOUN: 0.559
фоксик_NOUN: 0.557
собака_NOUN: 0.557
мяучать_VERB: 0.554
харлашка_NOUN: 0.552
котяра_NOUN: 0.551
"""
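
The neighbours above mix nouns and verbs. As a minimal sketch, the keys can be split back into lemma and tag to keep only one part of speech. The helpers `split_key` and `filter_by_pos` are hypothetical names, not part of gensim, and the sample list simply copies a few pairs from the output above so the snippet runs without downloading the model.

```python
from typing import List, Tuple


def split_key(key: str) -> Tuple[str, str]:
    """Split a "lemma_TAG" key into (lemma, tag) at the last underscore."""
    lemma, sep, tag = key.rpartition("_")
    if not sep:
        # No underscore: treat the whole key as a lemma with no tag.
        return key, ""
    return lemma, tag


def filter_by_pos(neighbours: List[Tuple[str, float]], pos: str) -> List[Tuple[str, float]]:
    """Keep only the (key, similarity) pairs whose tag equals pos."""
    return [(key, score) for key, score in neighbours if split_key(key)[1] == pos]


# A few pairs copied from the example output above.
sample = [
    ("кошка_NOUN", 0.757),
    ("котенок_NOUN", 0.668),
    ("мяукать_VERB", 0.562),
    ("мяучать_VERB", 0.554),
]

for key, score in filter_by_pos(sample, "VERB"):
    lemma, tag = split_key(key)
    print("{} ({}): {:.3f}".format(lemma, tag, score))
```

With a loaded model, the list returned by `model.most_similar(...)` could be passed in place of `sample`.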