Releases · piskvorky/gensim-data

attribute	value
File size	66MB
Number of vectors	400000
Dimension	50
License	http://opendatacommons.org/licenses/pddl/
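
The table gives the vector count and dimension, which is enough for a rough estimate of how much memory the dense embedding matrix needs once loaded. The sketch below assumes each value is stored as a 32-bit float (an assumption, not something stated on this page) and ignores the vocabulary and any container overhead. The listed file size is presumably that of the download, so the two numbers need not match.

```python
# Figures taken from the attribute table for this release.
NUM_VECTORS = 400_000
DIMENSION = 50

# Assumption: each vector component is a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4


def dense_matrix_bytes(num_vectors, dimension, bytes_per_value=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense num_vectors x dimension matrix."""
    return num_vectors * dimension * bytes_per_value


size_bytes = dense_matrix_bytes(NUM_VECTORS, DIMENSION)
print(f"{size_bytes / 1e6:.0f} MB")  # prints "80 MB"
```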

The Fake news dataset contains text and metadata from 244 websites and represents 12,999 posts in total from a specific 30-day window. The data was pulled using the webhose.io API; because it comes from webhose.io's crawler, not all websites identified by the BS Detector are present in this dataset. Data sources that were missing a label were simply assigned a label of 'bs'. There are (ostensibly) no genuine, reliable, or trustworthy news sources represented in this dataset (so far), so don't trust anything you read.

attribute	value
File size	19MB
Number of posts	12999
License	https://creativecommons.org/publicdomain/zero/1.0/

https://www.kaggle.com/mrisdal/fake-news

Example

import gensim.downloader as api
import json

fake_news = api.load("fake-news")
for doc in fake_news: 
    print(json.dumps(doc, indent=4))
    break

"""
Output:

{
    "comments": "0",
    "title": "Muslims BUSTED: They Stole Millions In Gov\u2019t Benefits",
    "published": "2016-10-26T21:41:00.000+03:00",
    "site_url": "100percentfedup.com",
    "language": "english",
    "text": "Print They should pay all the back all the money plus interest. The entire family and everyone who came in with them need to be deported asap. Why did it take two years to bust them? \nHere we go again \u2026another group stealing from the government and taxpayers! A group of Somalis stole over four million in government benefits over just 10 months! \nWe\u2019ve reported on numerous cases like this one where the Muslim refugees/immigrants commit fraud by scamming our system\u2026It\u2019s way out of control! More Related",
    "domain_rank": "25689",
    "crawled": "2016-10-27T01:49:27.168+03:00",
    "type": "bias",
    "likes": "0",
    "shares": "0",
    "spam_score": "0",
    "country": "US",
    "author": "Barracuda Brigade",
    "participants_count": "1",
    "ord_in_thread": "0",
    "thread_title": "Muslims BUSTED: They Stole Millions In Gov\u2019t Benefits",
    "uuid": "6a175f46bcd24d39b3e962ad0f29936721db70db",
    "main_img_url": "http://bb4sp.com/wp-content/uploads/2016/10/Fullscreen-capture-10262016-83501-AM.bmp.jpg",
    "replies_count": "0"
}
"""
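
In the sample output, every value is a string, including the counts and the timestamps. The following is a minimal sketch of converting those fields to typed values before analysis. The helper `normalize_post` and the field groupings are illustrative assumptions, not part of gensim, and the sample record is a subset of the output shown in the example so the snippet runs without downloading the dataset.

```python
from datetime import datetime

# Field groupings are assumptions based on the single sample record.
COUNT_FIELDS = ("comments", "likes", "shares", "replies_count", "participants_count")
TIME_FIELDS = ("published", "crawled")


def normalize_post(doc):
    """Hypothetical helper: return a copy of doc with counts as int and
    timestamps as timezone-aware datetime objects."""
    post = dict(doc)
    for field in COUNT_FIELDS:
        # Skip fields that are missing or empty rather than failing.
        if post.get(field):
            post[field] = int(post[field])
    for field in TIME_FIELDS:
        if post.get(field):
            post[field] = datetime.fromisoformat(post[field])
    return post


# Subset of the record printed in the example output.
sample = {
    "comments": "0",
    "published": "2016-10-26T21:41:00.000+03:00",
    "crawled": "2016-10-27T01:49:27.168+03:00",
    "type": "bias",
    "likes": "0",
    "shares": "0",
    "participants_count": "1",
    "replies_count": "0",
}

post = normalize_post(sample)
# Time between publication and crawl, as a timedelta.
crawl_delay = post["crawled"] - post["published"]
print(post["participants_count"], post["published"].year, crawl_delay)
```

With real data, the same function could be applied to each `doc` yielded by `api.load("fake-news")`; whether every record carries all of these fields is not confirmed here, which is why missing values are skipped.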

attribute	value
File size	32MB
Number of rows	1701

import gensim.downloader as api
from gensim.models.word2vec import Word2Vec

data = api.load("text8")
model = Word2Vec(data)
# Similarity queries live on the trained vectors (model.wv);
# calling most_similar on the model itself was removed in gensim 4.
model.wv.most_similar("human", topn=3)

"""
Output:

[('humans', 0.6429149508476257), ('animal', 0.6419760584831238), ('biological', 0.6034130454063416)]
"""

glove-wiki-gigaword-50

fake-news

text8