
ywpkwon/googling4data


googling4data

Autonomous web-text crawling ("googling") to collect large text corpora for natural language processing.

What it does

For a given string (e.g., "apple"), these scripts (1) search Google for the string, (2) retrieve the HTML pages of the results, (3) extract the visible text from each page, and (4) compress all the extracted text into a zip file.
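Steps (2) to (4) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `fetch_html`, `extract_visible_text` and `zip_texts` are hypothetical names, the set of "hidden" tags is an assumption, and although the project lists html2text as a dependency, this sketch swaps in the standard library's `html.parser` so it runs without extra packages.

```python
import io
import urllib.request
import zipfile
from html.parser import HTMLParser
from typing import Dict

# Hypothetical helpers for illustration; not the repository's actual code.

# Assumption: text inside these tags is never shown to a reader.
HIDDEN_TAGS = {"head", "script", "style", "noscript", "template"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Step (2): download one page and decode it as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class VisibleTextParser(HTMLParser):
    """Collects text nodes that are not inside a hidden tag."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0  # number of hidden tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html: str) -> str:
    """Step (3): return the page's visible text, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.chunks)


def zip_texts(texts: Dict[str, str]) -> bytes:
    """Step (4): pack {filename: text} pairs into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in texts.items():
            archive.writestr(name, text)
    return buffer.getvalue()
```

A page would then go through `zip_texts({"apple_0.txt": extract_visible_text(fetch_html(url))})`, and the returned bytes can be written to a file such as `apple.zip`. Splitting text on every tag boundary is crude (inline tags like `<b>` produce separate lines); html2text gives more readable output.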

How to use

prerequisites

To run the code, you need a key.json file, which this repository does not include. Use the format below and fill in your own values: Google's Custom Search API requires both an API key and a Custom Search Engine (CSE) ID.

{
    "api_key": "your-google-api-key",
    "cse_id": "your-cse-id"
}
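A minimal sketch of reading key.json, assuming the file sits in the working directory. `load_keys` and `parse_keys` are hypothetical names chosen for illustration, not functions from this repository; the validation simply checks that both fields exist and are non-empty.

```python
import json
from typing import Any, Dict, Tuple

# Field names taken from the key.json format; helper names are hypothetical.
REQUIRED_KEYS = ("api_key", "cse_id")


def parse_keys(config: Dict[str, Any]) -> Tuple[str, str]:
    """Check that both credentials are present and non-empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("key.json is missing: " + ", ".join(missing))
    return config["api_key"], config["cse_id"]


def load_keys(path: str = "key.json") -> Tuple[str, str]:
    """Read key.json and return (api_key, cse_id)."""
    with open(path, encoding="utf-8") as f:
        return parse_keys(json.load(f))
```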

This approach follows http://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search.

pip install google-api-python-client
pip install html2text
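The search step (1) can be sketched with google-api-python-client along the lines of the Stack Overflow answer above: build a `customsearch` v1 service with the API key, then call `cse().list` with the query and the CSE ID. `google_search` and `result_links` are hypothetical helper names; the `items`/`link` fields reflect the Custom Search JSON API response format as I understand it, so check the official documentation before relying on them.

```python
from typing import Any, Dict, List


def google_search(query: str, api_key: str, cse_id: str, **kwargs: Any) -> Dict[str, Any]:
    """Run one Custom Search request and return the raw JSON response."""
    # Imported here so result_links() stays usable without the package installed.
    from googleapiclient.discovery import build

    service = build("customsearch", "v1", developerKey=api_key)
    return service.cse().list(q=query, cx=cse_id, **kwargs).execute()


def result_links(response: Dict[str, Any]) -> List[str]:
    """Pull result URLs out of a Custom Search response (empty list if no hits)."""
    return [item["link"] for item in response.get("items", []) if "link" in item]
```

With your own credentials, `result_links(google_search("apple", api_key, cse_id, num=10))` would return up to ten URLs to feed into the retrieval step. The API caps `num` at 10 per request, so collecting more pages means issuing further requests with the `start` parameter.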

If you know a simpler or easier way to do this, please let me know.

example

** This section is still under construction.

If this helps, please give the repository a star ;)
