Various projects in order to improve my scraping skills :)

mansi-codder/Web_Scraping_Projects

 
 


  • 2019-10-07 - Car Datasheet (1970-1982) - Basics of scraping with bs4: a first attempt in a notebook, then a more robust script written in PyCharm, and finally an exploratory data analysis with pandas and seaborn.

  • 2019-10-20 - Pokemon database - A slightly more complicated scraping task using bs4. Several first attempts are in the notebooks in the directory, but the main and final script, written in PyCharm, is HERE. I then started a statistical analysis, but it is incomplete; I'll finish it later if I have time :)

  • 2019-11-07 - Historical climate & meteorological data - More advanced scraping that uses several techniques to retrieve data with bs4. The first attempts are in a Jupyter notebook; the final Python script can be found here. It uses FakeUserAgent to send requests with random, realistic browser headers, and it sleeps for a random number of seconds between requests. The script is fairly robust: it handles cases such as missing values and HTTP errors.
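
The request pattern described in the last project (random browser headers, random pauses between requests, tolerance for HTTP errors) could look roughly like the sketch below. It is not the repository's script: it uses only the standard library, a small hard-coded `USER_AGENTS` pool stands in for FakeUserAgent, and the function name `polite_get` and its parameters are hypothetical.

```python
import random
import time
import urllib.error
import urllib.request

# Assumption: a tiny fixed pool standing in for FakeUserAgent's random headers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
]


def polite_get(url, opener=urllib.request.urlopen, sleeper=time.sleep,
               min_delay=1.0, max_delay=3.0, retries=3):
    """Return the page body as text, or None if every attempt fails."""
    for _ in range(retries):
        # Pause for a random number of seconds before each request.
        sleeper(random.uniform(min_delay, max_delay))
        request = urllib.request.Request(
            url, headers={"User-Agent": random.choice(USER_AGENTS)})
        try:
            with opener(request, timeout=10) as response:
                return response.read().decode("utf-8", errors="replace")
        except urllib.error.URLError:
            # HTTPError is a subclass of URLError, so both are retried here.
            continue
    return None
```

The `opener` and `sleeper` parameters exist only so the function can be exercised without network access or real waiting; a caller would normally leave them at their defaults and treat a `None` result as a missing value.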


More to come in the next weeks:

  • retrieve news titles from tabloids or online newspapers
  • scrape tweets
  • get the top 250 movies on IMDb
  • download every link on a specific webpage and report which links are dead
  • image site downloader (for image search engines)
  • use of scrapy / selenium
  • etc.
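
For the planned link checker in the list above, one possible starting point is sketched here. It is only an illustration under assumptions: it uses the standard library's `html.parser` instead of bs4 to stay dependency-free, and the names `LinkCollector`, `extract_links`, `find_dead_links`, and the `is_alive` callback are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors with a missing or empty href
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def find_dead_links(links, is_alive):
    """Return the links for which the is_alive(url) callback is false."""
    return [link for link in links if not is_alive(link)]
```

Keeping the liveness check as a callback leaves open how a link is tested (for example, a request that returns an HTTP error counts as dead) and lets the parsing be tried on a saved HTML file first.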


Languages

  • Jupyter Notebook 72.2%
  • HTML 27.2%
  • Python 0.6%