
Project to scrape flash reports from schools in Nepal to build a rich dataset of all the schools in Nepal


rrijal53/Scrape-Flash-Reports

 
 


Scraping Flash School Reports

This project scrapes the Flash School Reports published in Nepal to build a machine-readable dataset of the country's schools.

It is also partly a learning project for the Scrapy scraping framework.

Getting started

Get the data! After cloning this repository, change into the project directory and create a folder for the output:

cd scrape_flash_reports
mkdir data

To get the lists of districts and VDCs (Village Development Committees), run:

scrapy crawl districts -o data/districts.csv
scrapy crawl vdcs -o data/vdcs.csv
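Once the two crawls have finished, a quick sanity check on the output can be useful. The following is a minimal sketch that assumes the CSV files are UTF-8 encoded and start with a header row; it makes no assumption about the column names, which depend on the items the spiders define. The helper name `summarize_csv` is hypothetical and not part of this repository.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        # fieldnames comes from the header row; it is None for an empty file
        columns = reader.fieldnames or []
    return columns, row_count


# Report on whichever output files already exist (paths from the commands above).
for name in ("data/districts.csv", "data/vdcs.csv"):
    if Path(name).exists():
        columns, row_count = summarize_csv(name)
        print(f"{name}: {row_count} rows, columns = {columns}")
```

A row count of zero, or an empty column list, suggests the crawl did not yield any items.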

To test that each school is being parsed properly, try:

scrapy parse 'http://202.70.77.75:8080/flash/schoolreport/reportshow.php?d=69&v=002&s=690020002&t=&yr=2070' --spider=schools --callback=parse_school_page
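To test other schools with the same command, the report URL can be assembled from its query parameters. The sketch below uses only the Python standard library. The readings of the parameters (`d` as district, `v` as VDC, `s` as school code, `yr` as year, `t` as an optional report type) are guesses from the example URL, not something this project documents, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address copied from the example command above.
BASE_URL = "http://202.70.77.75:8080/flash/schoolreport/reportshow.php"


def build_report_url(district, vdc, school, year, report_type=""):
    """Assemble a school report URL (parameter meanings are assumptions)."""
    # Pass codes as strings so leading zeros such as "002" are preserved.
    query = urlencode({
        "d": district,
        "v": vdc,
        "s": school,
        "t": report_type,
        "yr": year,
    })
    return f"{BASE_URL}?{query}"


def report_params(url):
    """Split a report URL back into a flat dict of its query parameters."""
    query = urlsplit(url).query
    # keep_blank_values retains the empty "t" parameter
    return {key: values[0]
            for key, values in parse_qs(query, keep_blank_values=True).items()}


url = build_report_url("69", "002", "690020002", "2070")
print(url)
print(report_params(url))
```

With the values shown, `build_report_url` reproduces the URL used in the `scrapy parse` command.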

To scrape all the schools (untested), you can run:

scrapy crawl schools -o data/schools.csv -s JOBDIR=crawls/schools-1

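This crawl can be stopped with CTRL-C and restarted by running the same command with the same JOBDIR; it should then resume from where it left off. This is untested here, but it is the behaviour described in the Scrapy documentation on pausing and resuming crawls.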
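Because resuming depends on passing the same JOBDIR value every time, it may help to build the command in one place. This is a small sketch only: the helper `crawl_command` is hypothetical, and it simply reproduces the command line shown above rather than calling Scrapy's Python API.

```python
import shlex


def crawl_command(spider, output, jobdir):
    """Build the argument list for a crawl that can be paused and resumed."""
    return ["scrapy", "crawl", spider, "-o", output, "-s", f"JOBDIR={jobdir}"]


command = crawl_command("schools", "data/schools.csv", "crawls/schools-1")
print(shlex.join(command))
# To launch the crawl from Python instead of a shell:
#   import subprocess; subprocess.run(command, check=True)
```

Running the same command again after an interruption reuses `crawls/schools-1`; a different directory name would start a fresh crawl.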

Contributors

Contributors to this project:

  • Bikram Adhikari (@meadhikari)
  • Sakar Pudasaini (@karkhana)
  • Prabhas Pokharel (@prabhasp)
  • Roshan Rijal (@rrijal53)
