Skip to content

Latest commit

 

History

History
21 lines (17 loc) · 1.05 KB

readme.md

File metadata and controls

21 lines (17 loc) · 1.05 KB

OpenStreetMap Data Wrangling Project

Zanete Ence

This directory contains a set of scripts that audit and clean open street map XML data and turn it into a json format for MongoDB It contains the following files:

  • map.txt: contains the URL to the full download of OpenStreetMap XML for the area of Crawley, West Sussex, United Kingdom
  • crawley.osm: a random sample of the original dataset
  • Python scripts:
    • audit.py: Initial pass at the data set to confirm schema assumptions and produce an overview report
    • audit_tags.py: Secondary pass at the data with focus on contents of the tag elements, produces a csv with all key value pairs encountered
    • clean.py: Script that cleans and shapes the original XML data and transforms into json file containing list of map entities If CODE RUN:
  • tag_audit_report.csv: an export of all the key value pairs encountered and their count (produced by audit_tags.py)
  • clean_data.json: an export of map entities in json format (produced by clean.py)

Libraries used:

  • xml
  • pprint
  • pandas