Skip to content

An ETL in a Pure Python single script, service oriented, to make a request and download, unzip, extract, parse csv, normalize dataset and retrieve data as json

Notifications You must be signed in to change notification settings

rodrigoddc/ETL-pure-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

An ETL in a single pure python script

Motivated by a job chalenge.

This is a Pure Python single script, service oriented, to make a request to https://www.jodidata.org/_resources/files/downloads/gas-data/jodi_gas_csv_beta.zip download, unzip, extract, parse csv, normalize dataset and retrieve data

Flow:

  1. Download resource from the URL https://www.jodidata.org/_resources/files/downloads/gas-data/jodi_gas_csv_beta.zip as input;
  2. Unzip and parse csv to list of csv.DictReader;
  3. Normalize data as time series, sorting and aggregating by zone as Key, time period formated to ISO 8061, and the others fields;
  4. Parse as json;
  5. Write to stdout as JSON;

Results:

The data will be transformed from this:

[{'REF_AREA': 'ZA', 'TIME_PERIOD': '2018-08', 'ENERGY_PRODUCT': 'NATGAS', 'FLOW_BREAKDOWN': 'TOTIMPSB', 'UNIT_MEASURE': 'M3', 'OBS_VALUE': '355', 'ASSESSMENT_CODE': '1'},
{'REF_AREA': 'ZA', 'TIME_PERIOD': '2018-08', 'ENERGY_PRODUCT': 'NATGAS', 'FLOW_BREAKDOWN': 'TOTIMPSB', 'UNIT_MEASURE': 'TJ', 'OBS_VALUE': '13878', 'ASSESSMENT_CODE': '1'},
{'REF_AREA': 'UY', 'TIME_PERIOD': '2017-10', 'ENERGY_PRODUCT': 'NATGAS', 'FLOW_BREAKDOWN': 'TOTDEMO', 'UNIT_MEASURE': 'M3', 'OBS_VALUE': '5', 'ASSESSMENT_CODE': '1'},
{'REF_AREA': 'UY', 'TIME_PERIOD': '2017-10', 'ENERGY_PRODUCT': 'NATGAS', 'FLOW_BREAKDOWN': 'TOTIMPSB', 'UNIT_MEASURE': 'M3', 'OBS_VALUE': '6', 'ASSESSMENT_CODE': '1'},
{'REF_AREA': 'VE', 'TIME_PERIOD': '2011-01', 'ENERGY_PRODUCT': 'LNG', 'FLOW_BREAKDOWN': 'EXPLNG', 'UNIT_MEASURE': 'KTONS', 'OBS_VALUE': '0', 'ASSESSMENT_CODE': '3'},
{'REF_AREA': 'VE', 'TIME_PERIOD': '2011-01', 'ENERGY_PRODUCT': 'LNG', 'FLOW_BREAKDOWN': 'EXPLNG', 'UNIT_MEASURE': 'M3', 'OBS_VALUE': '0', 'ASSESSMENT_CODE': '2'},
{'REF_AREA': 'VE', 'TIME_PERIOD': '2011-01', 'ENERGY_PRODUCT': 'LNG', 'FLOW_BREAKDOWN': 'IMPLNG', 'UNIT_MEASURE': 'KTONS', 'OBS_VALUE': '0', 'ASSESSMENT_CODE': '3'},
{'REF_AREA': 'VE', 'TIME_PERIOD': '2011-01', 'ENERGY_PRODUCT': 'LNG', 'FLOW_BREAKDOWN': 'IMPLNG', 'UNIT_MEASURE': 'M3', 'OBS_VALUE': '0', 'ASSESSMENT_CODE': '2'},
{'REF_AREA': 'US', 'TIME_PERIOD': '2014-09', 'ENERGY_PRODUCT': 'NATGAS', 'FLOW_BREAKDOWN': 'TOTIMPSB', 'UNIT_MEASURE': 'TJ', 'OBS_VALUE': '217004', 'ASSESSMENT_CODE': '1'},
{'REF_AREA': 'US', 'TIME_PERIOD': '2014-10', 'ENERGY_PRODUCT': 'NATGAS', 'FLOW_BREAKDOWN': 'CLOSTLV', 'UNIT_MEASURE': 'M3', 'OBS_VALUE': '101580', 'ASSESSMENT_CODE': '1'}] 

to this:

{"series_id": "VN\\NATGAS\\TOTIMPSB\\M3\\3", "fields": {"REF_AREA": "VN, Country code based on ISO 3166-1 alpha-2 standard", "ENERGY_PRODUCT": "NATGAS, Natural Gas", "FLOW_BREAKDOWN": "TOTIMPSB", "UNIT_MEASURE": "M3", "ASSESSMENT_CODE": "3, Data has not been assessed"}, "points": [["2012-01-01T00:00:00", 0.0], ["2012-02-01T00:00:00", 0.0], ["2012-03-01T00:00:00", 0.0], ["2012-04-01T00:00:00", 0.0], ["2012-11-01T00:00:00", 0.0], ["2012-12-01T00:00:00", 0.0], ["2013-01-01T00:00:00", 0.0], ["2013-02-01T00:00:00", 0.0], ["2013-03-01T00:00:00", 0.0]]}
{"series_id": "VN\\NATGAS\\TOTIMPSB\\TJ\\3", "fields": {"REF_AREA": "VN, Country code based on ISO 3166-1 alpha-2 standard", "ENERGY_PRODUCT": "NATGAS, Natural Gas", "FLOW_BREAKDOWN": "TOTIMPSB", "UNIT_MEASURE": "TJ", "ASSESSMENT_CODE": "3, Data has not been assessed"}, "points": [["2012-01-01T00:00:00", 0.0], ["2012-02-01T00:00:00", 0.0], ["2012-03-01T00:00:00", 0.0], ["2012-04-01T00:00:00", 0.0], ["2012-11-01T00:00:00", 0.0], ["2012-12-01T00:00:00", 0.0], ["2013-01-01T00:00:00", 0.0], ["2013-02-01T00:00:00", 0.0], ["2013-03-01T00:00:00", 0.0]]}
{"series_id": "ZA\\NATGAS\\CLOSTLV\\TJ\\2", "fields": {"REF_AREA": "ZA, Country code based on ISO 3166-1 alpha-2 standard", "ENERGY_PRODUCT": "NATGAS, Natural Gas", "FLOW_BREAKDOWN": "CLOSTLV", "UNIT_MEASURE": "TJ", "ASSESSMENT_CODE": "2, Consult metadata/Use with caution"}, "points": [["2010-06-01T00:00:00", 616.814], ["2010-07-01T00:00:00", 655.538], ["2010-08-01T00:00:00", 698.121], ["2010-09-01T00:00:00", 692.377], ["2010-10-01T00:00:00", 654.44], ["2010-11-01T00:00:00", 709.779], ["2010-12-01T00:00:00", 643.088], ["2011-01-01T00:00:00", 606.734], ["2011-02-01T00:00:00", 699.571], ["2011-03-01T00:00:00", 628.134], ["2011-04-01T00:00:00", 749.535], ["2011-05-01T00:00:00", 681.869], ["2011-06-01T00:00:00", 718.4], ["2011-07-01T00:00:00", 740.67], ["2011-08-01T00:00:00", 595.324], ["2011-09-01T00:00:00", 647.107], ["2011-10-01T00:00:00", 722.043], ["2011-11-01T00:00:00", 870.495], ["2011-12-01T00:00:00", 769.324], ["2012-01-01T00:00:00", 769.324], ["2012-02-01T00:00:00", 696.451], ["2012-03-01T00:00:00", 798.118], ["2012-04-01T00:00:00", 665.825], ["2012-05-01T00:00:00", 750.481], ["2012-06-01T00:00:00", 604.439], ["2012-08-01T00:00:00", 583.0], ["2012-09-01T00:00:00", 529.0], ["2012-10-01T00:00:00", 700.0], ["2013-01-01T00:00:00", 537.0], ["2013-02-01T00:00:00", 160.5], ["2013-03-01T00:00:00", 217.33], ["2013-04-01T00:00:00", 279.83], ["2013-05-01T00:00:00", 168.16], ["2013-06-01T00:00:00", 218.05], ["2013-07-01T00:00:00", 158.3], ["2013-08-01T00:00:00", 256.04], ["2013-09-01T00:00:00", 271.65], ["2013-10-01T00:00:00", 176.39], ["2013-11-01T00:00:00", 92.47], ["2013-12-01T00:00:00", 16.783], ["2014-01-01T00:00:00", 171.59], ["2014-02-01T00:00:00", 210.86], ["2014-03-01T00:00:00", 209.7], ["2014-04-01T00:00:00", 199.35], ["2014-05-01T00:00:00", 209.85], ["2014-06-01T00:00:00", 217.38], ["2014-07-01T00:00:00", 265.04], ["2014-08-01T00:00:00", 171.79], ["2014-09-01T00:00:00", 184.04], ["2014-10-01T00:00:00", 79.8], ["2014-11-01T00:00:00", 235.34], ["2014-12-01T00:00:00", 215.41], ["2015-01-01T00:00:00", 106.47], ["2015-02-01T00:00:00", 115.71], ["2015-03-01T00:00:00", 184.7], ["2015-04-01T00:00:00", 143.9], ["2015-05-01T00:00:00", 110.01], ["2015-06-01T00:00:00", 148.26], ["2015-07-01T00:00:00", 148.03], ["2015-08-01T00:00:00", 115.35], ["2015-09-01T00:00:00", 140.56], ["2015-10-01T00:00:00", 59.92], ["2015-11-01T00:00:00", 94.68], ["2015-12-01T00:00:00", 180.48], ["2016-01-01T00:00:00", 192.99], ["2016-02-01T00:00:00", 194.73], ["2016-03-01T00:00:00", 158.48], ["2016-04-01T00:00:00", 160.81], ["2016-05-01T00:00:00", 158.89], ["2016-06-01T00:00:00", 182.74], ["2016-07-01T00:00:00", 190.93], ["2016-08-01T00:00:00", 203.64], ["2016-09-01T00:00:00", 207.97], ["2016-10-01T00:00:00", 403.62], ["2016-11-01T00:00:00", 339.29], ["2016-12-01T00:00:00", 444.99], ["2017-01-01T00:00:00", 476.03], ["2017-02-01T00:00:00", 278.51], ["2017-03-01T00:00:00", 225.28], ["2017-04-01T00:00:00", 195.89], ["2017-05-01T00:00:00", 146.12], ["2017-06-01T00:00:00", 217.55], ["2017-07-01T00:00:00", 121.25], ["2017-08-01T00:00:00", 183.15], ["2017-09-01T00:00:00", 212.57], ["2017-10-01T00:00:00", 211.61], ["2017-11-01T00:00:00", 138.8], ["2017-12-01T00:00:00", 135.47], ["2018-01-01T00:00:00", 134.45], ["2018-02-01T00:00:00", 204.11], ["2018-03-01T00:00:00", 204.04], ["2018-04-01T00:00:00", 185.09], ["2018-05-01T00:00:00", 240.79], ["2018-06-01T00:00:00", 114.51]]}

About

An ETL in a Pure Python single script, service oriented, to make a request and download, unzip, extract, parse csv, normalize dataset and retrieve data as json

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages