Issue22 #23

Merged Aug 13, 2024 · 32 commits

Commits
a6e0b01
Issue #18 - start of updating the datasource for 2020 timeseries pits…
micah-prime Jul 3, 2024
aeac5d5
new sources
micah-prime Jul 3, 2024
be74fc2
issue #18 working towards modified 2020 timeseries pits upload script
micah-prime Jul 3, 2024
8372291
path logic
micah-prime Jul 3, 2024
3050ce7
make sure to not use gap filled density at this point
micah-prime Jul 3, 2024
d852f98
Issue #18 - file for 2021 timeseries pits
micah-prime Jul 3, 2024
624c10b
Issue #18 no perimeter depth files for 2021 TS pits
micah-prime Jul 3, 2024
4863e72
having issues creating the test database
micah-prime Jul 3, 2024
a9065bf
Modify create script for sqlalchemy>2.0
micah-prime Jul 8, 2024
1d29427
Switch to 2020 V1 pits - there are some data format and header issues…
micah-prime Jul 8, 2024
c2c3e00
Use db_session function
micah-prime Jul 8, 2024
07864cb
Slight tweaks to 2021 timeseries script
micah-prime Jul 9, 2024
9ec87fb
Script to delete pits
micah-prime Jul 10, 2024
90ed14d
start using insitupy for metadata handling
micah-prime Jul 23, 2024
f949f72
working through handling metadata
micah-prime Jul 23, 2024
a11c841
2020 V2 data, allow split header line logic. ALSO - use the non-gap-f…
micah-prime Jul 23, 2024
90a20a5
get rid of spaces in flags
micah-prime Jul 23, 2024
94ddad0
Script for 2021 pits is working
micah-prime Jul 24, 2024
dd1547f
start working on SWE files for pits
micah-prime Jul 24, 2024
afaaa5b
move towards row based SRID and timezone ability
micah-prime Jul 25, 2024
4376a41
bulk swe property upload script working
micah-prime Jul 25, 2024
72531f4
Remove Python 3.7 compatibility
micah-prime Jul 31, 2024
8005c93
fixing reqs in build
micah-prime Aug 5, 2024
b2e20f5
bump insitupy
micah-prime Aug 5, 2024
cdaef96
Fixing tests and build. SMP profile depths were not inverted
micah-prime Aug 5, 2024
c14b327
Seem to have a version issue because the etag comparison is still wor…
micah-prime Aug 5, 2024
4dc4537
update hash
micah-prime Aug 5, 2024
ad5dbe0
Issue #22 - start working on AK pits
micah-prime Aug 5, 2024
2a3276b
some progress on the alaska data
micah-prime Aug 7, 2024
4a89dd8
We don't need to manage empty files as long as headers are standard
micah-prime Aug 8, 2024
49f608f
Script for SWE summary of Alaska pits working
micah-prime Aug 8, 2024
3cebb17
update db name for 2023 pits script
micah-prime Aug 8, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-python-version: [3.7, 3.8, 3.9]
+python-version: [3.8, 3.9, "3.10"]

services:

2 changes: 2 additions & 0 deletions .gitignore
@@ -23,3 +23,5 @@ scripts/upload/test*.txt
.idea/*
scripts/download/data/*
venv/

credentials.json
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -6,3 +6,4 @@ pandoc==1.0.2
sphinxcontrib-apidoc==0.3.0
ipython==7.31.1
MarkupSafe<2.1.0
jupyterlab==2.2.10
6 changes: 4 additions & 2 deletions requirements.txt
@@ -1,9 +1,11 @@
wheel>0.34.0, <0.35.0
-snowexsql>=0.3.0, <0.4.0
+snowexsql>=0.4.1, <0.5.0
snowmicropyn
-matplotlib>=3.2.2, <3.3.0
+matplotlib>=3.2.2
moto==3.1.11
coloredlogs>=14.0
progressbar2>=3.51.3
rasterio>=1.1.5
boto3>=1.23.7,<1.24
+timezonefinder>=6.0,<7.0
+insitupy==0.1.2
1 change: 0 additions & 1 deletion requirements_dev.txt
@@ -8,5 +8,4 @@ coverage==4.5.4
twine==1.14.0
pytest==6.2.3
pytest-runner==5.1
-jupyterlab==2.2.10
moto==3.1.11
2 changes: 2 additions & 0 deletions scripts/download/nsidc_sources.txt
@@ -6,3 +6,5 @@ https://n5eil01u.ecs.nsidc.org/SNOWEX/SNEX20_SD.001/
https://n5eil01u.ecs.nsidc.org/SNOWEX/SNEX20_GM_CSU_GPR.001/2020.02.06/SNEX20_GM_CSU_GPR_1GHz_v01.csv
https://n5eil01u.ecs.nsidc.org/SNOWEX/SNEX20_UNM_GPR.001/2020.01.28/SNEX20_UNM_GPR.csv
https://n5eil01u.ecs.nsidc.org/SNOWEX/SNEX20_SD_TLI.001/2019.09.29/SNEX20_SD_TLI_clean.csv
https://n5eil01u.ecs.nsidc.org/SNOWEX/SNEX20_TS_SP.002/
https://n5eil01u.ecs.nsidc.org/SNOWEX/SNEX21_TS_SP.001/
68 changes: 68 additions & 0 deletions scripts/remove_data/remove_pits.py
@@ -0,0 +1,68 @@
"""
File to remove all snowpits from the database
"""
import argparse
from snowexsql.data import LayerData
from snowexsql.db import get_db


def main():
parser = argparse.ArgumentParser(
description='Script to remove all snow pit data from the database')
parser.add_argument('--db', dest='db', default='snowex',
help='Name of the database locally to add tables to')
parser.add_argument('--dry_run', dest='dry_run', action='store_true',
help='Dry run: count matching records without deleting them')
parser.add_argument('--credentials', dest='credentials',
default='./credentials.json',
help='Path to a json file containing the database credentials')
args = parser.parse_args()

credentials = args.credentials
db_name = f'localhost/{args.db}'
dry_run = args.dry_run

# All measurement 'types' associated with pits
types_pit = [
'sample_signal', 'grain_size', 'density', 'reflectance',
'permittivity', 'lwc_vol', 'manual_wetness',
'equivalent_diameter', 'specific_surface_area', 'grain_type',
'temperature', 'hand_hardness'
]
# Start a session
engine, session = get_db(db_name, credentials=credentials)
print(f"Connected to {db_name}")
try:
q = session.query(LayerData).filter(
LayerData.pit_id.is_not(None)  # Filter to results with a pit id (SQL "IS NOT NULL")
).filter(
LayerData.type.in_(types_pit) # Filter to correct type
)
result = q.count()
# Rough estimate of the pit count (assuming roughly 10 layers per profile per measurement type)
estimated_number = int(result / float(len(types_pit)) / 10.0)
print(f"Found {result} records")
print(f"This is roughly {estimated_number} pits")
if dry_run:
print("THIS IS A DRYRUN, not deleting")
else:
if result > 0:
print("Deleting pits from the database")
# Delete
q.delete()
session.commit()
else:
print("No results, nothing to delete")
session.close()
except Exception as e:
print("Errored out, rolling back")
print(e)
session.rollback()
raise e

print("Done")


if __name__ == '__main__':
main()
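A note on the pit_id filter in the script above: comparing a SQLAlchemy column with Python's plain "is not None" yields an ordinary boolean (the column object itself is never None), so the intended IS NOT NULL clause never reaches the database; the is_not() method form builds the real SQL condition. A minimal sketch of the difference, using a toy model rather than the actual LayerData table:

# Minimal sketch (assumes SQLAlchemy >= 1.4); Toy stands in for LayerData.
from sqlalchemy import Column, Integer, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Toy(Base):
    __tablename__ = 'toy'
    id = Column(Integer, primary_key=True)
    pit_id = Column(Integer)


# Plain Python comparison: evaluates to True and filters nothing
print(Toy.pit_id is not None)                         # True

# Method form: emits "toy.pit_id IS NOT NULL" in the generated SQL
print(select(Toy).where(Toy.pit_id.is_not(None)))

On a live database the --dry_run flag is the safe first step: it prints the record count and the rough pit estimate without deleting anything.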
114 changes: 114 additions & 0 deletions scripts/upload/add_alaska_pits_2023.py
@@ -0,0 +1,114 @@
"""
Script to upload the preliminary 2023 SnowEx Alaska (AKIOP) snow pits
"""

import glob
import re
from os.path import abspath, join
from pathlib import Path

from snowex_db.batch import UploadProfileBatch, UploadSiteDetailsBatch


tz_map = {'US/Pacific': ['CA', 'NV', 'WA'],
'US/Mountain': ['CO', 'ID', 'NM', 'UT', 'MT'],
'US/Alaska': ["AK"]
}


def main():
"""
Add the preliminary 2023 Alaska (AKIOP) snow pits
"""
db_name = 'localhost/snowex'
# Preliminary data
doi = "None"
debug = True
timezone = "US/Alaska"

# Point to the downloaded data directory
data_dir = abspath('../download/data/SNEX23_preliminary/Data/pits')
error_msg = []

# Files to ignore
ignore_files = [
"SnowEx23_SnowPits_AKIOP_Summary_Environment_v01.csv",
"SnowEx23_SnowPits_AKIOP_Summary_SWE_v01.csv"
]

# Get all the date folders
unique_folders = Path(
data_dir
).expanduser().absolute().glob("ALASKA*/*20*SNOW_PIT")
for udf in unique_folders:
# get all the csvs in the folder
dt_folder_files = list(udf.glob("*.csv"))
site_ids = []
# Get the unique site ids for this date folder
compiled = re.compile(
r'SnowEx23_SnowPits_AKIOP_([a-zA-Z0-9]*)_\d{8}.*_v01\.csv'
)
for file_path in dt_folder_files:
file_name = file_path.name
if file_name in ignore_files:
print(f"Skipping {file_name}")
continue
match = compiled.match(file_name)
if match:
code = match.group(1)
site_ids.append(code)
else:
raise RuntimeError(f"No site ID found for {file_name}")

# Get the unique site ids
site_ids = list(set(site_ids))

for site_id in site_ids:
# Grab all the csvs in the pits folder
filenames = glob.glob(join(str(udf), f'*_{site_id}_*.csv'))

# Grab all the site details files
sites = glob.glob(join(
str(udf), f'*_{site_id}_*siteDetails*.csv'
))

# Gap-filled density files (collected so they can be excluded; the non-gap-filled density is used)
density_files = glob.glob(join(
str(udf), f'*_{site_id}_*_gapFilled_density*.csv'
))

# Remove the site details and gap-filled density files from the full list to get only the profiles
profiles = list(
set(filenames) - set(sites) -
set(density_files) # remove gap-filled density
)

# Submit all profiles associated with a pit at once
b = UploadProfileBatch(
filenames=profiles, debug=debug, doi=doi,
in_timezone=timezone,
db_name=db_name,
allow_split_lines=True, # Logic for split header lines
header_sep=":"
)
b.push()
error_msg += b.errors

# Upload the site details
sd = UploadSiteDetailsBatch(
filenames=sites, debug=debug, doi=doi,
in_timezone=timezone,
db_name=db_name
)
sd.push()
error_msg += sd.errors

for f, m in error_msg:
print(f, m)
return len(error_msg)


if __name__ == '__main__':
main()
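As a quick, self-contained check of the site-ID extraction above, the snippet below runs the same regular expression against a made-up file name that follows the AKIOP naming convention (the site code here is purely illustrative):

import re

# Hypothetical file name in the AKIOP pit naming convention
name = "SnowEx23_SnowPits_AKIOP_CPC23_20230312_density_v01.csv"
pattern = re.compile(
    r'SnowEx23_SnowPits_AKIOP_([a-zA-Z0-9]*)_\d{8}.*_v01\.csv'
)
match = pattern.match(name)
print(match.group(1) if match else "no site ID")   # -> CPC23

Any file name that does not match raises a RuntimeError in the script, which surfaces unexpected naming early instead of silently skipping data.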
77 changes: 77 additions & 0 deletions scripts/upload/add_pits_bulk_properties.py
@@ -0,0 +1,77 @@
"""
Script to upload the bulk summary (SWE, depth, density) files for the SnowEx time series pits
"""

from os.path import abspath, join

import pandas as pd

from snowex_db.upload import PointDataCSV
from snowex_db import db_session


def main():
"""
Add bulk SWE, depth, and density for the 2020, 2021, and preliminary 2023 Alaska timeseries pits
"""
db_name = 'localhost/snowex'
debug = True

# Point to the downloaded data directory
data_dir = abspath('../download/data/SNOWEX/')
error_msg = []

path_details = [
{
"DOI": "https://doi.org/10.5067/KZ43HVLZV6G4",
"path": "SNEX20_TS_SP.002/2019.10.24/SNEX20_TS_SP_Summary_SWE_v02.csv"
},
{
"DOI": "https://doi.org/10.5067/QIANJYJGRWOV",
"path": "SNEX21_TS_SP.001/2020.11.16/SNEX21_TS_SP_Summary_SWE_v01.csv"
},
# Preliminary data from 2023 Alaska pits
{
"DOI": None,
"path": "../SNEX23_preliminary/Data/SnowEx23_SnowPits_AKIOP_Summary_SWE_v01.csv"
}
]
for info in path_details:
doi = info["DOI"]
file_path = join(data_dir, info["path"])
# Read csv and dump new one without the extra header lines
df = pd.read_csv(
file_path,
skiprows=list(range(32)) + [33]
)
new_name = file_path.replace(".csv", "_modified.csv")
# Filter to columns we want (density, swe, etc)
columns = [
'Location', 'Site', 'PitID', 'Date/Local Standard Time', 'UTM Zone',
'Easting (m)', 'Northing (m)', 'Latitude (deg)', 'Longitude (deg)',
'Density Mean (kg/m^3)',
'SWE (mm)', 'HS (cm)', "Snow Void (cm)", 'Flag'
]
df_columns = df.columns.values
filtered_columns = [c for c in columns if c in df_columns]
df = df.loc[:, filtered_columns]
df.to_csv(new_name, index=False)

# Submit SWE file data as point data
with db_session(
db_name, credentials='credentials.json'
) as (session, engine):
pcsv = PointDataCSV(
new_name, doi=doi, debug=debug,
depth_is_metadata=False,
row_based_crs=True,
row_based_timezone=True
)
pcsv.submit(session)


if __name__ == '__main__':
main()
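The skiprows argument above does two things: it drops the first 32 metadata lines so that line 32 becomes the column header, and it also skips line 33, assumed here to be an extra units/description row in the summary CSVs. A minimal sketch of that behaviour with an in-memory file (column names and values are invented for illustration):

import io

import pandas as pd

lines = ["# metadata line %d" % i for i in range(32)]   # lines 0-31: metadata block
lines.append("Location,Site,SWE (mm)")                  # line 32: becomes the header
lines.append("#,#,mm")                                  # line 33: skipped units row
lines.append("Alaska,HypotheticalSite,120")             # line 34: first data row

df = pd.read_csv(io.StringIO("\n".join(lines)),
                 skiprows=list(range(32)) + [33])
print(df.columns.tolist())   # ['Location', 'Site', 'SWE (mm)']
print(df.iloc[0].tolist())   # ['Alaska', 'HypotheticalSite', 120]

The filtered frame is then written back out as *_modified.csv, which is the file PointDataCSV actually ingests.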
80 changes: 0 additions & 80 deletions scripts/upload/add_time_series_pits.py

This file was deleted.
