Production runs of this code will execute on GINA's "Elephant" machine. Additional testing may occur on SNAP's "Atlas" compute nodes. Both are Linux machines; other execution environments are not supported. Users will need ample free disk space (on the order of 500 GB) per snow year. Elephant has 18 cores / 144 CPUs.
Create the conda environment from the environment.yml
file if you have not done so already.
conda env create -f environment.yml
Activate the environment:
conda activate viirs-snow
Developers and users must set the following environment variables. These variables are read by the configuration file. If the directories do not exist, they will be created for you at runtime.
Set to a path where you will download and reference the input data from NSIDC. Anticipate needing around 80 GB of free disk space per snow year to download all tiles for the full Alaska spatial domain. Example:
export INPUT_DIR=/export/datadir/"$USER"_viirs_snow/VIIRS_L3_snow_cover
Set the path where you will read/write intermediate data. Something like:
export SCRATCH_DIR=/export/datadir/"$USER"_viirs_snow/scratch
The scratch directory structure will look something like this:
/export/datadir/${USER}_viirs_snow/scratch
└── $SNOW_YEAR
├── masks
├── preprocessed
├── reprojected_merged_single_metric_geotiffs
├── reprojected_single_metric_geotiffs
├── single_metric_geotiffs
└── uncertainty_geotiffs
Set to a path where you will write the snow metric outputs to disk. Use a shared disk location so multiple users can examine the output data.
export OUTPUT_DIR=/export/datadir/"$USER"_viirs_snow/VIIRS_snow_metrics
Set the "snow year" to download and process. Snow year 2015 begins August 1, 2015 and ends on July 31, 2016. Leap days are included.
export SNOW_YEAR=2015
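For illustration (this snippet is not part of the repository), the daily time index for snow year 2015 spans August 1, 2015 through July 31, 2016 and includes the 2016 leap day:

```python
import pandas as pd

snow_year = 2015
# A "snow year" runs from August 1 of snow_year through July 31 of snow_year + 1.
days = pd.date_range(f"{snow_year}-08-01", f"{snow_year + 1}-07-31", freq="D")
print(len(days))  # 366 for snow year 2015 because February 29, 2016 falls in range
```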
Set to True (note this will be passed as a string type rather than a bool) to work on a smaller spatial subset and speed up development. The default is True. Setting it to False will trigger a production run over the entire Alaska domain.
export DEV_MODE=True
On occasion Dask worker memory is not released back to the OS. Setting this value to 0 (or some other low number) will aggressively and automatically trim the memory, which may yield more stable, though perhaps slower, performance.
export MALLOC_TRIM_THRESHOLD_=0
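Taken together, a configuration module might read these variables roughly as sketched below. The module name and exact parsing here are assumptions for illustration, not necessarily what this repository's configuration file does.

```python
# config_sketch.py -- illustrative only; see the repository's actual configuration file.
import os
from pathlib import Path

# Read the required environment variables, failing fast if any are unset.
INPUT_DIR = Path(os.environ["INPUT_DIR"])
SCRATCH_DIR = Path(os.environ["SCRATCH_DIR"])
OUTPUT_DIR = Path(os.environ["OUTPUT_DIR"])
SNOW_YEAR = int(os.environ["SNOW_YEAR"])
# DEV_MODE arrives as the string "True" or "False", not a bool.
DEV_MODE = os.environ.get("DEV_MODE", "True") == "True"

# Create the directories at runtime if they do not already exist.
for d in (INPUT_DIR, SCRATCH_DIR, OUTPUT_DIR):
    d.mkdir(parents=True, exist_ok=True)
```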
Run this script with no arguments to download the source dataset from the NSIDC DAAC. Users must have an Earthdata account and will be prompted to enter valid Earthdata credentials. Data will be downloaded to the $INPUT_DIR/$SNOW_YEAR directory, and the script will ask to wipe the contents of $INPUT_DIR/$SNOW_YEAR before proceeding with the download. Total download time depends on your connection speed, but also on how busy the upstream data service is and how complicated your data orders are. For example, if you ask the service to perform reformatting (e.g., h5 to GeoTIFF), it will take longer to prepare the order. Consider executing download.py in a screen session or similar: preparing the order for an entire snow year may take a full day and can take longer than the download itself. A log file (download.log) capturing the API endpoints used, as well as information about the requested granules, will be written to the same directory as the download script.
python download.py
Run this script with a tile_id argument to preprocess the downloaded data. The script will analyze the data from $INPUT_DIR and construct a hash table keyed on the tile of the source dataset and the data variable represented by each GeoTIFF (one of "Algorithm_Bit_Flags_QA", "Basic_QA", "CGF_NDSI_Snow_Cover", "Cloud_Persistence", or "Daily_NDSI_Snow_Cover"). The script will then construct a time-indexed netCDF file containing the entire set of data for the SNOW_YEAR being processed and write the file to $SCRATCH_DIR/$SNOW_YEAR/preprocessed. Execution time is about 15 minutes per tile.
python preprocess.py h11v02
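The preprocessing pattern is roughly the following. The helper names, file layout, and filename matching in this sketch are assumptions for illustration; see preprocess.py for the actual implementation.

```python
# Illustrative sketch only -- group daily GeoTIFFs by (tile, variable), then
# stack each group into a time-indexed array for writing to netCDF.
from collections import defaultdict
from pathlib import Path

import pandas as pd
import rioxarray
import xarray as xr

VARIABLES = [
    "Algorithm_Bit_Flags_QA",
    "Basic_QA",
    "CGF_NDSI_Snow_Cover",
    "Cloud_Persistence",
    "Daily_NDSI_Snow_Cover",
]

def build_file_index(input_dir: Path, tile_id: str) -> dict:
    """Map (tile_id, variable) keys to date-sorted lists of GeoTIFF paths."""
    index = defaultdict(list)
    for tif in sorted(input_dir.glob(f"*{tile_id}*.tif")):
        for var in VARIABLES:
            if var in tif.name:
                index[(tile_id, var)].append(tif)
    return index

def stack_variable(paths: list, dates: list) -> xr.DataArray:
    """Read each daily GeoTIFF and concatenate along a new time dimension."""
    arrays = [rioxarray.open_rasterio(p).squeeze("band", drop=True) for p in paths]
    return xr.concat(arrays, dim=pd.Index(dates, name="time"))
```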
Apply a Savitzky-Golay filter to low-illumination observations (solar zenith angles greater than 70 degrees), fill in observation gaps caused by night and cloud conditions, and write a new netCDF dataset containing the output. There are two arguments: the VIIRS tile ID and the suffix to tag the output file name with. Execution time ranges from about 15 to 45 minutes per tile.
python smooth_low_illumination_observations.py h11v02 filtered_filled
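For context, the core of a Savitzky-Golay smoothing pass along the time axis of a gridded time series could look roughly like this. The window length, polynomial order, and value range are illustrative assumptions, not the tuned values used by this script.

```python
# Illustrative sketch only -- smooth a (time, y, x) snow cover array along the
# time axis with a Savitzky-Golay filter. Parameters here are assumptions.
import numpy as np
import xarray as xr
from scipy.signal import savgol_filter

def smooth_time_series(da: xr.DataArray, window: int = 11, polyorder: int = 2) -> xr.DataArray:
    """Apply a Savitzky-Golay filter along the 'time' dimension."""
    smoothed = savgol_filter(
        da.values, window_length=window, polyorder=polyorder, axis=da.get_axis_num("time")
    )
    # Keep smoothed values inside the valid NDSI snow cover percentage range.
    return da.copy(data=np.clip(smoothed, 0, 100))
```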
Run this script with a tile_id argument to create masks from the preprocessed data. Currently the script creates four mask GeoTIFFs for the following conditions: ocean, lake / inland water, L2 Fill (i.e., no data), and a combined mask of the previous conditions. Masks will be written to disk as GeoTIFFs in the $SCRATCH_DIR/$SNOW_YEAR/masks directory. Execution time is about 2 minutes per tile.
python compute_masks.py h11v02
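A simplified illustration of the masking idea follows. The classification codes below are placeholders; consult compute_masks.py and the product user guide for the codes actually used.

```python
# Illustrative sketch only -- derive boolean masks from the coded
# CGF_NDSI_Snow_Cover values. The code values here are placeholders.
import numpy as np
import xarray as xr

OCEAN_CODE = 239         # placeholder value
INLAND_WATER_CODE = 237  # placeholder value
L2_FILL_CODE = 254       # placeholder value

def compute_masks(snow_cover: xr.DataArray) -> dict:
    """Return 2D boolean masks; a cell is masked if the condition occurs at any time step."""
    ocean = (snow_cover == OCEAN_CODE).any(dim="time").values
    inland_water = (snow_cover == INLAND_WATER_CODE).any(dim="time").values
    l2_fill = (snow_cover == L2_FILL_CODE).any(dim="time").values
    combined = ocean | inland_water | l2_fill
    return {"ocean": ocean, "inland_water": inland_water, "l2_fill": l2_fill, "combined": combined}
```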
Run this script with a tile_id argument to compute snow metrics from the preprocessed data. Outputs will be single-band GeoTIFFs (one per metric per tile) written to the $SCRATCH_DIR/$SNOW_YEAR/single_metric_geotiffs directory. The metrics currently computed include (in no particular order):
- First Snow Day (FSD) of the full snow season (FSS). Also called FSS start day.
- Last Snow Day (LSD) of the FSS. Also called FSS end day.
- FSS Range: the length (duration) of the full snow season.
- Continuous Snow Season (CSS) Start Day: first day of the longest CSS segment.
- CSS End Day: last day of the longest CSS segment.
- CSS Range: the length (duration) of the longest CSS segment.
- Number of discrete CSS segments.
- Total CSS Days: summed duration of all CSS segments.
- Number of Snow Days: count of all snow-covered days in a snow year.
- Number of No Snow Days: count of all days without snow cover in a snow year.
- Number of Cloud Days: count of all cloud-covered days in a snow year.
Execution time is about 15 minutes per tile.
python compute_snow_metrics.py h11v02
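To make the continuous snow season (CSS) metrics concrete, here is a minimal single-pixel sketch of how the longest run of consecutive snow-covered days could be identified. This helper is hypothetical and is not the repository's implementation.

```python
# Illustrative sketch only -- CSS start day, end day, and duration for one pixel,
# given a 1D boolean array where True means the day is snow covered.
import numpy as np

def longest_css_segment(snow_on: np.ndarray) -> tuple:
    """Return (start_day, end_day, duration) of the longest run of consecutive snow days.

    Day numbers are 1-based positions within the snow year; (0, 0, 0) means no snow.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, snow in enumerate(snow_on):
        if snow:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    if best_len == 0:
        return 0, 0, 0
    return best_start + 1, best_start + best_len, best_len

# Example: days 4-8 (1-based) form the longest CSS segment.
example = np.array([0, 1, 0, 1, 1, 1, 1, 1, 0, 1], dtype=bool)
print(longest_css_segment(example))  # (4, 8, 5)
```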
Run this script with no arguments to postprocess all data in the single_metric_geotiffs directory. The script spawns subprocesses that call GDAL routines to reproject GeoTIFFs to EPSG:3338, align grids, and mosaic tiles. Outputs are written to compressed GeoTIFFs. Additional tasks will include stacking the individual rasters into a final multiband GeoTIFF.
python postprocess.py
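The reprojection step is conceptually similar to the subprocess call sketched below. The target resolution and creation options are illustrative assumptions, not necessarily the exact arguments used by postprocess.py.

```python
# Illustrative sketch only -- reproject one GeoTIFF to EPSG:3338 with a
# target-aligned grid and LZW compression via gdalwarp.
import subprocess

def reproject_to_3338(src: str, dst: str, resolution: float = 375.0) -> None:
    subprocess.run(
        [
            "gdalwarp",
            "-t_srs", "EPSG:3338",
            "-tr", str(resolution), str(resolution),
            "-tap",                 # align the output grid to the target resolution
            "-co", "COMPRESS=LZW",  # write a compressed GeoTIFF
            src,
            dst,
        ],
        check=True,
    )
```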
This script is used to compute metrics related to dark and cloud observations from a preprocessed single-tile dataset. The script is required to develop, tune, and compare algorithms for filtering dark and/or cloudy observations. The input dataset is analyzed, and GeoTIFFs are computed with information about snow cover observations around the onset (called "dusk" in the code) and conclusion (called "dawn" in the code) of a darkness or cloud condition. For each of these obscured (darkness and cloud) conditions, the script will yield GeoTIFFs representing the time indices of the observations immediately prior to ("dusk") and immediately after ("dawn") the obscured condition, as well as the median time index between these values. GeoTIFFs are also created that represent the value of the CGF_NDSI_Snow_Cover variable at the dawn and dusk indices and the binary snow cover status (the variable value with the threshold applied) at these indices. The final GeoTIFF created indicates whether or not the binary snow cover status transitioned (off to on, on to off) during the obscured period. The script accepts user arguments for the input because using the script to tune the filtering algorithm requires computing the metrics for both "raw" and "filtered" data. Omitting the input argument will run the script on the unfiltered data by default. Execution time depends strongly on the tile being processed: higher-latitude tiles, with more grid cells that experience long winter darkness periods, will take longer. To compute on the smoothed data, provide the file suffix.
python compute_dark_and_cloud_metrics.py h11v02
python compute_dark_and_cloud_metrics.py h11v02 --smoothed_input smoothed_low_illumination
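As a simplified single-pixel illustration of the dusk/dawn bookkeeping (the helper and obscured-value handling below are placeholders, not the script's actual logic):

```python
# Illustrative sketch only -- locate the observation just before ("dusk") and just
# after ("dawn") the first obscured (dark or cloudy) run in a 1D boolean time series.
import numpy as np

def dusk_and_dawn_indices(obscured: np.ndarray):
    """Return (dusk_index, dawn_index, median_index) for the first obscured run, or None."""
    idx = np.flatnonzero(obscured)
    if idx.size == 0:
        return None
    start = idx[0]
    end = start
    # Walk forward to the end of this contiguous obscured run.
    while end + 1 < obscured.size and obscured[end + 1]:
        end += 1
    dusk = start - 1           # last clear observation before the run
    dawn = end + 1             # first clear observation after the run
    median = (dusk + dawn) // 2
    return dusk, dawn, median

# Example: observations 3-5 are obscured, so dusk=2, dawn=6, median=4.
print(dusk_and_dawn_indices(np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=bool)))
```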
Gather data for downstream uncertainty analyses. This module will read the preprocessed netCDF file and construct GeoTIFFs for maximum cloud persistence and also yield rasters that indicate other anomalous values (e.g., bowtie trim) if they occur anywhere in the time series. Running this script is not strictly required.
python gather_uncertainty_data.py h11v02
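For instance, the maximum cloud persistence layer can be thought of as a reduction along the time axis, roughly as follows. The file name and variable name here are assumptions, and the input is assumed to carry georeferencing that rioxarray can write out.

```python
# Illustrative sketch only -- reduce the Cloud_Persistence time series to a single
# "maximum cloud persistence" raster for one tile.
import rioxarray  # noqa: F401  (registers the .rio accessor on xarray objects)
import xarray as xr

ds = xr.open_dataset("preprocessed_tile.nc")  # hypothetical file name
max_cloud_persistence = ds["Cloud_Persistence"].max(dim="time")
max_cloud_persistence.rio.to_raster("max_cloud_persistence.tif")
```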
This module contains convenience and utility functions that are used across multiple other scripts and modules. At the moment these are mostly functions for file input and output.
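A hypothetical example of the kind of I/O helper such a module might provide (this exact function is an illustration and may not exist in the codebase):

```python
# Illustrative sketch only -- a shared helper for writing a single-band GeoTIFF.
import numpy as np
import rasterio

def write_single_band_geotiff(path, array: np.ndarray, transform, crs, nodata=None):
    """Write a 2D array to a compressed single-band GeoTIFF."""
    with rasterio.open(
        path,
        "w",
        driver="GTiff",
        height=array.shape[0],
        width=array.shape[1],
        count=1,
        dtype=array.dtype.name,
        transform=transform,
        crs=crs,
        nodata=nodata,
        compress="lzw",
    ) as dst:
        dst.write(array, 1)
```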