It seems NCEP has added several custom encodings to the WMO GRIB2 standard that the cfgrib library can't decode.
In the HRRR sub-hourly product, the step is encoded as a range, which breaks the cfgrib reader. When reading these variables with `scan_grib` you will get messages like:

`2024-01-09T16:59:33.060Z MainProcess MainThread WARNING:grib2-to-zarr:Ignoring coordinate 'step' for varname 'vbdsf', raises: eccodes.WrongStepUnitError(Wrong units for step (step must be integer))`

for the variables `dswrf`, `vbdsf`, `tp`, `sdwe` and `unknown`. Some of the GRIB messages genuinely fail to decode, which is what produces the `unknown` variables. The step can be inferred from the model run time and the valid time, but I think NCEP was trying to encode the duration of the averaging window for the variables with stepType `avg`.
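Inferring the step from the run time and valid time is just a subtraction. The sketch below assumes you already have both timestamps as `datetime` objects (for example from the `time` and `valid_time` fields cfgrib reports); the function names and the timestamps in the usage lines are illustrative, not taken from any library or real file.

```python
from datetime import datetime, timedelta


def infer_step(run_time: datetime, valid_time: datetime) -> timedelta:
    """Forecast step as the lead time between model run and valid time."""
    step = valid_time - run_time
    if step < timedelta(0):
        raise ValueError("valid_time precedes run_time")
    return step


def step_in_minutes(step: timedelta) -> int:
    # Sub-hourly products need sub-hour resolution; an integer hour count
    # (what the "step must be integer" error expects) would lose it.
    return int(step.total_seconds() // 60)


# Illustrative timestamps only.
run = datetime(2024, 1, 9, 12)
valid = datetime(2024, 1, 9, 12, 45)
print(infer_step(run, valid))                   # 0:45:00
print(step_in_minutes(infer_step(run, valid)))  # 45
```

Note that this recovers only the lead time; it does not recover the length of the averaging window that NCEP appears to be encoding for `avg` variables.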
By comparing the output of `scan_grib` with the idx files provided by NCEP, I was able to identify a few more edge cases. The table below shows some of the variables from gs://global-forecast-system/gfs.20231001/00/atmos/gfs.t00z.pgrb2.0p25.f006 that share the same variable name, step type, level type and level. Currently, the `grib_tree` method assumes this combination is unique and silently takes the data from the last matching message in the file.
There are two types of duplicates I have found so far:
1. The GFS GRIB2 files include two accumulations each for convective precipitation and total precipitation: one is the accumulation during the current model step, and the other is the total accumulation over the forecast run so far. With the step value parsed by cfgrib, these are ambiguous for every forecast horizon (the 0 to 240 hour forecast files). The idx file (gs://global-forecast-system/gfs.20231001/00/atmos/gfs.t00z.pgrb2.0p25.f006.idx) carries a bit more metadata, e.g. `ACPCP:surface:0-6 hour acc fcst`, but for the first few timesteps of the model even the idx entries look like duplicates, because the run total equals the step accumulation.
2. Several other variables have a level range, such as `180-0 mb above ground` and `0.44-1 sigma layer`, which decodes to a NaN level with cfgrib (via kerchunk `scan_grib`). These produce additional duplicates that can confuse `grib_tree` (and anybody using it).
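The idx metadata that disambiguates these messages can be pulled out with a small parser. This is a minimal sketch, assuming each idx line has the colon-separated layout `<n>:<offset>:d=<date>:<VAR>:<level>:<forecast>:` seen in the attrs column of the table; `parse_idx`, `level_bounds` and `accumulation_window` are hypothetical helpers, not part of kerchunk. Each message's length is taken as the gap to the next message's offset. The byte offsets in `SAMPLE` come from the table, while the record numbers 1 to 3 are placeholders.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IdxRecord:
    number: int
    offset: int
    date: str
    varname: str
    level: str
    forecast: str
    length: Optional[int] = None  # unknown for the last record in the input


def parse_idx(text: str) -> List[IdxRecord]:
    """Parse idx lines assumed to look like '<n>:<offset>:d=<date>:<VAR>:<level>:<forecast>:'."""
    records = []
    for line in text.strip().splitlines():
        number, offset, date, varname, level, forecast = line.split(":")[:6]
        if date.startswith("d="):
            date = date[2:]
        records.append(IdxRecord(int(number), int(offset), date, varname, level, forecast))
    # A message's length is the distance to the next message's byte offset.
    for rec, nxt in zip(records, records[1:]):
        rec.length = nxt.offset - rec.offset
    return records


# "<a>-<b> <kind>", e.g. "180-0 mb above ground" or "0.44-1 sigma layer"
_LEVEL_RANGE = re.compile(r"^([\d.]+)-([\d.]+) (.+)$")
# "<start>-<end> hour acc fcst", e.g. "0-6 hour acc fcst"
_ACC_WINDOW = re.compile(r"^(\d+)-(\d+) hour acc fcst$")


def level_bounds(level: str) -> Optional[Tuple[float, float, str]]:
    m = _LEVEL_RANGE.match(level)
    return (float(m.group(1)), float(m.group(2)), m.group(3)) if m else None


def accumulation_window(forecast: str) -> Optional[Tuple[int, int]]:
    m = _ACC_WINDOW.match(forecast)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Offsets copied from the table; record numbers are placeholders.
SAMPLE = """\
1:526644614:d=2023100100:CAPE:90-0 mb above ground:6 hour fcst:
2:527124319:d=2023100100:CIN:90-0 mb above ground:6 hour fcst:
3:527482311:d=2023100100:CAPE:255-0 mb above ground:6 hour fcst:
"""

for rec in parse_idx(SAMPLE):
    print(rec.varname, level_bounds(rec.level), rec.length)
```

The computed lengths (479705 and 357992) match the `length_idx` values in the table, and the parsed bounds separate the CAPE layers that cfgrib reports with a NaN level. For ACPCP at f006 both messages still parse to the same `(0, 6)` window, so the idx alone cannot tell the step accumulation from the run total there.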
| varname | typeOfLevel | stepType | level | offset_idx | date | attrs | length_idx | idx_uri | grib_uri | idx_indexed_at | grib_crc32 | grib_updated_at | idx_crc32 | idx_updated_at | name | step | time | valid_time | uri | offset_grib | length_grib | inline_value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| acpcp | surface | accum | 0.0 | 426078582 | d=2023100100 | `ACPCP:surface:0-6 hour acc fcst:\n` | 279631 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective precipitation (water) | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 426078582 | 279631 | None |
| acpcp | surface | accum | 0.0 | 426358213 | d=2023100100 | `ACPCP:surface:0-6 hour acc fcst:\n` | 279631 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective precipitation (water) | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 426358213 | 279631 | None |
| cape | pressureFromGroundLayer | instant | NaN | 515902483 | d=2023100100 | `CAPE:180-0 mb above ground:6 hour fcst:\n` | 530643 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective available potential energy | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 515902483 | 530643 | None |
| cape | pressureFromGroundLayer | instant | NaN | 526644614 | d=2023100100 | `CAPE:90-0 mb above ground:6 hour fcst:\n` | 479705 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective available potential energy | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 526644614 | 479705 | None |
| cape | pressureFromGroundLayer | instant | NaN | 527482311 | d=2023100100 | `CAPE:255-0 mb above ground:6 hour fcst:\n` | 514093 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective available potential energy | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 527482311 | 514093 | None |
| cin | pressureFromGroundLayer | instant | NaN | 516433126 | d=2023100100 | `CIN:180-0 mb above ground:6 hour fcst:\n` | 343271 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective inhibition | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 516433126 | 343271 | None |
| cin | pressureFromGroundLayer | instant | NaN | 527124319 | d=2023100100 | `CIN:90-0 mb above ground:6 hour fcst:\n` | 357992 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective inhibition | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 527124319 | 357992 | None |
| cin | pressureFromGroundLayer | instant | NaN | 527996404 | d=2023100100 | `CIN:255-0 mb above ground:6 hour fcst:\n` | 306931 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Convective inhibition | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 527996404 | 306931 | None |
| r | sigmaLayer | instant | NaN | 518249965 | d=2023100100 | `RH:0.33-1 sigma layer:6 hour fcst:\n` | 727263 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Relative humidity | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06:00:00 | gs://global-forecast-system/gfs.20231001/00/at... | 518249965 | 727263 | None |
| r | sigmaLayer | instant | NaN | 518977228 | d=2023100100 | `RH:0.44-1 sigma layer:6 hour fcst:\n` | 714324 | gs://global-forecast-system/gfs.20231001/00/at... | gs://global-forecast-system/gfs.20231001/00/at... | 2024-01-11 01:38:57.368924 | iT+Wyg== | 2023-10-01 03:34:14.440 | fmnXTA== | 2023-10-01 03:33:41.914 | Relative humidity | 0 days 06:00:00 | 2023-10-01 | 2023-10-01 06 | | | | |
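Collisions like the ones in this table can be detected before building a tree. This is a minimal sketch, assuming each message is a dict with `varname`, `typeOfLevel`, `stepType`, `level` and `offset` entries (a simplified stand-in for the scan_grib records; `find_duplicate_keys` is a hypothetical helper). One subtlety: NaN never compares equal to NaN, so the NaN levels cfgrib produces for layer ranges would hide the collisions unless they are normalized first. A similar trap exists in pandas, where `groupby` drops NaN keys by default.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Tuple


def _norm_level(level):
    # NaN != NaN, so two NaN levels would never land on the same key; map them to None.
    if isinstance(level, float) and math.isnan(level):
        return None
    return level


def find_duplicate_keys(messages: List[Mapping]) -> Dict[Tuple[Hashable, ...], List[int]]:
    """Group message offsets by (varname, typeOfLevel, stepType, level); keep only collisions."""
    groups = defaultdict(list)
    for msg in messages:
        key = (msg["varname"], msg["typeOfLevel"], msg["stepType"], _norm_level(msg["level"]))
        groups[key].append(msg["offset"])
    return {key: offsets for key, offsets in groups.items() if len(offsets) > 1}


# A few rows from the table (offset = offset_grib).
rows = [
    {"varname": "acpcp", "typeOfLevel": "surface", "stepType": "accum", "level": 0.0, "offset": 426078582},
    {"varname": "acpcp", "typeOfLevel": "surface", "stepType": "accum", "level": 0.0, "offset": 426358213},
    {"varname": "cape", "typeOfLevel": "pressureFromGroundLayer", "stepType": "instant", "level": float("nan"), "offset": 515902483},
    {"varname": "cape", "typeOfLevel": "pressureFromGroundLayer", "stepType": "instant", "level": float("nan"), "offset": 526644614},
]

for key, offsets in find_duplicate_keys(rows).items():
    print(key, offsets)
```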
Fixing the actual decoding of these variables is hard. It may be possible by adding custom ecCodes definitions.
In the meantime, I am opening this issue so that anyone else wondering what is going on can find it.
Suggestions for improving the behavior of `grib_tree` until then are welcome. At present it silently picks the last GRIB message and uses its data (offset and length) for the given variable, which may be quite surprising for some users.
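One possible direction is to make the collision policy explicit instead of last-message-wins. The sketch below is hypothetical: `merge_messages`, its `on_duplicate` parameter and `DuplicateMessageError` are illustrative names, not kerchunk's `grib_tree` API. It assumes messages arrive as `(key, reference)` pairs, where a reference is something like `{"offset": ..., "length": ...}`.

```python
import warnings
from typing import Dict, Hashable, Iterable, List, Tuple


class DuplicateMessageError(ValueError):
    """Raised when two GRIB messages map to the same tree key."""


def merge_messages(
    messages: Iterable[Tuple[Hashable, dict]], on_duplicate: str = "warn"
) -> Dict[Hashable, List[dict]]:
    """Hypothetical alternative to silently keeping the last message.

    on_duplicate:
      "raise" - stop at the first collision
      "warn"  - keep every colliding reference under the key and emit a warning
      "last"  - mimic the current behaviour: keep only the last reference
    """
    if on_duplicate not in ("raise", "warn", "last"):
        raise ValueError(f"unknown on_duplicate policy: {on_duplicate!r}")
    merged: Dict[Hashable, List[dict]] = {}
    for key, ref in messages:
        if key not in merged:
            merged[key] = [ref]
        elif on_duplicate == "raise":
            raise DuplicateMessageError(f"duplicate key {key!r}")
        elif on_duplicate == "last":
            merged[key] = [ref]
        else:
            warnings.warn(f"{len(merged[key]) + 1} messages share key {key!r}", stacklevel=2)
            merged[key].append(ref)
    return merged


# The two ACPCP messages from the table collide on the same key.
key = ("acpcp", "surface", "accum", 0.0)
msgs = [
    (key, {"offset": 426078582, "length": 279631}),
    (key, {"offset": 426358213, "length": 279631}),
]
print(merge_messages(msgs, on_duplicate="last")[key])
```

Keeping all colliding references under `"warn"` at least makes the ambiguity visible; the duplicates could then be split into separate variables using idx metadata such as the level range.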
The NCEP team would like to expose their GRIB tables in a machine-readable form!
See the comment on NOAA-EMC/NCEPLIBS-grib_util#293.
This would provide the data needed to generate the custom ecCodes definitions.