-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gage metadata additions. #39
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I have not run this code (let me know if you want me to do that). I have some questions for you and a few code suggestions.
@@ -107,7 +107,8 @@ get_cdec_gage_locations <- function(gages) { | |||
nhdpv2_COMID = comid_medres, | |||
provider_id = id) |> | |||
mutate(nhdpv2_REACH_measure = rep(NA_real_, n()), | |||
nhdpv2_COMID = as.numeric(nhdpv2_COMID)) | |||
nhdpv2_COMID = as.numeric(nhdpv2_COMID), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do any COMIDs start with a 0?
I like to read in COMID as a character because it is a unique identifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No -- they are strictly integers. I use characters too but comid can be either one.
nh <- read.csv(nwis_hydrolocation, colClasses = c("character", | ||
"integer", | ||
"character", | ||
"numeric")) | ||
|
||
nh <- mutate(nh, nhdpv2_link_source = "https://github.com/internetofwater/ref_gages/blob/main/data/nwis_hydrolocations.csv") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The README file for this folder is blank. It looks like this file has corrections to the original data. If that is right, have you contacted the original source to make the corrections there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are corrections on publications that aren't really subject to change. The hope is that the file in this repo becomes the place to track manual fixes if another source of truth doesn't become available.
# when da_diff is negative, use within 25% | ||
(all_gages$da_diff > 10 | (all_gages$da_diff < 0 & abs_norm_diff_da > 0.25))) | | ||
|
||
# is tens of catchments and within 10% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# is tens of catchments and within 10% | |
# is tens of kms and within 10% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually meant catchments here -- Since things are discretized to catchments for drainage area accumulation, the quanta is one catchment. The comment could read "is tens of catchments (~20-40sqkm) and within 10% drainage area match" since catchments are mostly in the range of 1-4sqkm. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cutoff is for the drainage area, not catchments, right? Do you know that all the gages that meet this criterion have multiple catchments? That's why I was confused by the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't -- I'm just going on the knowledge of the typical size of a catchment. Do you think it would be worth investigating gages on the margins? When dealing with such large numbers of features, I tend to plow ahead and try not to get too stuck on details but may be going to far with that approach in this case.
(all_gages$drainage_area_sqkm > 100 & | ||
abs_norm_diff_da > (0.1)) | | ||
|
||
# is hundreds of catchments and within 5% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# is hundreds of catchments and within 5% | |
# is hundreds of kms and within 5% |
bad_da <- all_gages[!is.na(diff_da) & diff_da > da_diff_thresh, ] | ||
abs_norm_diff_da <- abs(norm_diff_da) | ||
|
||
bad_da <- all_gages[!is.na(all_gages$da_diff) & # has an estimate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that your previous method considered any gage with a difference greater than 0.5 km^2 as bad, which may have identified more gages than this new filter. Have you compared the results of the old and new filters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# primary output file for geoconnex reference server | ||
reference_file <- "out/ref_gages.gpkg" | ||
|
||
# registry csv file which is checked in | ||
registry_csv <- "reg/ref_gages.csv" | ||
|
||
# locations for all known reference gages | ||
# https://github.com/internetofwater/ref_gages/issues/33 | ||
reference_locations_csv <- "reg/ref_locations.csv" | ||
|
||
# contains information for each gage provider | ||
providers_lookup_csv <- "reg/providers.csv" | ||
|
||
# this is a set of location overrides | ||
nwis_hydrolocation <- "data/nwis_hydrolocations.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want targets
to track changes to these files? If so, you'll need to create file
or file_fast
targets for them. It looks like some of these are tracked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
providers, yes. registry and ref_locations I don't. I'll create an issue to fix that.
@@ -63,6 +73,8 @@ list( | |||
pnw_gage)), | |||
|
|||
### location normalization ### | |||
# these targets generate a normalized form set of gages from each source. | |||
|
|||
# This Gage layer from NHDPlusV2 is a basic starting point for | |||
# NWIS gage locations. | |||
tar_target("nhdpv2_gage", select(read_sf(nat_db, "Gage"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general comment: the target name does not need quotes. I actually didn't know you could use quotes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahah -- I didn't know you didn't have to!
reg/ref_gages.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these two files generated?
missing_offset <- sf::st_transform(missing_offset, sf::st_crs(nhdpv2_fline)) | ||
|
||
new_indexes <- hydroloom::index_points_to_lines(nhdpv2_fline, sf::st_geometry(missing_offset), | ||
search_radius = units::set_units(1000, "meters"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 km is far to search. Is this based on known location precision issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is far! It has to be this big because of large waterbodies that have artificial paths in the middle but the gage location on the shore. There's actually one in NHDPlusV2 gages that is >50km! on a Lake in upstate new york.
Co-authored-by: Jared D. Smith <[email protected]>
This PR adds additional metadata to gages to better characterize how well accurate the gage links might be. See #26 and #38 for more.
Metadata now looks like: