Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gage metadata additions. #39

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Gage metadata additions. #39

wants to merge 6 commits into from

Conversation

dblodgett-usgs
Copy link
Member

@dblodgett-usgs dblodgett-usgs commented Oct 4, 2024

This PR adds additional metadata to gages to better characterize how well accurate the gage links might be. See #26 and #38 for more.

Metadata now looks like:
image

Copy link

@jds485 jds485 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I have not run this code (let me know if you want me to do that). I have some questions for you and a few code suggestions.

@@ -107,7 +107,8 @@ get_cdec_gage_locations <- function(gages) {
nhdpv2_COMID = comid_medres,
provider_id = id) |>
mutate(nhdpv2_REACH_measure = rep(NA_real_, n()),
nhdpv2_COMID = as.numeric(nhdpv2_COMID))
nhdpv2_COMID = as.numeric(nhdpv2_COMID),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any COMIDs start with a 0?
I like to read in COMID as a character because it is a unique identifier

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No -- they are strictly integers. I use characters too but comid can be either one.

nh <- read.csv(nwis_hydrolocation, colClasses = c("character",
"integer",
"character",
"numeric"))

nh <- mutate(nh, nhdpv2_link_source = "https://github.com/internetofwater/ref_gages/blob/main/data/nwis_hydrolocations.csv")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README file for this folder is blank. It looks like this file has corrections to the original data. If that is right, have you contacted the original source to make the corrections there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are corrections on publications that aren't really subject to change. The hope is that the file in this repo becomes the place to track manual fixes if another source of truth doesn't become available.

# when da_diff is negative, use within 25%
(all_gages$da_diff > 10 | (all_gages$da_diff < 0 & abs_norm_diff_da > 0.25))) |

# is tens of catchments and within 10%
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# is tens of catchments and within 10%
# is tens of kms and within 10%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant catchments here -- Since things are discretized to catchments for drainage area accumulation, the quanta is one catchment. The comment could read "is tens of catchments (~20-40sqkm) and within 10% drainage area match" since catchments are mostly in the range of 1-4sqkm. Does that make sense?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cutoff is for the drainage area, not catchments, right? Do you know that all the gages that meet this criterion have multiple catchments? That's why I was confused by the comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't -- I'm just going on the knowledge of the typical size of a catchment. Do you think it would be worth investigating gages on the margins? When dealing with such large numbers of features, I tend to plow ahead and try not to get too stuck on details but may be going to far with that approach in this case.

(all_gages$drainage_area_sqkm > 100 &
abs_norm_diff_da > (0.1)) |

# is hundreds of catchments and within 5%
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# is hundreds of catchments and within 5%
# is hundreds of kms and within 5%

bad_da <- all_gages[!is.na(diff_da) & diff_da > da_diff_thresh, ]
abs_norm_diff_da <- abs(norm_diff_da)

bad_da <- all_gages[!is.na(all_gages$da_diff) & # has an estimate
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that your previous method considered any gage with a difference greater than 0.5 km^2 as bad, which may have identified more gages than this new filter. Have you compared the results of the old and new filters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was actually difference greater than 50% were removed and recalculated. Either way, I did look at what the 50% threshold looked like vs the new scheme.

The lines here are the threshold:

image
image

This is the final filter I came up with

372610036-ab5d0016-1847-493e-be61-59f19f12edf2
372610162-00633aa9-a544-4e4f-b7ef-077b347fb183

R/gage_locations.R Outdated Show resolved Hide resolved
Comment on lines +7 to 21
# primary output file for geoconnex reference server
reference_file <- "out/ref_gages.gpkg"

# registry csv file which is checked in
registry_csv <- "reg/ref_gages.csv"

# locations for all known reference gages
# https://github.com/internetofwater/ref_gages/issues/33
reference_locations_csv <- "reg/ref_locations.csv"

# contains information for each gage provider
providers_lookup_csv <- "reg/providers.csv"

# this is a set of location overrides
nwis_hydrolocation <- "data/nwis_hydrolocations.csv"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want targets to track changes to these files? If so, you'll need to create file or file_fast targets for them. It looks like some of these are tracked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

providers, yes. registry and ref_locations I don't. I'll create an issue to fix that.

@@ -63,6 +73,8 @@ list(
pnw_gage)),

### location normalization ###
# these targets generate a normalized form set of gages from each source.

# This Gage layer from NHDPlusV2 is a basic starting point for
# NWIS gage locations.
tar_target("nhdpv2_gage", select(read_sf(nat_db, "Gage"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general comment: the target name does not need quotes. I actually didn't know you could use quotes!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahah -- I didn't know you didn't have to!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these two files generated?

missing_offset <- sf::st_transform(missing_offset, sf::st_crs(nhdpv2_fline))

new_indexes <- hydroloom::index_points_to_lines(nhdpv2_fline, sf::st_geometry(missing_offset),
search_radius = units::set_units(1000, "meters"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 km is far to search. Is this based on known location precision issues?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is far! It has to be this big because of large waterbodies that have artificial paths in the middle but the gage location on the shore. There's actually one in NHDPlusV2 gages that is >50km! on a Lake in upstate new york.

Co-authored-by: Jared D. Smith <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants