Gage metadata additions. #39

dblodgett-usgs · 2024-10-04T02:47:54Z

This PR adds additional metadata to gages to better characterize how accurate the gage links might be. See #26 and #38 for more.

Metadata now looks like:

jds485

This looks good. I have not run this code (let me know if you want me to do that). I have some questions for you and a few code suggestions.

jds485 · 2024-10-04T17:12:58Z

R/gage_locations.R

@@ -107,7 +107,8 @@ get_cdec_gage_locations <- function(gages) {
           nhdpv2_COMID = comid_medres,
           provider_id = id) |>
    mutate(nhdpv2_REACH_measure = rep(NA_real_, n()),
-           nhdpv2_COMID = as.numeric(nhdpv2_COMID))
+           nhdpv2_COMID = as.numeric(nhdpv2_COMID),


Do any COMIDs start with a 0?
I like to read in COMID as a character because it is a unique identifier

No -- they are strictly integers. I use characters too but comid can be either one.
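
For anyone wanting to check the leading-zero question on their own identifiers, here is a minimal base R sketch. The `ids` values are made up for illustration (the first is deliberately zero-padded) and are not real COMIDs; the idea is that an ID column is safe to store as a number only if converting to integer and back reproduces every original string.

```r
# Hypothetical identifiers for illustration only; the first one is
# deliberately zero-padded to show what a failure looks like.
ids <- c("0012345", "948050123", "1000001")

# TRUE where an identifier starts with a zero.
has_leading_zero <- grepl("^0", ids)

# TRUE where converting to integer and back reproduces the original string.
round_trips <- as.character(as.integer(ids)) == ids

# Only safe to store as a number if every identifier survives the round trip.
safe_to_convert <- all(round_trips)
```

With strictly integer identifiers, as described above for COMIDs, `safe_to_convert` would be expected to come out `TRUE`.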

jds485 · 2024-10-04T17:15:29Z

R/gage_locations.R

  nh <- read.csv(nwis_hydrolocation, colClasses = c("character", 
                                                    "integer", 
                                                    "character", 
                                                    "numeric"))

+  nh <- mutate(nh, nhdpv2_link_source = "https://github.com/internetofwater/ref_gages/blob/main/data/nwis_hydrolocations.csv")


The README file for this folder is blank. It looks like this file has corrections to the original data. If that is right, have you contacted the original source to make the corrections there?

These are corrections on publications that aren't really subject to change. The hope is that the file in this repo becomes the place to track manual fixes if another source of truth doesn't become available.

jds485 · 2024-10-04T17:24:26Z

R/gage_locations.R

+                            # when da_diff is negative, use within 25%
+                            (all_gages$da_diff > 10 | (all_gages$da_diff < 0 & abs_norm_diff_da > 0.25))) |
+
+                           # is tens of catchments and within 10%


Suggested change

-# is tens of catchments and within 10%
+# is tens of kms and within 10%

I actually meant catchments here -- since things are discretized to catchments for drainage area accumulation, the quantum is one catchment. The comment could read "is tens of catchments (~20-40sqkm) and within 10% drainage area match" since catchments are mostly in the range of 1-4sqkm. Does that make sense?

This cutoff is for the drainage area, not catchments, right? Do you know that all the gages that meet this criterion have multiple catchments? That's why I was confused by the comment

I don't -- I'm just going on the knowledge of the typical size of a catchment. Do you think it would be worth investigating gages on the margins? When dealing with such large numbers of features, I tend to plow ahead and try not to get too stuck on details, but I may be going too far with that approach in this case.
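
To make the size-dependent cutoff idea in this thread concrete, here is a rough base R sketch of a tiered drainage area check. It is not the filter in `R/gage_locations.R`: the function name `flag_bad_da`, its arguments, and the 1000 sqkm tier boundary are assumptions for illustration, and it leaves out the separate handling of negative `da_diff` shown in the diff. Only the 100 sqkm / 10% pairing and the 5% tolerance are taken from the lines quoted above.

```r
# Hypothetical helper, not the PR's implementation: flag gages whose
# network-derived drainage area disagrees with the reported drainage area
# by more than a tolerance that tightens as drainage area grows.
flag_bad_da <- function(drainage_area_sqkm, network_da_sqkm,
                        min_area = c(100, 1000),    # assumed tier bounds, sqkm
                        tolerance = c(0.10, 0.05)) { # allowed relative difference
  abs_norm_diff_da <- abs(network_da_sqkm - drainage_area_sqkm) /
    drainage_area_sqkm

  # Gages without an estimate are never flagged.
  has_estimate <- !is.na(abs_norm_diff_da)

  flagged <- rep(FALSE, length(drainage_area_sqkm))
  for (i in seq_along(min_area)) {
    flagged <- flagged |
      (has_estimate &
         drainage_area_sqkm > min_area[i] &
         abs_norm_diff_da > tolerance[i])
  }
  flagged
}

# Made-up drainage areas (sqkm): reported first, network-derived second.
flags <- flag_bad_da(c(150, 150, 2000, 50, NA),
                     c(160, 180, 2150, 100, 120))
```

Because the cutoffs are written against drainage area rather than catchment count, a gage on the margin (for example, one just above 100 sqkm made up of a few unusually large catchments) is judged by area alone, which is the ambiguity raised above.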

jds485 · 2024-10-04T17:24:37Z

R/gage_locations.R

+                          (all_gages$drainage_area_sqkm > 100 & 
+                             abs_norm_diff_da > (0.1)) | 
+
+                           # is hundreds of catchments and within 5%


Suggested change

-# is hundreds of catchments and within 5%
+# is hundreds of kms and within 5%

jds485 · 2024-10-04T17:26:58Z

R/gage_locations.R

-  bad_da <- all_gages[!is.na(diff_da) & diff_da > da_diff_thresh, ]
+  abs_norm_diff_da <- abs(norm_diff_da)
+
+  bad_da <- all_gages[!is.na(all_gages$da_diff) & # has an estimate


I see that your previous method considered any gage with a difference greater than 0.5 km^2 as bad, which may have identified more gages than this new filter. Have you compared the results of the old and new filters?

Actually, gages with a difference greater than 50% were removed and recalculated. Either way, I did look at what the 50% threshold looked like vs the new scheme.

The lines here are the threshold:

This is the final filter I came up with

R/gage_locations.R

jds485 · 2024-10-04T18:08:31Z

_targets.R

+# primary output file for geoconnex reference server
 reference_file <- "out/ref_gages.gpkg"

+# registry csv file which is checked in
+registry_csv <- "reg/ref_gages.csv"
+
+# locations for all known reference gages
+# https://github.com/internetofwater/ref_gages/issues/33
+reference_locations_csv <- "reg/ref_locations.csv"
+
+# contains information for each gage provider
+providers_lookup_csv <- "reg/providers.csv"
+
 # this is a set of location overrides
 nwis_hydrolocation <- "data/nwis_hydrolocations.csv"


Do you want targets to track changes to these files? If so, you'll need to create file or file_fast targets for them. It looks like some of these are tracked.

Providers, yes; registry and ref_locations, no. I'll create an issue to fix that.
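
For reference, a file-tracking target could look roughly like the fragment below. This is a sketch and not the repository's actual `_targets.R`: the target names `providers_lookup_csv_file` and `providers` are assumed for illustration, and only the `reg/providers.csv` path comes from the diff above. With `format = "file"`, the command returns a path and `targets` hashes the file contents, so downstream targets rerun when the file changes.

```r
library(targets)

list(
  # The command returns the file path; format = "file" tells targets to
  # track the file's contents rather than just the path string.
  tar_target(providers_lookup_csv_file, "reg/providers.csv", format = "file"),

  # Depends on the file target, so it is rebuilt when the csv changes.
  tar_target(providers, read.csv(providers_lookup_csv_file))
)
```

Files that are written by the pipeline and checked in afterwards, rather than read as inputs, would need a different arrangement, such as a target whose command writes the file and returns its path.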

jds485 · 2024-10-04T18:10:48Z

_targets.R

@@ -63,6 +73,8 @@ list(
                                                  pnw_gage)),

  ### location normalization ###
+  # these targets generate a normalized form set of gages from each source.
+
  # This Gage layer from NHDPlusV2 is a basic starting point for
  # NWIS gage locations.
  tar_target("nhdpv2_gage", select(read_sf(nat_db, "Gage"), 


general comment: the target name does not need quotes. I actually didn't know you could use quotes!

hahah -- I didn't know you didn't have to!

jds485 · 2024-10-04T18:14:18Z

reg/ref_gages.csv

How are these two files generated?

jds485 · 2024-10-04T18:20:55Z

R/gage_locations.R

+  missing_offset <- sf::st_transform(missing_offset, sf::st_crs(nhdpv2_fline))
+
+  new_indexes <- hydroloom::index_points_to_lines(nhdpv2_fline, sf::st_geometry(missing_offset),
+                                                  search_radius =  units::set_units(1000, "meters"),


1 km is far to search. Is this based on known location precision issues?

It is far! It has to be this big because of large waterbodies that have artificial paths in the middle but the gage location on the shore. There's actually one in the NHDPlusV2 gages that is >50km, on a lake in upstate New York.

Co-authored-by: Jared D. Smith <[email protected]>

dblodgett-usgs added 5 commits September 30, 2024 22:28

add nhdpv2_link_source for #26

db3d179

add drainage area metrics to reference gages fixes #38

130d476

get offset added to gages missing it

6aa7f51

add reference locations for new gages add stub for validations

794434e

validations

21203f3

dblodgett-usgs mentioned this pull request Oct 4, 2024

Check COMID match for 3 gages #38

Open

jds485 suggested changes Oct 4, 2024

Update R/gage_locations.R

47d8261

Co-authored-by: Jared D. Smith <[email protected]>

dblodgett-usgs mentioned this pull request Oct 4, 2024

double check that targets "files" tracking is correct. #40

Open
