Warn dataset owners when dataset resources are HTTP; replace with HTTPS if not 404 #2985
Labels
component/catalog
Related to catalog component playbooks/roles
H2.0/Harvest-Runner
Harvest Source Processing for Harvesting 2.0
User Story
In order to maintain trust and accessibility of datasets we index (by preventing browser warnings), we want to ensure catalog.data.gov doesn't generate mixed http/https content.
Acceptance Criteria
GIVEN a dataset resource has an http:// link
AND there is an equivalent https:// link that doesn't error
WHEN a harvest of that data source happens
THEN a warning about the HTTP link is generated in the harvest report
AND the HTTPS link is the one that's recorded.
GIVEN a dataset resource has an http:// link
AND there is NO equivalent https:// link that doesn't error
WHEN a harvest of that data source happens
THEN a warning about the HTTP link is generated in the harvest report
AND the HTTP link is not recorded.
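The two scenarios above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the harvester's actual code: the function names `https_responds` and `resolve_resource_url` are hypothetical, "doesn't error" is assumed to mean a HEAD request that returns a 2xx or 3xx status, and the warning strings are placeholders for whatever the harvest report expects.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit
import urllib.request


def https_responds(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical check: True if a HEAD request to `url` succeeds.

    Assumption: any HTTP error status (404 included) or network/TLS
    failure counts as "errors".
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers
        # malformed URLs.
        return False


def resolve_resource_url(
    url: str,
    is_reachable: Callable[[str], bool] = https_responds,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (url_to_record, warning) for one resource link.

    url_to_record is None when the link should not be recorded;
    warning is None when the link was not HTTP to begin with.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "http":
        return url, None  # already HTTPS (or another scheme): leave as-is

    # Same host, path, and query; only the scheme changes.
    https_url = urlunsplit(parts._replace(scheme="https"))
    if is_reachable(https_url):
        return https_url, f"HTTP link {url} replaced with {https_url}"
    return None, f"HTTP link {url} has no working HTTPS equivalent; not recorded"
```

Passing the reachability check in as a parameter keeps the decision logic testable without network access, for example `resolve_resource_url(url, is_reachable=lambda u: True)`.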
Background
https://blog.chromium.org/2020/02/protecting-users-from-insecure.html
Lynda reported this.
https://catalog.data.gov/dataset/fws-critical-habitat-for-threatened-and-endangered-species-datasetd55fc
This dataset links to HTTP resources, and Chrome (and other modern web browsers) will block these downloads.
Example of the problem (at the time of issue creation):
http://ecos.fws.gov/docs/crithab/crithab_all/crithab_all_layers.zip
http://ecos.fws.gov/docs/crithab/crithab_all/crithab_all_shapefiles.zip
Security Considerations (required)
This change prevents catalog.data.gov from ever presenting mixed content.
Sketch
Note the upstream issue we filed.