cms_list_stac_files does not work for single-file datasets #36

Open
raymondben opened this issue Feb 2, 2024 · 2 comments

@raymondben

Thanks for the great work with this package. I found a small problem with datasets that consist of only a single file, e.g. the MDT dataset:

> cms_list_stac_files(product = "SEALEVEL_GLO_PHY_MDT_008_063")
# A tibble: 0 × 0

It is happening because their API is returning the actual file, not its bucket, when you query the stac properties:

> cms_stac_properties(product = "SEALEVEL_GLO_PHY_MDT_008_063")$href
[1] "https://s3.waw3-1.cloudferro.com/mdl-native-07/native/SEALEVEL_GLO_PHY_MDT_008_063/cnes_obs-sl_glo_phy-mdt_my_0.125deg_P20Y_202012/mdt_hybrid_cnes_cls18_cmems2020_global.nc"

cms_list_stac_files tries to issue a list-bucket request to this URL, which of course doesn't work.

I have a workaround for my own needs, but it would be good to fix. I have not provided a PR because I don't know the best solution. You could perhaps detect that the href ends with an actual filename and throw that part away, but reliably detecting filenames might not be straightforward. You could match known file extensions, or perhaps even just paths that end with "." followed by two or three more characters, but either way seems like it would be fragile.
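
To make the extension heuristic concrete, here is a minimal base-R sketch. The helper name looks_like_file is hypothetical and not part of CopernicusMarine; the third URL is an invented example included only to show how the heuristic can misfire.

```r
# Hypothetical helper (not part of CopernicusMarine): guess whether an href
# points at a file rather than at a bucket prefix.
looks_like_file <- function(href) {
  # Drop any trailing slashes, then keep only the last path segment.
  last_segment <- sub(".*/", "", sub("/+$", "", href))
  # Heuristic from the text above: a "." followed by two or three
  # alphanumeric characters at the very end of the segment.
  grepl("\\.[A-Za-z0-9]{2,3}$", last_segment)
}

file_href <- paste0(
  "https://s3.waw3-1.cloudferro.com/mdl-native-07/native/",
  "SEALEVEL_GLO_PHY_MDT_008_063/",
  "cnes_obs-sl_glo_phy-mdt_my_0.125deg_P20Y_202012/",
  "mdt_hybrid_cnes_cls18_cmems2020_global.nc"
)
# The same href with the file name removed.
layer_href <- sub("/[^/]*$", "", file_href)

# Invented URL: a layer name that happens to end in ".10".
tricky_href <- "https://example.org/bucket/native/DATASET/layer_v1.10"

print(looks_like_file(file_href))
print(looks_like_file(layer_href))
print(looks_like_file(tricky_href))
```

The file href and the layer href are classified as expected, but the invented layer name ending in ".10" is also reported as a file, and a four-character extension such as ".json" would be missed. That is the fragility I mean.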

I don't think you can rely on href having a predictable structure (e.g. https://host/*-native-*/native/DATASET_ID/LAYER/FILE) because I am guessing that there could be additional subdirectories in between LAYER and FILE. (But if that's not the case, then this might work. Just throw away anything after the 7th element in https://github.com/pepijn-devries/CopernicusMarine/blob/master/R/cms_list_stac_files.r#L12).
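
A rough sketch of the "throw away anything after the 7th element" idea, assuming the elements are the "/"-separated pieces of the href and that the structure really is https://host/bucket/native/DATASET_ID/LAYER/FILE. The helper name truncate_href is hypothetical.

```r
# Hypothetical helper: keep only the first `n` "/"-separated elements of an
# href. With "https://host/bucket/native/DATASET_ID/LAYER/FILE" the elements
# are "https:", "", host, bucket, "native", DATASET_ID, LAYER, FILE, so
# n = 7 keeps everything up to and including LAYER.
truncate_href <- function(href, n = 7) {
  parts <- strsplit(href, "/", fixed = TRUE)[[1]]
  paste(head(parts, n), collapse = "/")
}

file_href <- paste0(
  "https://s3.waw3-1.cloudferro.com/mdl-native-07/native/",
  "SEALEVEL_GLO_PHY_MDT_008_063/",
  "cnes_obs-sl_glo_phy-mdt_my_0.125deg_P20Y_202012/",
  "mdt_hybrid_cnes_cls18_cmems2020_global.nc"
)

layer_href <- truncate_href(file_href)
print(layer_href)
```

An href that already stops at the layer is returned unchanged, so this would be safe to apply to multi-file datasets too, as long as the assumption about the structure holds.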

You definitely cannot rely on the actual URL in the href. For the example above, you can see that it's pointing to the file mdt_hybrid_cnes_cls18_cmems2020_global.nc. But that file doesn't actually exist, and when you do a bucket-list query on the bucket, it turns out that the file is called something else. That seems like an error from Copernicus, but nonetheless I think you still have to go through the bucket-list step.
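
For illustration only, this is roughly how a file href could be turned back into a list request for its parent prefix. It assumes path-style addressing (https://host/bucket/key) and that the endpoint accepts the standard S3 ListObjectsV2 query parameters list-type and prefix; the package may well build its request differently. The helper name list_url_for_parent is hypothetical, and the sketch only constructs the URL without sending anything.

```r
# Hypothetical helper: build a ListObjectsV2-style URL for the parent
# "directory" of a file href. Assumes path-style addressing, i.e.
# https://host/bucket/key, and at least one key element before the file name.
list_url_for_parent <- function(href) {
  parts <- strsplit(sub("^https?://", "", href), "/", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 4)
  host   <- parts[1]
  bucket <- parts[2]
  # Drop host, bucket and the final (file) element; the rest is the prefix.
  prefix <- paste(parts[seq(3, length(parts) - 1)], collapse = "/")
  paste0(
    "https://", host, "/", bucket,
    "?list-type=2&prefix=",
    utils::URLencode(paste0(prefix, "/"), reserved = TRUE)
  )
}

file_href <- paste0(
  "https://s3.waw3-1.cloudferro.com/mdl-native-07/native/",
  "SEALEVEL_GLO_PHY_MDT_008_063/",
  "cnes_obs-sl_glo_phy-mdt_my_0.125deg_P20Y_202012/",
  "mdt_hybrid_cnes_cls18_cmems2020_global.nc"
)

list_url <- list_url_for_parent(file_href)
print(list_url)
```

The response to such a request would then give the real object names, which is what you need here because the file name in the href cannot be trusted.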

@raymondben
Author

(Also, minor suggestion that I stumbled across while debugging this: you don't need https://github.com/pepijn-devries/CopernicusMarine/blob/master/R/cms_list_stac_files.r#L7. Just put a .data$ prefix on assets in line 12.)

@pepijn-devries
Owner

Hi @raymondben,

Thank you for the detailed report. This is a great help in improving the package. I will study your case and think about how best to handle the situation where STAC responds with just the file instead of a bucket. Your suggestions are really helpful.

(Also, minor suggestion that I stumbled across while debugging this: you don't need https://github.com/pepijn-devries/CopernicusMarine/blob/master/R/cms_list_stac_files.r#L7. Just put a .data$ prefix on assets in line 12.)

This is also a good point. I think that assets <- NULL is a relic from an earlier version where I didn't import rlang's pronoun .data. Your suggestion would make the code easier to read. I will update this.

I'll leave this issue open until I have decided on a definitive solution.

Cheers,

Pepijn
