should earthaccess provide a way to keep track of input query parameters? #847

JessicaS11 · 2024-10-24T14:33:23Z

JessicaS11
Oct 24, 2024
Maintainer

In exploring replacing the icepyx.Query module with direct usage of earthaccess, a few fundamental questions are surfacing that I think are also relevant topics to explore in earthaccess. This one concerns a basic difference between an ipx.Query object and the earthaccess search functions (which return a list). Specifically, earthaccess does not in any way "store" the user's search criteria. It passes them through and returns a list of results. Both approaches (storing an object with the search criteria and results vs not) have their advantages and disadvantages.

I'm curious to hear from others if the benefit of having some of this information stored in an object is worth the cost of having the object (and having the user interact with it rather than, e.g., the functions surfaced directly through earthaccess.api). My personal bias (surprise!) is that the object is nice to have: I can see exactly what search parameters I used (temporal, spatial, cloud-or-not, collection, etc.) for the set of results attached to it. I can use that information to feed into another API or tool, and if I change my code without updating my object I'm not confused by the results I have (plus, I don't have to scroll to the top of my notebook if I want any of that info). And I can have multiple objects with different search parameters and results attached (that I'm thus less likely to muck up, and also that I can do per-dataset operations on).

I think this is an important conversation for moving forward with how earthaccess and icepyx will interface (and other plugins too). How will the plugins need to check if the earthaccess results they've been passed are valid for working with their tool? How would icepyx "get" and "use" earthaccess search results for its other capabilities (submitting a subset request; reading in data), given users can change filenames and not all datasets have the same metadata (so there'd be a lot of try/if statements to guess at what the user has passed in)?

jhkennedy · 2024-10-25T19:31:02Z

jhkennedy
Oct 25, 2024
Maintainer

@JessicaS11 I think keeping track of the search parameters that got you to a search result is a good idea. As you note, we return a vanilla list, so there's nowhere to store that kind of information in the results object. For this, and the reasons discussed here, I think we should pivot to returning a results object so we can provide richer methods/metadata about results.

Looking at ipx.Query, it looks like both searching/ordering and the results of those operations are contained in the class. I think on the earthaccess side, it would look more like earthaccess search methods (e.g., search_data) would stay package level methods^ but would return a Results object instead of a vanilla list:

>>> results = earthaccess.search_data(...)
>>> type(results)
earthaccess.Results
>>> search_args = results.search_args() # some dict, or a dict-like object that allows you to do:
>>> results_again = earthaccess.search_data(**search_args) 
>>> results == results_again
True

Which I think would get you most, if not all, of the functionality you want.
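To make the idea above concrete, here is a minimal sketch of a list-like results object that remembers the arguments that produced it. Everything in it is hypothetical: `SearchResults` and `fake_search_data` are illustrative stand-ins, not part of earthaccess, and no network request is made.

```python
from typing import Any


class SearchResults(list):
    """Hypothetical list subclass that remembers the query that produced it."""

    def __init__(self, items=(), **search_args: Any) -> None:
        super().__init__(items)
        self._search_args = dict(search_args)

    def search_args(self) -> dict:
        # Return a copy so callers cannot mutate the stored query.
        return dict(self._search_args)


def fake_search_data(**search_args: Any) -> SearchResults:
    """Stand-in for a search function (assumption); returns made-up granule names."""
    name = search_args.get("short_name", "UNKNOWN")
    count = search_args.get("count", 3)
    granules = [f"{name}_granule_{i}" for i in range(count)]
    return SearchResults(granules, **search_args)


results = fake_search_data(
    short_name="ATL06", temporal=("2020-01-01", "2020-01-31"), count=2
)
# Re-run the same search from the stored arguments.
results_again = fake_search_data(**results.search_args())
```

Because the sketch subclasses `list`, existing code that iterates or indexes the results would keep working. Note that equality here is plain list equality, so it compares the items and ignores the stored arguments.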

> given users can change filenames and not all datasets have the same metadata (so there'd be a lot of try/if statements to guess at what the user has passed in)?

Can you expand on both of these? I am not sure I quite follow.

^ There's a good argument to pull the search stuff into a class as well so that you could search multiple maturities or different catalogs at the same time. Technically, earthaccess already has this as packages are just singleton class objects, but allowing multiple instances could be helpful (e.g., you wouldn't have to pass an auth object around).

I do, however, prefer keeping the search classes and the results classes separate instead of combined like in ipx.Query.


chuckwondo · 2024-10-25T20:11:12Z

chuckwondo
Oct 25, 2024
Maintainer

I'm a bit confused. Don't earthaccess.DataCollections and earthaccess.DataGranules already serve (at least most of) this purpose?

3 replies

jhkennedy Oct 25, 2024
Maintainer

@chuckwondo you're right, DataCollections and DataGranules do provide the stuff in my footnote/aside -- that is, they are effectively API query classes. I always forget about them as they have rather unfortunate names (the non-plural versions of them, DataGranule and DataCollection, are result objects), and we mostly steer users towards earthaccess.search_datasets/earthaccess.search_data, which call them under the hood.

The issue @JessicaS11 is struggling with is that DataCollections/DataGranules return plain lists of DataCollection/DataGranule objects:
https://github.com/nsidc/earthaccess/blob/main/earthaccess/search.py#L112-L115

So from the search result there's no way to reproduce the search that got that result, and we can't provide a richer representation of a search result.

But yes, I think you're right that those classes could do effectively everything she needs with little (stuffing the results into an object attribute) or no modification (delay calling get).

jhkennedy Oct 25, 2024
Maintainer

And by unfortunately named, I mean that:

- DataCollections/DataGranules don't give a hint of what they do, unlike the base python_cmr CollectionQuery and GranuleQuery class names
- They would be the logical names for classes that contain a group of DataCollection and DataGranule instances (e.g., a search result)

This also applies to DataServices:
https://github.com/nsidc/earthaccess/blob/main/earthaccess/services.py#L11

jhkennedy Oct 25, 2024
Maintainer

🤔 I suppose if you think about them as lazy, they are effectively a results object. With that view, what I'd like is a "load" method that stores the actual results, methods to index the results, and (potentially) ways to combine multiple results, which all could be added to these classes.

I still think they are overloaded, however, and querying and results should be separate classes.
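The "lazy results" view described above could look roughly like the following sketch. It assumes a hypothetical `LazyQuery` class and a fake fetch function; neither is earthaccess code, and the real DataGranules/DataCollections classes are not used or modified here.

```python
from typing import Any, Callable, Iterator, List, Optional


class LazyQuery:
    """Hypothetical query object that defers fetching until results are needed."""

    def __init__(self, fetch: Callable[..., List[Any]], **params: Any) -> None:
        self._fetch = fetch
        self.params = dict(params)  # the search criteria stay inspectable
        self._results: Optional[List[Any]] = None

    def load(self) -> List[Any]:
        # Fetch once, then reuse the stored results on later calls.
        if self._results is None:
            self._results = list(self._fetch(**self.params))
        return self._results

    def __len__(self) -> int:
        return len(self.load())

    def __getitem__(self, index):
        return self.load()[index]

    def __iter__(self) -> Iterator[Any]:
        return iter(self.load())


calls = []  # records each time the fake backend is hit


def fake_fetch(**params: Any) -> List[str]:
    """Stand-in for a real search request (assumption); no network access."""
    calls.append(params)
    return [f"{params['short_name']}_{i}" for i in range(2)]


query = LazyQuery(fake_fetch, short_name="ATL08")
calls_before_load = len(calls)  # building the query fetches nothing
first = query[0]                # indexing triggers the single fetch
combined = list(query) + list(LazyQuery(fake_fetch, short_name="ATL06"))
```

Even in this small form the class both describes a query and holds its results, which is the overloading concern raised above.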

JessicaS11 · 2024-10-29T19:11:49Z

JessicaS11
Oct 29, 2024
Maintainer Author

Notes from today's hack session:
Goal: return a results object (in earthaccess.api) instead of a list of specific results
[short-term] Plan: add a granules property and new methods to earthaccess.DataCollections so that it behaves like a list, then have earthaccess.search_datasets return the actual DataCollections object to the user.
[longer-term] Plan: ultimately, earthaccess API methods like download and open would act on this object rather than expecting the user to supply granules and provider inputs.

For further discussion: separate the query and results objects entirely. This would be a breaking change but also help users when they need to be authenticated with multiple providers (since the auth wouldn't be attached to a package level earthaccess object). Another question is whether or not users directly call DataCollections rather than the api search_datasets as shown (e.g.) in the Earthdata Cloud Cookbook, which could influence how breaking a change it truly is.
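As a rough illustration of the fully separated design raised for further discussion, the sketch below keeps authentication on a client instance, returns an immutable results object that carries its query and provider, and lets a download function act on that object alone. All names (`Client`, `Results`, `download`) and the token strings are hypothetical, and the provider strings are only labels.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Results:
    """Hypothetical results object: items plus the context that produced them."""

    items: Tuple[str, ...]
    query: Dict[str, Any]
    provider: str


@dataclass
class Client:
    """Hypothetical query object; auth lives on the instance, not the package."""

    provider: str
    token: str  # placeholder credential, never sent anywhere in this sketch

    def search(self, **query: Any) -> Results:
        name = query.get("short_name", "UNKNOWN")
        items = tuple(f"{self.provider}:{name}:{i}" for i in range(2))
        return Results(items=items, query=dict(query), provider=self.provider)


def download(results: Results) -> List[str]:
    # The provider comes from the results object, so the user does not
    # have to pass granules and provider separately.
    return [f"downloaded {item} via {results.provider}" for item in results.items]


# Two independently authenticated clients can coexist.
client_a = Client(provider="NSIDC_CPRD", token="token-a")
client_b = Client(provider="POCLOUD", token="token-b")
results_a = client_a.search(short_name="ATL06")
results_b = client_b.search(short_name="SWOT_EXAMPLE")
messages = download(results_a)
```

The trade-off is the one noted above: code that relies on the current package-level functions returning plain lists would need to change.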

