Consider disk-backed, caching, derived epi_archives / targets/etc. interop #358

Open
brookslogan opened this issue Oct 13, 2023 · 2 comments

Comments

@brookslogan
Contributor

brookslogan commented Oct 13, 2023

It would be nice to have an epi_archive backed by disk / many files. We also talked about the current exploration framework based on the targets package, about annoyances going through epix_slide when also looping across things other than forecast_date, and about annoyances with epix_slide when the slide function outputs errors/etc. I think this points to:

  • epi_archive should be something more general that can have various backends. I sketched some backends we might imagine here.
  • Maybe we want just a smarter/faster epix_as_of rather than epix_slide. (epix_slide isn't smart at all yet, but making it smart would be easier than making epix_as_of smarter. Making epix_as_of smarter probably involves caching the last query & result or some sort of partition of the data, while epix_slide would get to order things as it likes, partition once, etc.)
  • We often want a way to let slide computations output errors/warnings instead of or alongside "real" results, rather than have epix_slide completely give up or spit out all the warnings at the end without their associated forecast_as_ofs. I think purrr might have some useful helper functions here (purrr::safely?).
  • Would we want a derived_epi_archive that does some of this targets/targets-like stuff, or one based on some (hypothetical?) library that does parallelism with only one pool of workers and accounts for BLAS/gurobi/etc. doing their own parallelism, to prevent CPU/swap thrashing / OOM killing (investigate coro?)? It might still have some of the awkwardness of epix_slide, but might make some things look nicer; e.g., a metaforecaster's forecasts would maybe be a derived_epi_archive atop a merge_epi_archive atop (a) the derived_epi_archive holding the base forecaster's forecasts and (b) the response data's ram_epi_archive.
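
To make the errors/warnings bullet concrete, here is a minimal base-R sketch of keeping each forecast date's errors and warnings next to its result instead of aborting the whole loop. Everything in it is hypothetical and for illustration only: `capture_conditions` and `toy_forecaster` are made-up names, and nothing here is existing epix_slide behavior. purrr::safely and purrr::quietly cover similar ground if a purrr dependency is acceptable.

```r
# Hypothetical helper (not part of epiprocess): wrap `f` so that a call
# returns its value together with any error message and warning messages.
capture_conditions <- function(f) {
  function(...) {
    warnings <- character(0)
    result <- withCallingHandlers(
      tryCatch(
        list(value = f(...), error = NULL),
        # An error ends this one computation only; record its message.
        error = function(e) list(value = NULL, error = conditionMessage(e))
      ),
      # Record each warning, then let the computation continue.
      warning = function(w) {
        warnings <<- c(warnings, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    )
    list(value = result$value, error = result$error, warnings = warnings)
  }
}

# Made-up forecaster that fails on one date and warns on another.
toy_forecaster <- function(forecast_date) {
  if (forecast_date == as.Date("2023-10-08")) stop("not enough data")
  if (forecast_date == as.Date("2023-10-15")) warning("short training window")
  as.numeric(forecast_date - as.Date("2023-10-01"))
}

forecast_dates <- as.Date("2023-10-01") + 7 * 0:2
outcomes <- lapply(forecast_dates, capture_conditions(toy_forecaster))
# Name each outcome by its forecast date so conditions stay associated with it.
names(outcomes) <- format(forecast_dates)
```

With this shape, `outcomes[["2023-10-08"]]$error` holds the failure message for that date while the other dates still produce values, so the messages never get separated from the date that caused them.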

Alternatively, if we don't have derived or cached archive concepts, can we integrate our slide computations with a framework that does? E.g., targets.

@brookslogan brookslogan changed the title Consider disk-backed, caching, derived epi_archives / targets interop Consider disk-backed, caching, derived epi_archives / targets/etc. interop Oct 13, 2023
@dsweber2 dsweber2 mentioned this issue Oct 26, 2023
@dshemetov
Contributor

Another idea from Summer/Fall 2023 OKRs from @brookslogan :

  • compactify iteratively from set of files/fetchers; don’t require inputs batched in RAM

@brookslogan
Contributor Author

compactify iteratively from set of files/fetchers; don’t require inputs batched in RAM

This is a lighter-weight alternative to or stopgap for the approach suggested in this issue. Most of the work would already be done by #343.
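
A rough sketch of what iterative compactification could look like, assuming "compactify" means dropping rows whose value repeats the most recent version already kept for the same key. All names are hypothetical (`fetchers`, `compactify_iteratively`, and the `geo`/`time_value`/`value`/`version` columns), and the sketch further assumes that batches arrive in increasing version order with at most one row per key each. It is not the approach in #343, just an illustration of holding one batch in RAM at a time.

```r
# Hypothetical stand-ins for files/fetchers: each call yields one version's
# snapshot as a small data frame.
make_fetcher <- function(version, values) {
  function() {
    data.frame(
      geo = c("ca", "ny"),
      time_value = as.Date("2023-10-01"),
      value = values,
      version = as.Date(version)
    )
  }
}
fetchers <- list(
  make_fetcher("2023-10-02", c(1, 2)),
  make_fetcher("2023-10-03", c(1, 3)), # only ny changed
  make_fetcher("2023-10-04", c(1, 3))  # nothing changed
)

compactify_iteratively <- function(fetchers) {
  latest <- new.env() # key -> last value kept for that key
  kept <- list()
  for (fetch in fetchers) {
    batch <- fetch() # only this batch is held in RAM at once
    keys <- paste(batch$geo, batch$time_value)
    # A row is kept if its key is new or its value differs from the last one.
    is_update <- vapply(seq_len(nrow(batch)), function(i) {
      previous <- latest[[keys[i]]]
      is.null(previous) || !identical(previous, batch$value[i])
    }, logical(1))
    for (i in which(is_update)) latest[[keys[i]]] <- batch$value[i]
    kept[[length(kept) + 1]] <- batch[is_update, ]
  }
  compact <- do.call(rbind, kept)
  rownames(compact) <- NULL
  compact
}

compact <- compactify_iteratively(fetchers)
```

The state carried between batches is just the last kept value per key, which is the part that would need to live on disk or in a cache for inputs too large for RAM.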
