Consider disk-backed, caching, derived `epi_archives` / `targets`/etc. interop #358

brookslogan · 2023-10-13T16:29:57Z

It would be nice to have an `epi_archive` backed by disk or by many files. We also talked about the current exploration framework based on the targets package, about annoyances going through `epix_slide` when also looping across things other than `forecast_date`, and about annoyances with `epix_slide` when the slide function outputs errors/etc. I think this points to:

- `epi_archive` should be something more general that can have various backends. I sketched some backends we might imagine here.
- Maybe we want just a smarter/faster `epix_as_of` rather than `epix_slide`. (`epix_slide` isn't smart at all yet, but making it smart would be easier than making `epix_as_of` smarter. Making `epix_as_of` smarter probably involves caching the last query & result or some sort of partition of the data, while `epix_slide` would get to order things as it likes, partition once, etc.)
- We often want a way to let slide computations output errors/warnings instead of or alongside "real" results, rather than have `epix_slide` completely give up or spit out all the warnings at the end without associated `forecast_as_of`s. I think purrr might have some useful helper functions here (`purrr::safely`?).
- Would we want a `derived_epi_archive` that does some of this targets/targets-like stuff, or one based on some (hypothetical?) library that does parallelism with only one pool of workers and adjusts for BLAS/gurobi/etc. using parallelism, to prevent CPU/swap thrashing / OOM killing (investigate coro?)? It might still have some of the awkwardness of `epix_slide`, but might make some things look nicer; e.g., a metaforecaster's forecasts would be a `derived_epi_archive`, maybe atop a `merge_epi_archive` atop (a) the `derived_epi_archive` holding the base forecaster's forecasts and (b) the response data's `ram_epi_archive`.
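
To make the errors/warnings point above concrete, here is a minimal, language-neutral sketch in Python of a slide loop that records each version's result, error, and warnings together instead of aborting. It is only an illustration of the pattern, not epiprocess or purrr code: `SlideOutcome`, `safe_slide`, and `toy_forecaster` are hypothetical names, and versions are plain strings.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SlideOutcome:
    """Everything one slide step produced, tagged with its version (hypothetical type)."""
    version: str
    result: Any = None
    error: Optional[Exception] = None
    warnings: tuple = ()


def safe_slide(versions, compute):
    """Run compute(version) for each version, capturing errors and warnings per version."""
    outcomes = []
    for version in versions:
        result, error = None, None
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # record every warning, even repeats
            try:
                result = compute(version)
            except Exception as exc:  # keep going; store the failure instead
                error = exc
        messages = tuple(str(w.message) for w in caught)
        outcomes.append(SlideOutcome(version, result, error, messages))
    return outcomes


def toy_forecaster(version):
    # Hypothetical computation: fails on one version and warns on another.
    if version == "2023-10-08":
        raise ValueError("not enough data")
    if version == "2023-10-15":
        warnings.warn("sparse recent data")
    return len(version)


outcomes = safe_slide(["2023-10-01", "2023-10-08", "2023-10-15"], toy_forecaster)
```

Each outcome keeps its version next to whatever went wrong, so warnings no longer pile up unattributed at the end of the run.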

Alternatively, if we don't have derived or cached archive concepts, can we integrate our slide computations with a framework that does? E.g., targets.
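
As a rough illustration of the "cache the last query & result" idea for as-of queries mentioned in the list above, the following Python sketch keeps the previous snapshot and only applies the updates between the previous and the requested version. It assumes a toy archive of `(version, key, value)` updates with no deletions; `CachedAsOf` and `rows_applied` are hypothetical names, not part of epiprocess.

```python
from bisect import bisect_right


class CachedAsOf:
    """Toy version-history store that reuses the last as-of snapshot (hypothetical)."""

    def __init__(self, updates):
        # updates: iterable of (version, key, value); sorted by version here.
        self._updates = sorted(updates, key=lambda u: u[0])
        self._versions = [u[0] for u in self._updates]
        self._last_end = 0        # how many updates the cached snapshot includes
        self._last_snapshot = {}  # cached result of the previous query
        self.rows_applied = 0     # instrumentation: total updates replayed

    def as_of(self, version):
        """Return the latest value per key among updates with version <= `version`."""
        end = bisect_right(self._versions, version)
        if end >= self._last_end:
            # Moving forward in time: extend the cached snapshot.
            start, snapshot = self._last_end, dict(self._last_snapshot)
        else:
            # Moving backward: the cache is no help, so replay from the start.
            start, snapshot = 0, {}
        for _, key, value in self._updates[start:end]:
            snapshot[key] = value
            self.rows_applied += 1
        self._last_end, self._last_snapshot = end, snapshot
        return dict(snapshot)


archive = CachedAsOf([(1, "a", 10), (2, "b", 20), (3, "a", 11), (4, "c", 30)])
first = archive.as_of(2)
second = archive.as_of(3)   # replays only the single update at version 3
rows_after_forward_queries = archive.rows_applied
third = archive.as_of(1)    # going backward forces a replay from scratch
```

This only pays off when queries arrive in increasing version order, which is the sense in which a slide that controls its own ordering has an easier job than a general as-of query.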


dshemetov · 2023-10-26T18:42:15Z

Another idea from the Summer/Fall 2023 OKRs from @brookslogan:

> compactify iteratively from set of files/fetchers; don’t require inputs batched in RAM

brookslogan · 2023-10-27T23:28:48Z

> compactify iteratively from set of files/fetchers; don’t require inputs batched in RAM

This is a lighter-weight alternative to or stopgap for the approach suggested in this issue. Most of the work would already be done by #343.
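
A minimal sketch of what iterative compactification could look like, in Python for illustration only. It assumes that "compactify" means dropping updates that repeat the latest value already recorded for the same key, that fetchers are supplied in increasing version order, and that each fetcher returns one batch of `(version, key, value)` rows; `compactify_iteratively` and the toy fetchers are hypothetical names.

```python
def compactify_iteratively(fetchers):
    """Yield only the rows that change a key's latest value, one batch at a time.

    Only the latest value per key and the current batch are held in memory,
    so the full set of inputs never needs to be loaded at once.
    """
    latest = {}
    for fetch in fetchers:
        for version, key, value in fetch():
            if key in latest and latest[key] == value:
                continue  # redundant: same as the last recorded value
            latest[key] = value
            yield (version, key, value)


# Hypothetical fetchers standing in for per-version files or API calls.
fetchers = [
    lambda: [(1, "a", 10), (1, "b", 20)],
    lambda: [(2, "a", 10), (2, "b", 21)],  # "a" unchanged, "b" revised
    lambda: [(3, "a", 12), (3, "b", 21)],  # "a" revised, "b" unchanged
]
compact_rows = list(compactify_iteratively(fetchers))
```

Because the function is a generator, the compacted rows could be written to disk as they are produced rather than collected in a list as in this example.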

brookslogan changed the title ~~Consider disk-backed, caching, derived epi_archives / targets interop~~ Consider disk-backed, caching, derived epi_archives / targets/etc. interop Oct 13, 2023

dsweber2 added this to the [CSTE] Improve scaling of epiprocess milestone Oct 23, 2023

dsweber2 mentioned this issue Oct 26, 2023: Epix rbind #343 (Draft)

brookslogan mentioned this issue Feb 1, 2024: Consider performance improvements for epix_slide, as_of #76 (Open)
