Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicate PRs.
- Moved example datasets from being hosted in the package to being re-exported from the `epidatasets` package. The datasets can no longer be loaded with `data()` but can be accessed with `epiprocess::` or, after loading the package, just the name of the dataset (#520). Those with names starting with `jhu` have been renamed to a more uniform scheme and now have names starting with `covid`. The data set previously named `jhu_confirmed_cumulative_num` has been removed from the package, but a renamed version is still available in `epidatasets`.
- Removed `.window_size = 1` default from `epi_slide_{mean,sum,opt}`; this argument is now mandatory, and should nearly always be greater than 1 except for testing purposes.
- `epi_slide` and `epix_slide` now provide some hints if you forget a `~` when using a formula to specify the slide computation, and other bits of forgotten syntax.
- Improved validation of `.window_size` arguments.
- Rewrote a lot of the package documentation to be more consistent and informative. Simplified and streamlined the vignettes.
- Removed vignette dependency on `covidcast`.
- `epi_slide` interface has major breaking changes.
  - All variables are now dot-prefixed to be more consistent with tidyverse style for functions that allow tidyeval.
  - The `before`/`after` arguments have been replaced with the `.window_size` and `.align` arguments.
  - `names_sep` has been removed. If you return data frames from your computations:
    - without a name, they will be unpacked into separate columns without name prefixes;
    - with a name, it will become a packed data.frame-class column (see `tidyr::pack`).
  - `as_list_col` has been removed. You can now directly return a list from your slide computations instead. If you were using `as_list_col=TRUE`, you will need to wrap your output in a list.
  - Ungrouped slides are no longer allowed in `epi_slide`. If you used this for geographic aggregation up to national, consider using `sum_groups_epi_df`.
- Added `sum_groups_epi_df` to allow aggregation across key columns prior to sliding.
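
Aggregating across a key column before sliding amounts to summing over the dropped key within each remaining key combination. The base-R sketch below illustrates that idea on hypothetical toy data; it does not reproduce `sum_groups_epi_df`'s interface or keep any `epi_df` metadata, and the `cases` column and `"us"` label are assumptions.

```r
# Hypothetical toy data: two states observed on two days
df <- data.frame(
  geo_value  = c("ca", "ca", "ny", "ny"),
  time_value = as.Date(c("2020-06-01", "2020-06-02", "2020-06-01", "2020-06-02")),
  cases      = c(10, 12, 5, 7)
)

# Sum over geo_value within each time_value, i.e. aggregate up to "national"
national <- aggregate(cases ~ time_value, data = df, FUN = sum)
national$geo_value <- "us"  # hypothetical label for the aggregated geography
national  # cases: 15 on 2020-06-01, 19 on 2020-06-02
```

A rolling computation on the aggregated series would then run over a single group.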
- `epix_slide` interface has major changes.
  - All variables are now dot-prefixed to be more consistent with tidyverse style for functions that allow tidyeval.
  - `names_sep` has been removed. If you return data frames from your computations:
    - without a name, they will be unpacked into separate columns without name prefixes;
    - with a name, it will become a packed data.frame-class column (see `tidyr::pack`).
  - `as_list_col` has been removed. You can now directly return a list from your slide computations instead. If you were using `as_list_col=TRUE`, you will need to wrap your output in a list.
- `as_epi_df()` now checks that every group has unique time values and errors if this is not the case. The same check is performed at the beginning of `epi_slide()`. This check is currently not enforced in dplyr operations (like joins, mutates, or selects), but we plan to add it in the future.
- `as_epi_df()` and `as_epi_archive()` no longer accept `additional_metadata`. Use the new `other_keys` argument to specify additional key columns, such as age group columns or other demographic breakdowns. Miscellaneous metadata are no longer handled by `epiprocess`, but you can use R's built-in `attr<-` instead for a similar feature.
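
The uniqueness requirement means that no (group, `time_value`) combination may appear more than once. A minimal base-R sketch of such a check, assuming `geo_value` is the only grouping column; `has_unique_time_values()` is a hypothetical helper, not the check `epiprocess` actually runs.

```r
# Hypothetical helper: TRUE if every (group, time_value) pair appears at most once
has_unique_time_values <- function(df, group_cols) {
  key_cols <- c(group_cols, "time_value")
  # anyDuplicated() returns 0 when there are no duplicated rows
  !anyDuplicated(df[key_cols])
}

ok <- data.frame(
  geo_value  = c("ca", "ca", "ny"),
  time_value = as.Date(c("2020-06-01", "2020-06-02", "2020-06-01"))
)
bad <- rbind(ok, ok[1, ])  # repeats the ("ca", 2020-06-01) row

has_unique_time_values(ok, "geo_value")   # TRUE
has_unique_time_values(bad, "geo_value")  # FALSE
```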
- Added `complete.epi_df`, which fills in missing values in an `epi_df` with `NA`s. Uses `tidyr::complete` underneath and preserves `epi_df` metadata.
- Added the function `revision_summary`, which provides basic revision information for `epi_archive`s out of the box (#492).
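
`complete.epi_df` relies on `tidyr::complete`; the underlying idea (expand to every combination of key values, then fill the gaps with `NA`) can be sketched with base R's `expand.grid()` and `merge()`. The toy data are hypothetical, and unlike `complete.epi_df` this sketch does not preserve any `epi_df` metadata.

```r
# Hypothetical toy data with no row for ("ny", 2020-06-02)
df <- data.frame(
  geo_value  = c("ca", "ca", "ny"),
  time_value = as.Date(c("2020-06-01", "2020-06-02", "2020-06-01")),
  cases      = c(10, 12, 5)
)

# Every geo_value x time_value combination ...
grid <- expand.grid(
  geo_value  = unique(df$geo_value),
  time_value = unique(df$time_value),
  stringsAsFactors = FALSE
)
# ... left-joined with the observed rows; absent combinations get NA
completed <- merge(grid, df, by = c("geo_value", "time_value"), all.x = TRUE)
completed <- completed[order(completed$geo_value, completed$time_value), ]
completed  # 4 rows; cases is NA for ("ny", 2020-06-02)
```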
- Fix `epi_slide_opt` (and related functions) to correctly handle `before=Inf`. Also allow multiple columns specified as a list of strings.
- Disallow `after=Inf` in slide functions, since it doesn't seem like a likely use case and complicates code.
- `epi_df`s are now more strict about what types they allow in the time column. Namely, we are explicit about only supporting `Date` at the daily and weekly cadence and generic integer types (for yearly cadence).
- `epi_slide` `before` and `after` arguments now require the user to specify time units in certain cases. The `time_step` argument has been removed.
- `epix_slide` `before` argument now defaults to `Inf`, and requires the user to specify units in some cases. The `time_step` argument has been removed.
- `detect_outlr_stl(seasonal_period = NULL)` is no longer accepted. Use `detect_outlr_stl(seasonal_period = <value>, seasonal_as_residual = TRUE)` instead. See `?detect_outlr_stl` for more details.
- `epi_slide` computations are now 2-4 times faster after changing how reference time values, made accessible within sliding functions, are calculated (#397).
- Add new `epi_slide_mean` function to allow much (~30x) faster rolling average computations in some cases (#400).
- Add new `epi_slide_sum` function to allow much faster rolling sum computations in some cases (#433).
- Add new `epi_slide_opt` function to allow much faster rolling computations in some cases, using `data.table` and `slider` optimized rolling functions (#433).
- Add tidyselect interface for `epi_slide_opt` and derivatives (#452).
- Regenerated the `jhu_csse_daily_subset` dataset with the latest versions of the data from the API.
- Changed approach to versioning; see DEVELOPMENT.md for details.
- `select` on grouped `epi_df`s now only drops `epi_df`ness if it makes sense; PR #390.
- Minor documentation updates; PR #393.
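
To pin down what a rolling average over a window of a given size covers, here is a base-R sketch of a right-aligned (trailing) mean, assuming a single group with no gaps in `time_value`. It uses `stats::filter()` and is not how `epi_slide_mean` is implemented; `trailing_mean()` is a hypothetical helper.

```r
# Hypothetical helper: trailing mean over `window_size` time steps, i.e. the
# current value plus the previous window_size - 1 values. Positions with
# fewer than window_size values available are NA in this sketch.
trailing_mean <- function(x, window_size) {
  out <- stats::filter(x, rep(1 / window_size, window_size), sides = 1)
  as.numeric(out)  # drop the "ts" class that stats::filter() returns
}

x <- c(1, 2, 3, 4, 5, 6, 7)
trailing_mean(x, 3)  # NA NA 2 3 4 5 6
```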
- Improved `epi_archive` print method. It now compactifies metadata and shows a snippet of the underlying `DT` (#341).
- Added `autoplot` method for `epi_df` objects, which creates a `ggplot2` plot of the `epi_df` (#382).
- Refactored internals to use `cli` for warnings/errors and `checkmate` for argument checking (#413).
- Fix logic to auto-assign `epi_df` `time_type` to `week` (#416) and `year` (#441).
- Clarified "Get started" example of getting Ebola line list data into `epi_df` format.
- Improved documentation web site landing page's introduction.
- Fixed documentation referring to old `epi_slide()` interface (#466, thanks @XuedaShen!).
- `as_epi_df` and `as_epi_archive` now support arguments to specify column names, e.g. `as_epi_df(some_tibble, geo_value=state)`. In addition, there is a list of default conversions; see `time_column_names` for a list of columns that will automatically be recognized and converted to the `time_value` column (there are similar functions for `geo` and `version`).
- Fixed bug where `epix_slide_ref_time_values_default()` on datetimes would output a huge number of `ref_time_values` spaced apart by mere seconds.
- In `epi_slide()` and `epix_slide()`:
  - Multiple "data-masking" tidy evaluation expressions can be passed in via `...`, rather than just one.
  - Additional tidy evaluation features from `dplyr::mutate` are supported: `!! name_var := value`, unnamed expressions evaluating to data frames, and `= NULL`; see `?epi_slide` for more details.
- Resolved some linting messages in package checks (#468).
- Added optional `decay_to_tibble` attribute controlling `as_tibble()` behavior of `epi_df`s to let `{epipredict}` work more easily with other libraries (#471).
- Removed some external package dependencies.
- Switched `epi_df`'s `other_keys` default from `NULL` to `character(0)`; PR #390.
- Refactored `epi_archive` to use S3 instead of R6 for its object model. The functionality stays the same, but it will break the member function interface. For migration, you can usually just convert `epi_archive$merge(...)` to `epi_archive <- epi_archive %>% epix_merge(...)` (and the same for `fill_through_version` and `truncate_after_version`) and `epi_archive$slide(...)` to `epi_archive %>% epix_slide(...)` (and the same for `as_of`, `group_by`, `slide`, etc.) (#340). In some limited situations, such as if you have a helper function that calls `epi_archive$merge` etc. on one of its arguments, then you may need to more carefully refactor them.
- Updated vignettes for compatibility with epidatr 1.0.0 in PR #377.
- Changes to `epi_slide` and `epix_slide`:
  - If `f` is a function, it is now required to take at least three arguments. `f` must take an `epi_df` with the same column names as the archive's `DT`, minus the `version` column; followed by a one-row tibble containing the values of the grouping variables for the associated group; followed by a reference time value, usually as a `Date` object. Optionally, it can take any number of additional arguments after that, and forward values for those arguments through `epi[x]_slide`'s `...` args.
    - To make your existing slide computations work, add a third argument to your `f` function to accept this new input: e.g., change `f = function(x, g, <any other arguments>) { <body> }` to `f = function(x, g, rt, <any other arguments>) { <body> }`.
- `epi_slide` and `epix_slide` also make the window data, group key and reference time value available to slide computations specified as formulas or tidy evaluation expressions, in additional or completely new ways.
  - If `f` is a formula, it can now access the reference time value via `.z` or `.ref_time_value`.
  - If `f` is missing, the tidy evaluation expression in `...` can now refer to the window data as an `epi_df` or `tibble` with `.x`, the group key with `.group_key`, and the reference time value with `.ref_time_value`. The usual `.data` and `.env` pronouns also work, but `pick()` and `cur_data()` do not; work off of `.x` instead.
- `epix_slide` has been made more like `dplyr::group_modify`. It will no longer perform element/row recycling for size stability, accepts slide computation outputs containing any number of rows, and no longer supports `all_rows`.
  - To keep the old behavior, manually perform row recycling within `f` computations, and/or `left_join` a data frame representing the desired output structure with the current `epix_slide()` result to obtain the desired repetitions and completions expected with `all_rows = TRUE`.
- `epix_slide` will only output grouped or ungrouped tibbles. Previously, it would sometimes output `epi_df`s, but not consistently, and not always with the metadata desired. Future versions will revisit this design, and consider more closely whether/when/how to output an `epi_df`.
  - To keep the old behavior, convert the output of `epix_slide()` to `epi_df` when desired and set the metadata appropriately.
- `epi_slide` and `epix_slide` now support `as_list_col = TRUE` when the slide computations output atomic vectors, and output a list column in "chopped" format (see `tidyr::chop`).
- `epi_slide` now works properly with slide computations that output just a `Date` vector, rather than converting `slide_value` to a numeric column.
- Fix `?archive_cases_dv_subset` information regarding modifications of upstream data by @brookslogan (#299).
- Update to use updated `epidatr` (`fetch_tbl` -> `fetch`) by @brookslogan (#319).
- Changes to both `epi_slide` and `epix_slide`:
  - The `n`, `align`, and `before` arguments have been replaced by new `before` and `after` arguments. To migrate to the new version, replace these arguments in every `epi_slide` and `epix_slide` call. If you were only using the `n` argument, then this means replacing `n = <n value>` with `before = <n value> - 1`.
    - `epi_slide`'s time windows now extend `before` time steps before and `after` time steps after the corresponding `ref_time_values`. See `?epi_slide` for details on matching old alignments.
    - `epix_slide`'s time windows now extend `before` time steps before the corresponding `ref_time_values` all the way through the latest data available at the corresponding `ref_time_values`.
  - Slide functions now keep any grouping of `x` in their results, like `mutate` and `group_modify`.
    - To obtain the old behavior, `dplyr::ungroup` the slide results immediately.
- Additional `epi_slide` changes:
  - When using `as_list_col = TRUE` together with `ref_time_values` and `all_rows=TRUE`, the marker for excluded computations is now a `NULL` entry in the list column, rather than an `NA`; if you are using `tidyr::unnest()` afterward and want to keep these missing data markers, you will need to replace the `NULL` entries with `NA`s. Skipped computations are now more uniformly detectable using `vctrs` methods.
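
The rule of replacing `n = <n value>` with `before = <n value> - 1` follows from the window including the reference time value itself. A small base-R sketch, assuming daily `Date` time values, makes the arithmetic concrete; `window_dates()` is a hypothetical helper, not part of `epiprocess`.

```r
# Hypothetical helper: the dates in a window extending `before` days before
# and `after` days after a reference date (both ends inclusive)
window_dates <- function(ref_time_value, before, after = 0) {
  seq(ref_time_value - before, ref_time_value + after, by = "day")
}

ref <- as.Date("2020-06-07")
n <- 7
w <- window_dates(ref, before = n - 1)
range(w)   # "2020-06-01" "2020-06-07"
length(w)  # 7, matching the old n = 7

# A centered window of the same width uses before = 3, after = 3
length(window_dates(ref, before = 3, after = 3))  # 7
```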
- Additional `epix_slide` changes:
  - `epix_slide`'s `group_by` argument has been replaced by `dplyr::group_by` and `dplyr::ungroup` S3 methods. The `group_by` method uses "data masking" (also referred to as "tidy evaluation") rather than "tidy selection".
    - Old syntax:
      - `x %>% epix_slide(<other args>, group_by=c(col1, col2))`
      - `x %>% epix_slide(<other args>, group_by=all_of(colname_vector))`
    - New syntax:
      - `x %>% group_by(col1, col2) %>% epix_slide(<other args>)`
      - `x %>% group_by(across(all_of(colname_vector))) %>% epix_slide(<other args>)`
  - `epix_slide` no longer defaults to grouping by non-`time_value`, non-`version` key columns, instead considering all data to be in one big group.
    - To obtain the old behavior, precede each `epix_slide` call lacking a `group_by` argument with an appropriate `group_by` call.
  - `epix_slide` now guesses `ref_time_values` to be a regularly spaced sequence covering all the `DT$version` values and the `versions_end`, rather than the distinct `DT$time_value`s. To obtain the old behavior, pass in `ref_time_values = unique(<ungrouped archive>$DT$time_value)`.
- `epi_archive`'s `clobberable_versions_start`'s default is now `NA`, so there will be no warnings by default about potential nonreproducibility. To obtain the old behavior, pass in `clobberable_versions_start = max_version_with_row_in(x)`.
- Fixed `[` on grouped `epi_df`s to maintain the grouping if possible when dropping the `epi_df` class (e.g., when removing the `time_value` column).
- Fixed `epi_df` operations to be more consistent about decaying into non-`epi_df`s when the result of the operation doesn't make sense as an `epi_df` (e.g., when removing the `time_value` column).
- Changed `bind_rows` on grouped `epi_df`s to not drop the `epi_df` class. Like with ungrouped `epi_df`s, the metadata of the result is still simply taken from the first result, and may be inappropriate (#242).
- `epi_slide` and `epix_slide` now raise an error rather than silently filtering out `ref_time_values` that don't meet their expectations.
- `epix_slide`, `<epi_archive>$slide` have a new parameter `all_versions`. With `all_versions=TRUE`, `epix_slide` will pass a filtered `epi_archive` to each computation rather than an `epi_df` snapshot. This enables, e.g., performing pseudoprospective forecasts with a revision-aware forecaster using nested `epix_slide` operations.
- Added `dplyr::group_by` and `dplyr::ungroup` S3 methods for `epi_archive` objects, plus corresponding `$group_by` and `$ungroup` R6 methods. The `group_by` implementation supports the `.add` and `.drop` arguments, and `ungroup` supports partial ungrouping with `...`.
- `as_epi_archive`, `epi_archive$new` now perform checks for the key uniqueness requirement (part of #154).
- Added a `NEWS.md` file to track changes to the package.
- Implemented `?dplyr::dplyr_extending` for `epi_df`s (#223).
- Fixed various small documentation issues (#217).
- `epix_slide`, `<epi_archive>$slide` now feed `f` an `epi_df` rather than converting to a tibble/`tbl_df` first, allowing use of `epi_df` methods and metadata, and often yielding `epi_df`s out of the slide as a result. To obtain the old behavior, convert to a tibble within `f`.
- Fixed `epix_merge`, `<epi_archive>$merge` always raising an error on `sync="truncate"`.
- Added `Remotes:` entry for `genlasso`, which was removed from CRAN.
- Added `as_epi_archive` tests.
- Added missing `epix_merge` test for `sync="truncate"`.
- Fixed `[.epi_df` to not reorder columns, which was incompatible with downstream packages.
- Changed `[.epi_df` decay-to-tibble logic to be more coherent with `epi_df`s' current tolerance of nonunique keys: stopped decaying to a tibble in some cases where a unique key wouldn't have been preserved, since we don't enforce a unique key elsewhere.
- Fixed `[.epi_df` to adjust `"other_keys"` metadata when corresponding columns are selected out.
- Fixed `[.epi_df` to raise an error if resulting column names would be nonunique.
- Fixed `[.epi_df` to drop metadata if decaying to a tibble (due to removal of essential columns).
- Added check that `epi_df` `additional_metadata` is a list.
- Fixed some incorrect `as_epi_df` examples.
- Applied rename of upstream package in examples: `delphi.epidata` -> `epidatr`.
- Rounded out `[.epi_df` tests.
- `as_epi_archive`, `epi_archive$new`:
  - Compactification (see below) by default may change results if working directly with the `epi_archive`'s `DT` field; to disable, pass in `compactify=FALSE`.
- `epi_archive`'s wrappers and R6 methods have been updated to follow these rules regarding reference semantics:
  - `epix_<method>` will not mutate input `epi_archive`s, but may alias them or alias their fields (which should not be a worry if a user sticks to these `epix_*` functions and "regular" R functions with copy-on-write-like behavior, avoiding mutating functions such as `[.data.table`).
  - `x$<method>` may mutate `x`; if it mutates `x`, it will return `x` invisibly (where this makes sense), and, for each of its fields, may either mutate the object to which it refers or reseat the reference (but not both); if `x$<method>` does not mutate `x`, its result may contain aliases to `x` or its fields.
- `epix_merge`, `<epi_archive>$merge`:
  - Removed `...`, `locf`, and `nan` parameters.
  - Changed the default behavior, which now corresponds to using `by=key(x$DT)` (but demanding that it is the same set of column names as `key(y$DT)`), `all=TRUE`, `locf=TRUE`, `nan=NaN` (but with the post-filling step fixed to only apply to gaps, and no longer fill over `NA`s originating from `x$DT` and `y$DT`).
  - `x` and `y` are no longer allowed to share names of non-`by` columns.
  - `epix_merge` no longer mutates its `x` argument (but `$merge` continues to do so).
  - Removed (undocumented) capability of passing a `data.table` as `y`.
- `epix_slide`:
  - Removed inappropriate/misleading `n=7` default argument (due to reporting latency, `n=7` will not yield 7 days of data in a typical daily-reporting surveillance data source, as one might have assumed).
- `as_epi_archive`, `epi_archive$new`:
  - New `compactify` parameter allows removal of rows that are redundant for the purposes of `epi_archive`'s methods, which use the last version of each observation carried forward.
  - New `clobberable_versions_start` field allows marking a range of versions that could be "clobbered" (rewritten without assigning new version tags); previously, this was hard-coded as `max(<epi_archive>$DT$version)`.
  - New `versions_end` field allows marking a range of versions beyond `max(<epi_archive>$DT$version)` that were observed, but contained no changes.
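
Compactification rests on last-observation-carried-forward semantics: a row whose values repeat the previous version of the same key adds nothing that carrying the earlier row forward would not already supply. A minimal base-R sketch on hypothetical toy data; `compactify_locf()` is a hypothetical helper, and `epiprocess`'s own implementation (which works on a `data.table`) may treat edge cases such as `NA`s or numeric tolerance differently.

```r
# Hypothetical version history for one (geo_value, time_value) key:
# the value is re-reported unchanged on some versions
dt <- data.frame(
  geo_value  = "ca",
  time_value = as.Date("2020-06-01"),
  version    = as.Date(c("2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05")),
  cases      = c(10, 10, 12, 12)
)

# Hypothetical helper: drop each row whose key and values match the row for
# the previous version of the same key, since LOCF already reproduces it
compactify_locf <- function(dt, key_cols, value_cols) {
  # Sort by key, then version, so consecutive rows are consecutive versions
  ord <- do.call(order, unname(as.list(dt[c(key_cols, "version")])))
  dt <- dt[ord, , drop = FALSE]
  n <- nrow(dt)
  if (n < 2) return(dt)
  redundant <- rep(TRUE, n - 1)
  for (col in c(key_cols, value_cols)) {
    prev <- dt[[col]][-n]
    curr <- dt[[col]][-1]
    # Two NAs count as equal; otherwise both must be present and equal
    same <- (is.na(prev) & is.na(curr)) | (!is.na(prev) & !is.na(curr) & prev == curr)
    redundant <- redundant & same
  }
  # The first row is always kept
  dt[c(TRUE, !redundant), , drop = FALSE]
}

compact <- compactify_locf(dt, c("geo_value", "time_value"), "cases")
compact$version  # "2020-06-02" "2020-06-04"
```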
- `epix_merge`, `$merge`:
  - New `sync` parameter controls what to do if `x` and `y` aren't equally up to date (i.e., if `x$versions_end` and `y$versions_end` are different).
- New function `epix_fill_through_version`, method `<epi_archive>$fill_through_version`: non-mutating & mutating way to ensure that an archive contains versions at least through some `fill_versions_end`, extrapolating according to `how` if necessary.
- Example archive data object is now constructed on demand from its underlying data, so it will be based on the user's version of `epi_archive` rather than an outdated R6 implementation from whenever the data object was generated.
- Removed default `n=7` argument to `epix_slide`.
- Ignore `NA`s when printing `time_value` range for an `epi_archive`.
- Fixed misleading column naming in `epix_slide` example.
- Trimmed down `epi_slide` examples.
- Synced out-of-date docs.
- Removed dependency of some `epi_archive` tests on an example archive object, and made them more understandable by reading without running.
- Fixed `epi_df` tests relying on an S3 method for `epi_df` implemented externally to `epiprocess`.
- Added tests for `epi_archive` methods and wrapper functions.
- Removed some dead code.
- Made `.{Rbuild,git}ignore` files more comprehensive.
- New `new_epi_df` function is similar to `as_epi_df`, but (i) recalculates, overwrites, and/or drops most metadata of `x` if it has any, (ii) may still reorder the columns of `x` even if it's already an `epi_df`, and (iii) treats `x` as optional, constructing an empty `epi_df` by default.
- Fixed `geo_type` guessing on alphabetical strings with more than 2 characters to yield `"custom"`, not US `"nation"`.
- Fixed `time_type` guessing to actually detect `Date`-class `time_value`s regularly spaced 7 days apart as `"week"`-type as intended.
- Improved printing of `epi_df`s, `epi_archive`s.
- Fixed `as_of` to not cut off any (forecast-like) data with `time_value > max_version`.
- Expanded `epi_df` docs to include conversion from `tsibble`/`tbl_ts` objects, usage of `other_keys`, and pre-processing objects not following the `geo_value`, `time_value` naming scheme.
- Expanded `epi_slide` examples to show how to use an `f` argument with named parameters.
- Updated examples to print relevant columns given a common 80-column terminal width.
- Added growth rate examples.
- Improved `as_epi_archive` and `epi_archive$new`/`$initialize` documentation, including constructing a toy archive.
- Added tests for `epi_slide`, `epi_cor`, and internal utility functions.
- Fixed currently-unused internal utility functions `MiddleL`, `MiddleR` to yield correct results on odd-length vectors.
- New example data objects allow one to quickly experiment with `epi_df`s and `epi_archive`s without relying/waiting on an API to fetch data.
- Improved `epi_slide` error messaging.
- Fixed description of the appropriate parameters for an `f` argument to `epi_slide`; previous description would give incorrect behavior if `f` had named parameters that did not receive values from `epi_slide`'s `...`.
- Added some examples throughout the package.
- Using example data objects in vignettes also speeds up vignette compilation.
- Set up gh-actions CI.
- Added tests for `epi_df`s.
- Classes:
  - `epi_df`: specialized `tbl_df` for geotemporal epidemiological time series data, with optional metadata recording other key columns (e.g., demographic breakdowns) and `as_of` what time/version this data was current/published. Associated functions:
    - `as_epi_df` converts to an `epi_df`, guessing the `geo_type`, `time_type`, `other_keys`, and `as_of` if not specified.
    - `as_epi_df.tbl_ts` and `as_tsibble.epi_df` automatically set `other_keys` and `key` & `index`, respectively.
    - `epi_slide` applies a user-supplied computation to a sliding/rolling time window and user-specified groups, adding the results as new columns, and recycling/broadcasting results to keep the result size stable. Allows computation to be provided as a function, `purrr`-style formula, or tidyeval dots. Uses `slider` underneath for efficiency.
    - `epi_cor` calculates Pearson, Kendall, or Spearman correlations between two (optionally time-shifted) variables in an `epi_df` within user-specified groups.
    - Convenience function: `is_epi_df`.
  - `epi_archive`: R6 class for version (patch) data for geotemporal epidemiological time series data sets. Comes with S3 methods and regular functions that wrap around this functionality for those unfamiliar with R6 methods. Associated functions:
    - `as_epi_archive`: prepares an `epi_archive` object from a data frame containing snapshots and/or patch data for every available version of the data set.
    - `as_of`: extracts a snapshot of the data set as of some requested version, in `epi_df` format.
    - `epix_slide`, `<epi_archive>$slide`: similar to `epi_slide`, but for `epi_archive`s; for each requested `ref_time_value` and group, applies a time window and user-specified computation to a snapshot of the data as of `ref_time_value`.
    - `epix_merge`, `<epi_archive>$merge`: like `merge` for `epi_archive`s, but allowing for the last version of each observation to be carried forward to fill in gaps in `x` or `y`.
    - Convenience function: `is_epi_archive`.
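
The `as_of` snapshot idea (each observation takes its latest version no later than the requested one) can be sketched in base R on a toy patch-format table. `snapshot_as_of()` and the data are hypothetical; the real `epi_archive` keeps its data in a `data.table` (`DT`), and its `as_of` may apply further checks, for example involving `versions_end` or `clobberable_versions_start`.

```r
# Hypothetical patch-format data: each row is a value for a
# (geo_value, time_value) key as first reported or revised at `version`
archive_dt <- data.frame(
  geo_value  = c("ca", "ca", "ca"),
  time_value = as.Date(c("2020-06-01", "2020-06-01", "2020-06-02")),
  version    = as.Date(c("2020-06-02", "2020-06-04", "2020-06-03")),
  cases      = c(10, 12, 7)
)

# Hypothetical helper: for each key, keep the row with the latest version
# not exceeding max_version (last version carried forward)
snapshot_as_of <- function(dt, max_version, key_cols = c("geo_value", "time_value")) {
  dt <- dt[dt$version <= max_version, , drop = FALSE]
  dt <- dt[order(dt$version, decreasing = TRUE), , drop = FALSE]
  dt <- dt[!duplicated(dt[key_cols]), , drop = FALSE]  # latest version per key
  ord <- do.call(order, unname(as.list(dt[key_cols])))
  dt <- dt[ord, setdiff(names(dt), "version"), drop = FALSE]
  rownames(dt) <- NULL
  dt
}

s3 <- snapshot_as_of(archive_dt, as.Date("2020-06-03"))  # cases: 10, 7
s4 <- snapshot_as_of(archive_dt, as.Date("2020-06-04"))  # 2020-06-01 revised: 12, 7
```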
- Additional functions:
  - `growth_rate`: estimates growth rate of a time series using one of a few built-in `method`s based on relative change, linear regression, smoothing splines, or trend filtering.
  - `detect_outlr`: applies one or more outlier detection methods to a given signal variable, and optionally aggregates the outputs to create a consensus result.
  - `detect_outlr_rm`: outlier detection function based on rolling medians; one of the methods included in `detect_outlr`.
  - `detect_outlr_stl`: outlier detection function based on a seasonal-trend decomposition using LOESS (STL); one of the methods included in `detect_outlr`.
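
A generic version of rolling-median outlier detection flags points whose distance from a centered rolling median exceeds a multiple of a robust spread. The base-R sketch below uses the median absolute deviation (MAD) of the residuals as that spread; the window half-width, the multiplier `k`, and the MAD rule are assumptions for illustration, and `detect_outlr_rm`'s actual parameters and thresholds may differ. `rolling_median()` and `flag_outliers_rm()` are hypothetical helpers.

```r
# Hypothetical helper: centered rolling median; the window shrinks at the ends
rolling_median <- function(x, half_width) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    lo <- max(1, i - half_width)
    hi <- min(n, i + half_width)
    stats::median(x[lo:hi])
  }, numeric(1))
}

# Hypothetical helper: flag points more than k robust spreads from the
# rolling median
flag_outliers_rm <- function(x, half_width = 3, k = 3) {
  resid <- x - rolling_median(x, half_width)
  abs(resid) > k * stats::mad(resid)
}

x <- c(10, 12, 11, 13, 12, 40, 11, 14, 12, 13, 11)
flags <- flag_outliers_rm(x, half_width = 2)
which(flags)  # 6
```

Using the MAD rather than the standard deviation keeps the spread estimate from being inflated by the very spikes being detected.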