During continuous rollup, the ingested data may arrive late or out of order.
Currently, the rollup delay setting is the only way to handle this. The delay applies to the field on which rollup runs its date_histogram.
However, the delay is a fixed value, so it cannot handle a delay that varies over time or more complicated scenarios.
For example, several data sources may be ingested into an index that is being rolled up continuously. One source may be up to date while another falls behind by a varying amount due to a variety of issues. So even with a delay defined, the current rollup cannot handle this case.
Proposed solution:
During ingestion, the user adds a field, ingested_at, that records the actual ingestion timestamp.
Enhance rollup so that it can act on a range of ingested_at time, for example running continuous rollup over each hour of ingested_at time.
The first composite search query that rollup runs would look something like:
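The exact query is not reproduced here, so the following is only a rough sketch of what such a request body might look like, built as a Python dict. It assumes the rollup's date_histogram runs on a hypothetical event-time field called timestamp with a one-hour fixed interval, and that the aggregation is named b_new to match the text below; the function name, parameters, and page size are illustrative, not part of the existing rollup code.

```python
from datetime import datetime, timedelta, timezone


def build_ingest_window_query(window_start, window_length=timedelta(hours=1),
                              event_field="timestamp", interval="1h",
                              page_size=100):
    """Build a composite search limited to one window of ingested_at.

    event_field, interval and page_size are assumptions for illustration.
    """
    window_end = window_start + window_length
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            # Select documents by when they were ingested, not by event time.
            "range": {
                "ingested_at": {
                    "gte": window_start.isoformat(),
                    "lt": window_end.isoformat(),
                }
            }
        },
        "aggs": {
            "b_new": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        # Bucket by event time, the same way the rollup does.
                        {event_field: {"date_histogram": {
                            "field": event_field,
                            "fixed_interval": interval,
                        }}}
                    ],
                }
            }
        },
    }


query = build_ingest_window_query(datetime(2021, 1, 1, 10, tzinfo=timezone.utc))
```

The key point is that the range filter is on ingested_at while the bucketing stays on the event-time field, so late documents surface in whichever old bucket they belong to.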
The result of this query gives the buckets (b_new) touched by documents in this ingested time range. There are then at least two ways to combine the results for b_new with the existing rollup data:
Run another composite search restricted to b_new, effectively re-rolling up the updated buckets (personally preferred).
Retrieve the existing rollup data for b_new, if any, then combine it with the new results.
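To make the two options concrete, here is a small in-memory sketch in Python. It is not the rollup implementation: documents are plain dicts with hypothetical timestamp, ingested_at and value fields, buckets are hourly, and the only metrics are count and sum. All function names are made up for illustration.

```python
from datetime import datetime


def bucket_of(ts):
    """Truncate an event timestamp to its hourly bucket key."""
    return ts.replace(minute=0, second=0, microsecond=0)


def affected_buckets(docs, start, end):
    """b_new: event-time buckets touched by docs ingested in [start, end)."""
    return {bucket_of(d["timestamp"]) for d in docs
            if start <= d["ingested_at"] < end}


def rerollup(docs, buckets):
    """Option 1: recompute the affected buckets from all source docs."""
    out = {}
    for d in docs:
        b = bucket_of(d["timestamp"])
        if b in buckets:
            agg = out.setdefault(b, {"count": 0, "sum": 0.0})
            agg["count"] += 1
            agg["sum"] += d["value"]
    return out


def merge_delta(existing, docs, start, end):
    """Option 2: add only the newly ingested docs to the stored rollup."""
    merged = {b: dict(v) for b, v in existing.items()}
    for d in docs:
        if start <= d["ingested_at"] < end:
            agg = merged.setdefault(bucket_of(d["timestamp"]),
                                    {"count": 0, "sum": 0.0})
            agg["count"] += 1
            agg["sum"] += d["value"]
    return merged
```

In this sketch both options give the same result because count and sum can be merged by addition. Option 2 depends on every stored metric being mergeable in that way, whereas option 1 simply overwrites the affected buckets with freshly computed values, at the cost of re-reading their source documents.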