Potential optimisations for query evaluation #5

Open
desmondcheongzx opened this issue Mar 26, 2021 · 3 comments
@desmondcheongzx
Collaborator

Currently we perform unions and intersections over individual records. This came as a result of needing to filter records by metrics/timestamps at the leaf level. However, this is potentially costly when we're still filtering results by labelKey and labelValue pairs.

Instead, our ResultSet could have two additional fields: a bitset field holding the relevant series in a roaring bitmap, and an unpacked boolean denoting whether the roaring bitmap has been unpacked into the vector of records.

Two ResultSets that haven't been unpacked can be unioned/intersected on their bitsets alone. When a ResultSet has been unpacked, we must unpack any other ResultSet that it is unioned/intersected with. Finally, before returning our results, we must ensure that the ResultSet has been unpacked.
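To make the proposal concrete, here is a minimal sketch of the packed/unpacked idea. It is not the project's code: a plain Python `set` stands in for the roaring bitmap, `store` is a hypothetical mapping from series id to `(timestamp, value)` points, and the method names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Record = Tuple[int, int, float]  # (series id, timestamp, value); assumed layout
Store = Dict[int, List[Tuple[int, float]]]  # hypothetical series storage


@dataclass
class ResultSet:
    bitset: Set[int] = field(default_factory=set)  # stand-in for a roaring bitmap
    unpacked: bool = False
    records: List[Record] = field(default_factory=list)

    def unpack(self, store: Store) -> "ResultSet":
        """Materialise the bitset into records; a no-op if already unpacked."""
        if not self.unpacked:
            self.records = [
                (sid, ts, val)
                for sid in sorted(self.bitset)
                for ts, val in store[sid]
            ]
            self.unpacked = True
        return self

    def _combine(self, other: "ResultSet", store: Store, union: bool) -> "ResultSet":
        if not self.unpacked and not other.unpacked:
            # Cheap path: operate on the series bitsets alone.
            bits = self.bitset | other.bitset if union else self.bitset & other.bitset
            return ResultSet(bitset=bits)
        # One side is already unpacked, so the other must be unpacked too.
        left = set(self.unpack(store).records)
        right = set(other.unpack(store).records)
        merged = sorted(left | right if union else left & right)
        return ResultSet(
            bitset={rec[0] for rec in merged}, unpacked=True, records=merged
        )

    def union(self, other: "ResultSet", store: Store) -> "ResultSet":
        return self._combine(other, store, union=True)

    def intersect(self, other: "ResultSet", store: Store) -> "ResultSet":
        return self._combine(other, store, union=False)


store: Store = {1: [(10, 0.5)], 2: [(10, 1.0), (20, 2.0)], 3: [(10, 5.0)]}
packed = ResultSet(bitset={1, 2, 3}).intersect(ResultSet(bitset={2, 3, 4}), store)
final_records = packed.unpack(store).records  # unpack before returning results
```

In this sketch the label-only intersection never touches individual records; the cost of materialising them is paid once, at the final `unpack`.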

@n-young
Owner

n-young commented Mar 28, 2021

Also, we could apply the entire conditional to each series in a single pass rather than iterating over the series many times.

@desmondcheongzx
Collaborator Author

Right, so there should be a third field called filters that would store a vector of lambda functions to apply over the data points. When evaluating a variable + metric value predicate, we get the bitmap of relevant series plus the filter to apply.

We can delay applying this conditional as long as the ResultSet is only involved in AND operations. Once there's an OR operation, we have no choice but to unpack both operands and apply their filters.
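A rough sketch of how the deferred filters might behave, under the same caveats: a Python `set` stands in for the roaring bitmap, `store` is a hypothetical series-to-points mapping, and `LazyResult`, `and_`, `or_`, and `unpack` are illustrative names rather than anything in the codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Record = Tuple[int, int, float]  # (series id, timestamp, value); assumed layout
Store = Dict[int, List[Tuple[int, float]]]  # hypothetical series storage
Filter = Callable[[Record], bool]


@dataclass
class LazyResult:
    bitset: Set[int]  # stand-in for a roaring bitmap of series ids
    filters: List[Filter] = field(default_factory=list)
    records: Optional[List[Record]] = None  # None means "still packed"


def unpack(rs: LazyResult, store: Store) -> List[Record]:
    """Materialise records, applying every pending filter in one pass."""
    if rs.records is None:
        rs.records = [
            (sid, ts, val)
            for sid in sorted(rs.bitset)
            for ts, val in store[sid]
            if all(keep((sid, ts, val)) for keep in rs.filters)
        ]
        rs.filters = []
    return rs.records


def and_(a: LazyResult, b: LazyResult, store: Store) -> LazyResult:
    if a.records is None and b.records is None:
        # AND keeps things lazy: intersect the bitsets, concatenate the filters.
        return LazyResult(a.bitset & b.bitset, a.filters + b.filters)
    merged = sorted(set(unpack(a, store)) & set(unpack(b, store)))
    return LazyResult({rec[0] for rec in merged}, [], merged)


def or_(a: LazyResult, b: LazyResult, store: Store) -> LazyResult:
    # OR forces both sides to be unpacked, since each side's filters
    # apply only to that side's series.
    merged = sorted(set(unpack(a, store)) | set(unpack(b, store)))
    return LazyResult({rec[0] for rec in merged}, [], merged)


store: Store = {1: [(10, 0.5), (20, 3.0)], 2: [(10, 4.0), (20, 1.0)], 3: [(10, 9.0)]}
label_match = LazyResult({1, 2})                         # label predicate only
metric_match = LazyResult({2, 3}, [lambda r: r[2] > 2.0])  # variable + value predicate
lazy = and_(label_match, metric_match, store)            # nothing materialised yet
rows = unpack(lazy, store)
```

Because `and_` only concatenates filters, a chain of ANDs ends up evaluating the whole conditional once per data point at unpack time, which is the single-pass behaviour suggested earlier in the thread.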

@desmondcheongzx
Collaborator Author

It's hard to evaluate without a proper benchmark, but query evaluation on larger data sets is still very slow, even with #13.
