optimize: speed up stat gen by factor x15 (#167)
Made the stat generation faster using Rayon's thread pool.

Various improvements, such as:
- slightly less copying
- optimized MG
- process in parallel using Rayon, taking advantage of fold/reduce
- moved to Parquet (WIP to convert into Parquet automatically)
- read Parquet in parallel (one reader per row-group); this
  granularity is sufficient for big enough datasets
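
The fold/reduce shape over row-groups can be sketched roughly as follows. This is only an illustration, not the code in this commit: `ColumnStats`, `observe`, `merge` and `parallel_stats` are hypothetical names, the row-groups are plain in-memory vectors rather than Parquet readers, and `std::thread::scope` stands in for Rayon's thread pool so the example has no dependencies. With Rayon the same two steps would map onto `fold` and `reduce` on a parallel iterator.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical per-column accumulator (not optd's actual stats type).
#[derive(Debug, Default, Clone, PartialEq)]
struct ColumnStats {
    row_count: u64,
    null_count: u64,
    value_counts: HashMap<i64, u64>,
}

impl ColumnStats {
    /// Fold step: absorb one value into a worker-local accumulator.
    fn observe(mut self, value: Option<i64>) -> Self {
        self.row_count += 1;
        match value {
            Some(v) => {
                *self.value_counts.entry(v).or_insert(0) += 1;
            }
            None => {
                self.null_count += 1;
            }
        }
        self
    }

    /// Reduce step: merge two partial accumulators into one.
    fn merge(mut self, other: Self) -> Self {
        self.row_count += other.row_count;
        self.null_count += other.null_count;
        for (v, c) in other.value_counts {
            *self.value_counts.entry(v).or_insert(0) += c;
        }
        self
    }
}

/// One worker per "row-group": each folds its own rows without sharing
/// state, then the partial results are reduced into a single accumulator.
fn parallel_stats(row_groups: &[Vec<Option<i64>>]) -> ColumnStats {
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = row_groups
            .iter()
            .map(|group| {
                scope.spawn(move || {
                    group
                        .iter()
                        .copied()
                        .fold(ColumnStats::default(), ColumnStats::observe)
                })
            })
            .collect();

        // Join the workers and merge their partial accumulators.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold(ColumnStats::default(), ColumnStats::merge)
    })
}

fn main() {
    let row_groups = vec![
        vec![Some(1), Some(2), None],
        vec![Some(2), Some(2)],
        vec![None, Some(3)],
    ];
    let stats = parallel_stats(&row_groups);
    println!("{stats:?}");
}
```

Because each worker only touches its own accumulator, no locking is needed until the final merge, which is why the row-group granularity parallelizes well once there are enough row-groups to keep the cores busy.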

This takes 30s for JOB 1D stats on my computer, vs 7:30min before
(450s / 30s = x15). Postgres takes 1:30min for loading and 22s for the
stat gen, so whether we beat it depends on whether its loading time is
counted.

On "real" datacenter hardware (e.g. 512 cores), we would expect to
**crush** it; we'll test that soon.

Finally coming together :-)
AlSchlo authored Apr 30, 2024
1 parent 5528eec commit 74dc3ff
Showing 11 changed files with 411 additions and 244 deletions.
35 changes: 30 additions & 5 deletions Cargo.lock

1 change: 1 addition & 0 deletions optd-datafusion-repr/Cargo.toml
@@ -25,6 +25,7 @@ assert_approx_eq = "1.1.0"
serde = { version = "1.0", features = ["derive"] }
serde_with = {version = "3.7.0", features = ["json"]}
bincode = "1.3.3"
+rayon = "1.10"
union-find = { git = "https://github.com/Gun9niR/union-find-rs.git", rev = "794821514f7daefcbb8d5f38ef04e62fc18b5665" }
test-case = "3.3"
chrono = "0.4"