Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Compute Pool 2 #3076

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

[FEAT] Compute Pool 2 #3076

wants to merge 13 commits into from

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Oct 18, 2024

Add a global tokio runtime for compute . Follow on from #2986

@github-actions github-actions bot added the enhancement New feature or request label Oct 18, 2024
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 82.92108% with 145 lines in your changes missing coverage. Please review.

Project coverage is 78.38%. Comparing base (e4c6f3f) to head (85d32d2).

Files with missing lines Patch % Lines
src/daft-local-execution/src/sources/scan_task.rs 0.00% 39 Missing ⚠️
src/daft-parquet/src/stream_reader.rs 84.61% 30 Missing ⚠️
src/daft-parquet/src/file.rs 84.14% 26 Missing ⚠️
src/daft-local-execution/src/run.rs 61.29% 12 Missing ⚠️
src/common/runtime/src/io.rs 82.45% 10 Missing ⚠️
src/daft-json/src/local.rs 89.36% 10 Missing ⚠️
src/common/runtime/src/compute.rs 77.50% 9 Missing ⚠️
...-execution/src/intermediate_ops/intermediate_op.rs 66.66% 3 Missing ⚠️
src/common/runtime/src/lib.rs 96.29% 2 Missing ⚠️
src/daft-csv/src/metadata.rs 50.00% 1 Missing ⚠️
... and 3 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3076      +/-   ##
==========================================
+ Coverage   78.34%   78.38%   +0.03%     
==========================================
  Files         611      614       +3     
  Lines       72455    72574     +119     
==========================================
+ Hits        56767    56884     +117     
- Misses      15688    15690       +2     
Files with missing lines Coverage Δ
daft/execution/native_executor.py 85.71% <100.00%> (+0.71%) ⬆️
src/daft-csv/src/read.rs 99.34% <100.00%> (+<0.01%) ⬆️
src/daft-functions/src/tokenize/bpe.rs 96.35% <100.00%> (ø)
src/daft-functions/src/uri/download.rs 83.46% <100.00%> (ø)
src/daft-functions/src/uri/upload.rs 64.63% <100.00%> (ø)
src/daft-io/src/lib.rs 72.46% <ø> (-1.72%) ⬇️
src/daft-io/src/s3_like.rs 66.52% <ø> (+0.10%) ⬆️
src/daft-json/src/lib.rs 80.00% <ø> (ø)
src/daft-json/src/read.rs 95.56% <100.00%> (+0.13%) ⬆️
...-local-execution/src/intermediate_ops/aggregate.rs 100.00% <ø> (ø)
... and 24 more

... and 1 file with indirect coverage changes

colin-ho added a commit that referenced this pull request Oct 23, 2024
Create a multithreaded compute runtime for swordfish compute tasks.
Switch query runtime to be single threaded, and use IO pool for scan
task streams.

Additionally, adds in a `tokio_select` together with the
`tokio::signal::ctrlc` and main async execution loop so that queries can
be cancelled.

```
import os
import daft
import numpy
import time
import psutil

current_process = psutil.Process(os.getpid())

daft.set_execution_config(enable_native_executor=True, default_morsel_size=1)
dfs = [
    iter(
        daft.from_pydict({"a": numpy.random.rand(10)}).with_column(
            "plus_one", daft.col("a") + 1
        )
    )
    for _ in range(10)
]
while True:
    for i, df in enumerate(dfs):
        time.sleep(0.1)
        try:
            print("threads: ", current_process.num_threads())
            print(next(df))
        except StopIteration:
            dfs.pop(i)
    if not dfs:
        break
```
If you run this script you can see that the number of threads increases
by only 1 per dataframe.

TODO:
- replace rayon with this ->
#3076

---------

Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant