daft-launcher is a simple launcher for spinning up and managing Ray clusters for daft.
For a deeper introduction, please refer to our documentation.
Getting started with Daft in a local environment is easy. Getting started with Daft in a cloud environment, however, is substantially more difficult; so much so that users end up spending more time setting up their environment than actually playing with our query engine.
Daft Launcher aims to solve this problem by providing a simple CLI tool that removes all of this unnecessary heavy lifting.
What Daft Launcher is capable of:
- Spinning up clusters.
- Listing all available clusters (as well as their statuses).
- Submitting jobs to a cluster.
- Connecting to the cluster (to view the Ray dashboard and submit jobs using the Ray protocol).
- Spinning down clusters.
- Creating configuration files.
- Running raw SQL statements using Daft's SQL API.
Supported cloud providers:
- AWS
- GCP
- Azure
You'll need a Python package manager installed.
We recommend using uv for all things Python.
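If you don't already have uv installed, one way to get it (per the uv documentation, assuming a Unix-like shell; pip and Homebrew installs also work) is:
# install uv via its standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh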
If you're using AWS, you'll need:
- A valid AWS account with the necessary IAM role to spin up EC2 instances. This IAM role can either be created by you (assuming you have the appropriate permissions) or by your administrator.
- The AWS CLI installed and configured on your machine.
- To log in using the AWS CLI. For full instructions, please look here.
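As a rough sketch (the profile name below is illustrative, and your organisation may use SSO instead of long-lived credentials), configuring and verifying AWS access might look like:
# configure credentials under a named profile (profile name is illustrative)
aws configure --profile my-daft-profile
# confirm that the credentials resolve to the expected account and role
aws sts get-caller-identity --profile my-daft-profile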
Using uv:
# create project
mkdir my-project
cd my-project
# initialize project and setup virtual env
uv init
uv venv
source .venv/bin/activate
# install launcher
uv pip install daft-launcher
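To sanity-check the install (assuming the daft entry point is now on your virtual environment's PATH), printing the help text should list the available subcommands:
# verify that the launcher is installed
daft --help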
Interactions with Daft Launcher are driven primarily by a configuration file.
By default, Daft Launcher will look inside your $CWD for a file named `.daft.toml`.
You can override this behaviour by specifying a custom configuration file.
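As a purely illustrative sketch (the field names below are hypothetical and not the actual schema; run daft init-config to generate the real template), a configuration file might look roughly like:
# .daft.toml -- illustrative only; field names are hypothetical
[setup]
name = "my-daft-cluster"    # hypothetical: a name for your cluster
provider = "aws"            # hypothetical: which cloud provider to target
number-of-workers = 4       # hypothetical: how many worker nodes to spin up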
# create a new configuration file
# will create a file named `.daft.toml` in the current working directory
daft init-config
# or optionally, pass in a custom name
daft init-config my-custom-config.toml
# spin up a cluster
daft up
# or optionally, pass in a custom config file
daft up -c my-custom-config.toml
# list all the active clusters (you can have multiple clusters running at the same time)
daft list
# submit a directory and a command to run on the cluster
daft submit --working-dir <...> -- command arg1 arg2 ...
# or optionally, pass in a custom config file
daft submit -c my-custom-config.toml --working-dir $WORKING_DIR -- command arg1 arg2 ...
# run a direct SQL query against the daft query engine running in the remote cluster
daft sql -- "SELECT * FROM my_table WHERE column = 'value'"
# or optionally, pass in a custom config file
daft sql -c my-custom-config.toml -- "SELECT * FROM my_table WHERE column = 'value'"
# spin down a cluster
daft down
# or optionally, pass in a custom config file
daft down -c my-custom-config.toml
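Putting it together, a hypothetical end-to-end session (the working directory and entry-point command below are illustrative) might look like:
# spin up the cluster described by .daft.toml
daft up
# run a local script on the cluster (directory and command are illustrative)
daft submit --working-dir ./my-job -- python main.py
# tear the cluster down when finished
daft down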