This repo contains source code for creating a Databricks Workspace on Azure & AWS. It also uses the Databricks Python SDK
to deploy legacy Hive resources on your workspace.
- Make sure you have Account Admin privileges on both your Cloud (AWS/Azure) & your Databricks Account Console.
- Get in touch with your cloud administrator for any elevated privileges you might need.
- You may use this utility to deploy both an Azure Workspace & legacy Hive resources, or, if you already have a workspace deployed, to deploy just the legacy Hive resources on your existing Azure Databricks workspace.
- To run this utility you need to have the following installed on your local machine.
- Once you take care of all the prerequisites mentioned above, clone this repo to your local machine.
- Go to the folder `./ucx-bootcamp`
- Run the command: `python3 deploy_ws_resources.py`
- Azure Tenant ID. The script uses azure-cli to authenticate with your tenant ID. Once authenticated successfully, you can select the subscription where you plan to deploy your workspace resources (see the sketch after this list).
- Azure Region. Default: `centralus`
- Deployment keyword identifier. This string will appear in your workspace name to uniquely identify the resources deployed on Azure. Default: `ucxbootcamp`
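For reference, the subscription selection step can be reproduced with the Azure SDK for Python. This is only an illustrative sketch under assumed packages (`azure-identity`, `azure-mgmt-resource`); it is not code from this repo:

```python
# Illustrative sketch: list the subscriptions visible to your azure-cli login.
# Assumes `az login --tenant <TENANT_ID>` has already succeeded and that the
# azure-identity and azure-mgmt-resource packages are installed.
from azure.identity import AzureCliCredential
from azure.mgmt.resource import SubscriptionClient

credential = AzureCliCredential()                    # reuses the azure-cli token cache
subscription_client = SubscriptionClient(credential)

for sub in subscription_client.subscriptions.list():
    print(sub.subscription_id, sub.display_name)
```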
- Once the Workspace deployment is done, the utility moves to the second section, where it asks for the details needed to deploy legacy Hive resources on your workspace.
- If you already have a workspace deployed, you can have this script move to this second section directly by answering `no` when the script asks `Do you want to deploy a Workspace on Azure? [yes/no]`.
- Your Email ID (username) that you use to log in to Databricks.
- Azure Databricks Workspace URL.
- A service principal client-id and its corresponding client-secret. The SDK will use this SP to authenticate
to your Workspace and create Hive resources (see the sketch after this list). Make sure your SP has the following privileges.
- Cluster creation permission.
- Your SP must be a member of the Workspace Admins system group.
- Your SP should have SELECT & MODIFY grants on ANY FILE. You may run the SQL query below from a notebook
to grant this permission to your SP:
GRANT SELECT, MODIFY ON ANY FILE TO `your-client-id`;
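As a rough illustration of how the SDK can authenticate to a workspace with such a service principal, a snippet like the one below could be used. The environment-variable names and the `azure_*` keyword arguments are an assumed pattern, not necessarily what `deploy_ws_resources.py` does:

```python
# Minimal sketch: authenticate the Databricks Python SDK to an Azure workspace
# with a service principal. The actual auth flow in deploy_ws_resources.py may differ.
import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],              # your Azure Databricks Workspace URL
    azure_client_id=os.environ["ARM_CLIENT_ID"],     # SP application (client) id
    azure_client_secret=os.environ["ARM_CLIENT_SECRET"],
    azure_tenant_id=os.environ["ARM_TENANT_ID"],
)

# Quick sanity check that the SP can reach the workspace.
print(w.current_user.me().user_name)
```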
- The script will also create Hive external tables on ADLS. Hence, make sure you already have a
storage account and a container created. The script expects you to provide a `storage-account`
name & a complete `abfss` path where you plan to store the data for your Hive external tables.
- The Databricks Notebook runtime context requires an `fs-azure-key` to gain access to your Azure container for creating external tables. You need to create one for your Azure storage account and pass it as input to the script. The script will create a Databricks-managed Secret Scope on your workspace to store the secret value. You are expected to provide `scope-name`, `secret-key` & `secret-value` (the fs azure key value) as input (see the sketch below).
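For context, a Databricks-managed secret scope can be created and populated through the Python SDK roughly as shown below. The scope and key names are placeholders; this is a sketch of the general mechanism, not the repo's exact code:

```python
# Sketch: create a Databricks-managed secret scope and store the fs azure key in it.
# `w` is the WorkspaceClient from the earlier sketch; all names are placeholders.
scope_name = "ucxbootcamp-scope"
secret_key = "fs-azure-key"

w.secrets.create_scope(scope=scope_name)
w.secrets.put_secret(
    scope=scope_name,
    key=secret_key,
    string_value="<your storage account access key>",
)

# Inside a notebook, the value can later be read (redacted in display) with:
#   dbutils.secrets.get(scope=scope_name, key=secret_key)
```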
This script creates the following resources:
- Secret Scope
- A General Purpose Cluster
- A SQL Warehouse
- Hive tables
- Managed (DBFS root)
- External (on ADLS gen2)
- Streaming Managed tables
- External Materialized Views.
- Workspace-level local groups
- Grants on Catalog, Schemas, and Tables assigned to workspace groups.
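To illustrate how the external tables tie together the secret scope and the abfss path described above, a notebook cell might look roughly like the following. The storage account, container, scope, database, and table names are all placeholders, and the repo's notebooks may do this differently:

```python
# Sketch of a notebook cell: read the fs azure key from the secret scope and
# create a Hive external table on ADLS Gen2. All names below are placeholders.
storage_account = "mystorageacct"
container = "hive-data"

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="ucxbootcamp-scope", key="fs-azure-key"),
)

spark.sql("CREATE DATABASE IF NOT EXISTS hive_metastore.bootcamp_db")
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS hive_metastore.bootcamp_db.sales_external
    USING DELTA
    LOCATION 'abfss://{container}@{storage_account}.dfs.core.windows.net/sales_external'
""")
```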