
ucx-bootcamp

Utilities for UCX Bootcamp

This repo contains source code for creating a Databricks workspace on Azure and AWS. It also uses the Databricks Python SDK to deploy legacy Hive resources on your workspace.

Bootcamp Guidebook

Prerequisites

  1. Make sure you have Account Admin privileges on both your Cloud (AWS/Azure) & Databricks Account Console.
    1. Databricks Account Console on Azure
    2. Databricks Account Console on AWS
  2. Get in touch with your cloud administrator for any elevated privilege you might need.
  3. You may use this utility either to deploy both an Azure workspace and the legacy Hive resources, or, if you already have a workspace deployed, to deploy just the legacy Hive resources on your existing Azure Databricks workspace.
  4. To run this utility you need the following installed on your local machine (a quick sanity-check sketch follows this list).
    1. Databricks CLI
    2. Azure CLI
    3. Terraform CLI
    4. Python SDK for Databricks
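
If you want to confirm these tools are available before running the utility, a minimal Python sketch like the following (illustrative only, not part of this repo) checks that the CLIs are on your PATH and that the Databricks SDK is importable:

```python
# check_prereqs.py -- illustrative sanity check, not part of this repo
import shutil
import sys

# CLIs the utility relies on
for tool in ("databricks", "az", "terraform"):
    path = shutil.which(tool)
    print(f"{tool}: {'found at ' + path if path else 'NOT FOUND'}")

# Databricks Python SDK
try:
    import databricks.sdk  # pip install databricks-sdk
    print("databricks-sdk: importable")
except ImportError:
    print("databricks-sdk: NOT installed")
    sys.exit(1)
```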

How to run this utility on your local machine

  1. Once you take care of all the prerequisites mentioned above, clone this repo to your local machine.
  2. Go to the folder ./ucx-bootcamp
  3. Run the command: python3 deploy_ws_resources.py
Inputs needed by this script for deploying a workspace on Azure:
  1. Azure Tenant ID. The script uses azure-cli to authenticate with your tenant ID; once authenticated, you can select the subscription where you plan to deploy your workspace resources (see the azure-cli sketch after this list).
  2. Azure Region. Default: centralus
  3. Deployment keyword identifier. This string will be present in your workspace name to uniquely identify your resources deployed on Azure. Default: ucxbootcamp
  • Once the workspace deployment is done, the utility moves to the second section, where it asks for details to deploy legacy Hive resources on your workspace.
  • If you already have a workspace deployed, you can have the script jump straight to this second section by answering no when it asks Do you want to deploy a Workspace on Azure? [yes/no].
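
For reference, the authentication step corresponds roughly to the azure-cli calls below. This is a hedged sketch: the tenant and subscription IDs are placeholders, and the script may wrap these calls differently.

```python
# Illustrative only: the azure-cli steps the tenant-ID authentication amounts to.
import subprocess

tenant_id = "<your-azure-tenant-id>"        # placeholder
subscription_id = "<your-subscription-id>"  # placeholder

# Log in against the tenant you provide to the script
subprocess.run(["az", "login", "--tenant", tenant_id], check=True)

# Select the subscription where the workspace resources will be deployed
subprocess.run(["az", "account", "set", "--subscription", subscription_id], check=True)
```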
Inputs needed by this script for deploying Hive resources on your Azure Databricks:
  1. Your email ID (username) that you use to log in to Databricks.
  2. Azure Databricks Workspace URL.
  3. A service principal client-id and corresponding client-secret. The SDK uses this service principal to authenticate to your workspace and create Hive resources (a short authentication sketch follows this list). Make sure your SP has the following privileges:
    1. Cluster creation permission.
    2. Your SP must be a member of the Workspace Admins system group.
    3. Your SP should have SELECT & MODIFY grants on ANY FILE. You may run the following SQL from a notebook to grant this permission to your SP: GRANT SELECT, MODIFY ON ANY FILE TO `your-client-id`;
  4. The script also creates Hive external tables on ADLS, so make sure you already have a storage account and a container created. The script expects you to provide a storage-account name and a complete abfss path where you plan to store the data for your Hive external tables.
  5. The Databricks notebook runtime needs an fs.azure account key (your storage account access key) to access your Azure container when creating external tables. Create one for your Azure storage account and pass it as input to the script. The script creates a Databricks-managed secret scope on your workspace to store this value; you are expected to provide a scope name, secret key, and secret value (the fs.azure key) as input.
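
As a rough illustration of how the Databricks Python SDK can authenticate with a service principal and create the secret scope described above, here is a minimal sketch. The workspace URL, scope name, key, and credentials are placeholders, and it assumes OAuth machine-to-machine authentication; the script's actual auth flow may differ.

```python
# Illustrative sketch using the Databricks Python SDK (databricks-sdk).
from databricks.sdk import WorkspaceClient

# OAuth machine-to-machine auth with the service principal's client id/secret.
w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",  # placeholder workspace URL
    client_id="<sp-client-id>",
    client_secret="<sp-client-secret>",
)

# Databricks-managed secret scope holding the fs.azure storage account key.
scope_name = "ucxbootcamp-scope"  # placeholder
secret_key = "fs-azure-key"       # placeholder
w.secrets.create_scope(scope=scope_name)
w.secrets.put_secret(
    scope=scope_name,
    key=secret_key,
    string_value="<storage-account-access-key>",  # placeholder
)
```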

Legacy Hive Resources

This script creates the following resources (a notebook-style sketch of the table setup follows the list):

  1. Secret Scope
  2. A General Purpose Cluster
  3. A SQL Warehouse
  4. Hive tables
    1. Managed (DBFS root)
    2. External (on ADLS gen2)
    3. Streaming Managed tables
    4. External Materialized Views.
  5. Workspace-level local groups
  6. Grants on Catalog, Schemas, and Tables assigned to workspace groups.
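
For context, the managed and external Hive tables differ mainly in whether a LOCATION clause points to ADLS Gen2. The notebook-style sketch below (database, table, container, and storage-account names are placeholders, not the script's actual identifiers) shows how the fs.azure key stored in the secret scope is used and what the two kinds of tables look like:

```python
# Notebook-style sketch; `spark` and `dbutils` are provided by the Databricks runtime.
storage_account = "<storage-account>"  # placeholder
scope_name = "ucxbootcamp-scope"       # placeholder
secret_key = "fs-azure-key"            # placeholder

# Give the notebook's Spark session access to the container via the account key
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope=scope_name, key=secret_key),
)

spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")

# Managed Hive table: data lives in the DBFS root
spark.sql("CREATE TABLE IF NOT EXISTS demo_db.managed_tbl (id INT, name STRING)")

# External Hive table: data lives at an abfss path on ADLS Gen2
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_db.external_tbl (id INT, name STRING)
    LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/ucxbootcamp/external_tbl'
""")
```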

Notebooks used for creating pipelines
