feat: add sagemaker-studio module #6

Merged 7 commits on Feb 20, 2024

21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,21 @@
# Change Log

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

=======

## UNRELEASED

### **Added**

- added `sagemaker-studio` module
- added `sagemaker-endpoint` module

### **Changed**

### **Removed**

=======
2 changes: 2 additions & 0 deletions README.md
@@ -21,6 +21,8 @@ All modules in this repository adhere to the module structure defined in the t
| Type | Description |
|-----------------------------------------------------------------------------|-------------------------------------------------|
| [SageMaker Endpoint Module](modules/sagemaker/sagemaker-endpoint/README.md) | Creates SageMaker real-time inference endpoint. |
| [SageMaker Studio Module](modules/sagemaker/sagemaker-studio/README.md) | Creates SageMaker Studio Domain. |


### Industry Data Framework (IDF) Modules

2 changes: 1 addition & 1 deletion manifests/sagemaker-studio-modules.yaml
@@ -1,5 +1,5 @@
name: studio
-path: git::https://github.com/awslabs/idf-modules.git//modules/ml/sagemaker-studio?ref=release/1.3.0&depth=1
+path: modules/sagemaker/sagemaker-studio

Review comment: this is ok for now, but after the first release, please change this to a git path pinned to the release version so that users don't have to clone the entire repo.

Reply (contributor, author): Thanks, yes, will lock all module versions after the release is cut.

targetAccount: primary
parameters:
- name: studio_domain_name
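Following the review suggestion, the pinned form of the path can be produced mechanically. A minimal sketch that reuses the `git::<repo>//<module path>?ref=<ref>&depth=1` shape of the path this change replaced; the function name `pinned_module_path` is hypothetical, and the repository URL and ref to use for this module after the first release are not stated in this PR.

```python
def pinned_module_path(repo_url: str, module_path: str, ref: str, shallow: bool = True) -> str:
    """Build a git module path pinned to a ref, in the same shape as the
    path removed from manifests/sagemaker-studio-modules.yaml."""
    # "//" separates the repository URL from the module directory inside it
    path = f"git::{repo_url}//{module_path.strip('/')}?ref={ref}"
    if shallow:
        # depth=1 as in the original path (assumed to request a shallow clone)
        path += "&depth=1"
    return path


if __name__ == "__main__":
    # Reproduces the path that this change replaced
    print(
        pinned_module_path(
            "https://github.com/awslabs/idf-modules.git",
            "modules/ml/sagemaker-studio",
            "release/1.3.0",
        )
    )
```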
129 changes: 129 additions & 0 deletions modules/sagemaker/sagemaker-studio/README.md
@@ -0,0 +1,129 @@
# SageMaker Studio Infrastructure

This module contains the resources that are required to deploy the SageMaker Studio infrastructure. It defines the setup for Amazon SageMaker Studio Domain and creates SageMaker Studio User Profiles for Data Scientists and Lead Data Scientists.

**NOTE** To use this module effectively, you need a good understanding of AWS networking services, AWS CloudFormation, and the AWS CDK.

- [SageMaker Studio Infrastructure](#sagemaker-studio-infrastructure)
  - [Architecture](#architecture)
  - [Inputs and outputs:](#inputs-and-outputs)
    - [Required inputs:](#required-inputs)
    - [Optional Inputs:](#optional-inputs)
    - [Outputs (module metadata):](#outputs-module-metadata)
    - [Example Output:](#example-output)
  - [Getting Started](#getting-started)
    - [Prerequisites](#prerequisites)
    - [Module Structure](#module-structure)
  - [Troubleshooting](#troubleshooting)

## Architecture

![SageMaker Studio Module Architecture](docs/_static/sagemaker-studio-module-architecture.png "SageMaker Studio Module Architecture")

This module handles the deployment of the following resources:

1. A SageMaker Studio domain.
2. IAM roles that are linked to the SageMaker Studio user profiles. User profile creation is managed through the manifest file `manifests/shared-infra/mlops-modules.yaml`: add new entries to the lists to create new users. Each user is linked to a role according to the group you add them to (`data_science_users` or `lead_data_science_users`).

   ```yaml
   - name: data_science_users
     value:
       - data-scientist
   - name: lead_data_science_users
     value:
       - lead-data-scientist
   ```

3. The default SageMaker project templates are enabled for the account in the target region through a custom resource. The custom resource uses a Lambda function, `functions/sm_studio/enable_sm_projects`, to make the required SDK calls to AWS Service Catalog and Amazon SageMaker.

## Inputs and outputs:
### Required inputs:
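- `vpc_id` - the ID of the VPC that the SageMaker Studio domain will be created in
- `subnet_ids` - the subnets that the SageMaker Studio domain will be created in
- `server_lifecycle_name` - name of the JupyterServer Studio lifecycle configuration that `deployspec.yaml` creates during deployment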
### Optional Inputs:
- `studio_domain_name` - name of the SageMaker Studio domain
- `studio_bucket_name` - name of the bucket used by Studio
- `custom_kernel_app_config_name` - custom kernel app image config name
- `custom_kernel_image_name` - custom kernel image name
- `data_science_users` - a list of data science user names to create
- `lead_data_science_users` - a list of lead data science user names to create
- `retain_efs` - True | False: if set to True, the EFS volume persists after the domain is deleted. Default is True
- `enable_custom_sagemaker_projects` - True | False: if set to True, custom SageMaker projects are enabled for the data science and lead data science users. Default is False

### Outputs (module metadata):
- `StudioDomainName` - the name of the SageMaker Studio domain
- `StudioDomainId` - the ID of the SageMaker Studio domain
- `StudioBucketName` - the bucket (or prefix) that SageMaker Studio is given access to
- `StudioDomainEFSId` - the ID of the EFS file system created by SageMaker Studio
- `DataScientistRoleArn` - ARN of the Data Scientist IAM role
- `LeadDataScientistRoleArn` - ARN of the Lead Data Scientist IAM role
- `SageMakerExecutionRoleArn` - ARN of the SageMaker execution IAM role

### Example Output:
```json
{
"DataScientistRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/mlops-sagemaker-sage-smrolesdatascientistrole-DYPIVQ6NUSP9",
"LeadDataScientistRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/mlops-sagemaker-sage-smrolesleaddatascientist-V1YL0FQONH62",
"SageMakerExecutionRoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/mlops-sagemaker-sage-smrolessagemakerstudioro-F6HGOUX0JGTI",
"StudioBucketName": "mlops-*",
"StudioDomainEFSId": "fs-0a550ea71ecac4978",
"StudioDomainId": "d-flfqmvy84hfq",
"StudioDomainName": "mlops-sagemaker-sagemaker-sagemaker-studio-studio-domain"
}
```
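
The module metadata is emitted as a single CloudFormation output, `metadata`, whose value is a JSON string (see `app.py`). `deployspec.yaml` hands `cdk-exports.json` to `seedfarmer metadata convert`; to read a value such as `StudioDomainId` yourself, here is a sketch that assumes the usual `cdk deploy --outputs-file` layout of `{stack name: {output key: value}}`. The function name and the stack name in the sample are hypothetical.

```python
import json
from typing import Any, Dict


def module_metadata(cdk_exports: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Decode the "metadata" output that app.py emits as a JSON string.

    Assumes the `cdk deploy --outputs-file` layout: {stack name: {output key: value}}.
    """
    for outputs in cdk_exports.values():
        if "metadata" in outputs:
            return json.loads(outputs["metadata"])
    raise KeyError("no 'metadata' output found in the CDK exports")


if __name__ == "__main__":
    # Hypothetical stack name and placeholder values, shaped like the example output above
    sample = {
        "myproject-dev-studio": {
            "metadata": json.dumps({"StudioDomainId": "d-xxxxxxxxxxxx", "StudioDomainEFSId": "fs-xxxxxxxx"})
        }
    }
    print(module_metadata(sample)["StudioDomainId"])  # d-xxxxxxxxxxxx
```

Against a real deployment you would pass `json.load(open("cdk-exports.json"))` instead of the sample.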

## Getting Started

### Prerequisites

This is an AWS CDK project written in Python 3.8. You need the following on your workstation before you can deploy it. A Linux OS is preferred so that all CLI commands run as written and path issues are avoided.

* [Node.js](https://nodejs.org/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html)
* [AWS CDK v2](https://aws.amazon.com/cdk/)
* [AWS CLI](https://aws.amazon.com/cli/)
* [Docker](https://docs.docker.com/desktop/)

### Module Structure

```
├── functions <--- Lambda functions and layers
│   └── sm_studio <--- SageMaker Studio stack-related Lambda functions
│       └── enable_sm_projects <--- Lambda function (used as a custom resource) that enables SageMaker projects on the account and links the IAM roles of the domain users
├── helper constructs <--- helper CDK constructs
│   └── sm_roles.py <--- helper construct containing the IAM roles for SageMaker Studio users
├── scripts <--- helper scripts
│   ├── check_lcc_state.sh <--- script to check whether the SageMaker Studio lifecycle config needs an update
│   ├── delete-domains.py <--- Python helper script to delete SageMaker domains
│   ├── delete_efs.py <--- Python helper script to delete EFS mounts
│   └── on-jupyter-server-start.sh <--- script that installs the idle-notebook auto-checker Jupyter Server extension
├── tests <--- module unit tests
├── app.py <--- CDK application entry point
├── coverage.ini <--- test coverage tool parameters file
├── deployspec.yaml <--- file that defines the deployment instructions
├── modulestack.yaml <--- CloudFormation stack containing the permissions needed to deploy the module
├── pyproject.toml <--- build system requirements and settings file
├── README.md <--- module documentation
├── requirements.txt <--- CDK packages used in the stacks (must be installed)
├── stack.py <--- stack that creates the SageMaker Studio domain, the related IAM roles, and the domain users
└── update-domain-input.template.json <--- JSON template used to update the SageMaker domain lifecycle configs
```
## Troubleshooting


* **Resource being used by another resource**

This error is harder to track down: you need to trace where the resource you want to delete is still in use and sever that dependency before running the destroy command again.

**NOTE** Follow the CloudFormation error messages and debug from there: they include details about which resource is causing the error and, on some occasions, what needs to happen to resolve it.


* **CDK version X instead of Y**

This error usually means the installed CDK CLI is older than the CDK version the app requires. Run `npm install -g aws-cdk` to update the CDK CLI to the latest version, then run the deployment step again for each account your stacks are deployed to.

* **`cdk synth`** **not running**

One of the following should solve the problem:

* Restart your Docker daemon, since Docker may be in a bad state
* Refresh your AWS CLI credentials
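
For the "Resource being used by another resource" case, the failing resources and CloudFormation's reasons can also be listed programmatically. A minimal sketch using the boto3 `describe_stack_events` paginator; the helper names are hypothetical, and for a stack that has already been deleted you may need to pass its unique stack ID rather than its name.

```python
from typing import Any, Dict, Iterable, List


def failed_events(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only stack events whose status ends in _FAILED (e.g. DELETE_FAILED)."""
    return [e for e in events if str(e.get("ResourceStatus", "")).endswith("_FAILED")]


def print_stack_failures(stack_name_or_id: str) -> None:
    import boto3  # imported here so failed_events() can be used without AWS access

    paginator = boto3.client("cloudformation").get_paginator("describe_stack_events")
    for page in paginator.paginate(StackName=stack_name_or_id):
        for event in failed_events(page["StackEvents"]):
            # The reason text usually names the dependency that blocks the delete
            print(event["LogicalResourceId"], event["ResourceType"], event.get("ResourceStatusReason", ""))
```
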
83 changes: 83 additions & 0 deletions modules/sagemaker/sagemaker-studio/app.py
@@ -0,0 +1,83 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

import json
import os
from typing import cast

import aws_cdk
from aws_cdk import CfnOutput

from stack import SagemakerStudioStack

project_name = os.getenv("SEEDFARMER_PROJECT_NAME", "")
deployment_name = os.getenv("SEEDFARMER_DEPLOYMENT_NAME", "")
module_name = os.getenv("SEEDFARMER_MODULE_NAME", "")
app_prefix = f"{project_name}-{deployment_name}-{module_name}"

DEFAULT_STUDIO_DOMAIN_NAME = f"{app_prefix}-studio-domain"
DEFAULT_STUDIO_BUCKET_NAME = f"{app_prefix}-bucket"
DEFAULT_CUSTOM_KERNEL_APP_CONFIG_NAME = None
DEFAULT_CUSTOM_KERNEL_IMAGE_NAME = None
DEFAULT_ENABLE_CUSTOM_SAGEMAKER_PROJECTS = False


def _param(name: str) -> str:
    return f"SEEDFARMER_PARAMETER_{name}"


vpc_id = os.getenv(_param("VPC_ID"))
subnet_ids = json.loads(os.getenv(_param("SUBNET_IDS"), "[]"))
studio_domain_name = os.getenv(_param("STUDIO_DOMAIN_NAME"), DEFAULT_STUDIO_DOMAIN_NAME)
studio_bucket_name = os.getenv(_param("STUDIO_BUCKET_NAME"), DEFAULT_STUDIO_BUCKET_NAME)
app_image_config_name = os.getenv(_param("CUSTOM_KERNEL_APP_CONFIG_NAME"), DEFAULT_CUSTOM_KERNEL_APP_CONFIG_NAME)
image_name = os.getenv(_param("CUSTOM_KERNEL_IMAGE_NAME"), DEFAULT_CUSTOM_KERNEL_IMAGE_NAME)
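# bool() of any non-empty string (including "False") is True, so compare the text instead
enable_custom_sagemaker_projects = os.getenv(
    _param("ENABLE_CUSTOM_SAGEMAKER_PROJECTS"), str(DEFAULT_ENABLE_CUSTOM_SAGEMAKER_PROJECTS)
).strip().lower() in {"true", "1", "yes"}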

environment = aws_cdk.Environment(
    account=os.environ["CDK_DEFAULT_ACCOUNT"],
    region=os.environ["CDK_DEFAULT_REGION"],
)

data_science_users = json.loads(os.getenv(_param("DATA_SCIENCE_USERS"), "[]"))
lead_data_science_users = json.loads(os.getenv(_param("LEAD_DATA_SCIENCE_USERS"), "[]"))

app = aws_cdk.App()
stack = SagemakerStudioStack(
    app,
    app_prefix,
    project_name=project_name,
    deployment_name=deployment_name,
    module_name=module_name,
    vpc_id=cast(str, vpc_id),
    subnet_ids=subnet_ids,
    studio_domain_name=studio_domain_name,
    studio_bucket_name=studio_bucket_name,
    data_science_users=data_science_users,
    lead_data_science_users=lead_data_science_users,
    env=environment,
    app_image_config_name=cast(str, app_image_config_name),
    image_name=cast(str, image_name),
    enable_custom_sagemaker_projects=enable_custom_sagemaker_projects,
)


CfnOutput(
    scope=stack,
    id="metadata",
    value=stack.to_json_string(
        {
            "StudioDomainName": stack.studio_domain.domain_name,
            "StudioDomainEFSId": stack.studio_domain.attr_home_efs_file_system_id,
            "StudioDomainId": stack.studio_domain.attr_domain_id,
            "StudioBucketName": studio_bucket_name,
            "DataScientistRoleArn": stack.sm_roles.data_scientist_role.role_arn,
            "LeadDataScientistRoleArn": stack.sm_roles.lead_data_scientist_role.role_arn,
            "SageMakerExecutionRoleArn": stack.sm_roles.sagemaker_studio_role.role_arn,
        }
    ),
)

app.synth()
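
Boolean parameters such as `ENABLE_CUSTOM_SAGEMAKER_PROJECTS` arrive as strings, and `bool()` of any non-empty string is `True`, so `bool("False")` would switch the feature on. A reusable sketch that compares the text instead; the helper name `env_flag` and the accepted truthy spellings are assumptions, not part of this PR.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from an environment variable.

    bool() of any non-empty string is True (bool("False") is True), so the
    text is compared instead. Unset or blank values fall back to `default`.
    """
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    # Accepted truthy spellings are an assumption; everything else is False
    return raw.strip().lower() in {"true", "1", "yes"}


if __name__ == "__main__":
    os.environ["SEEDFARMER_PARAMETER_ENABLE_CUSTOM_SAGEMAKER_PROJECTS"] = "False"
    print(bool(os.environ["SEEDFARMER_PARAMETER_ENABLE_CUSTOM_SAGEMAKER_PROJECTS"]))  # True
    print(env_flag("SEEDFARMER_PARAMETER_ENABLE_CUSTOM_SAGEMAKER_PROJECTS"))  # False
```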
3 changes: 3 additions & 0 deletions modules/sagemaker/sagemaker-studio/coverage.ini
@@ -0,0 +1,3 @@
[run]
omit =
    tests/*
46 changes: 46 additions & 0 deletions modules/sagemaker/sagemaker-studio/deployspec.yaml
@@ -0,0 +1,46 @@
publishGenericEnvVariables: True
deploy:
  phases:
    install:
      commands:
        - npm install -g [email protected]
        - pip install -r requirements.txt
        - apt-get install gettext-base
    build:
      commands:
        - LCC_CONTENT=`openssl base64 -A -in scripts/on-jupyter-server-start.sh`
        - export LCC_CONTENT=$LCC_CONTENT
        - aws sagemaker create-studio-lifecycle-config --studio-lifecycle-config-name $SEEDFARMER_PARAMETER_SERVER_LIFECYCLE_NAME --studio-lifecycle-config-content $LCC_CONTENT --studio-lifecycle-config-app-type JupyterServer || true
        - export LCC_ARN=$(aws sagemaker describe-studio-lifecycle-config --studio-lifecycle-config-name $SEEDFARMER_PARAMETER_SERVER_LIFECYCLE_NAME | jq -r ."StudioLifecycleConfigArn")
        - echo $LCC_ARN
        - ./scripts/check_lcc_state.sh
        - cdk deploy --require-approval never --progress events --app "python app.py" --outputs-file ./cdk-exports.json
        - cat cdk-exports.json
        # Export metadata
        - seedfarmer metadata convert -f cdk-exports.json || true
        - export SEEDFARMER_MODULE_METADATA=$(cat SEEDFARMER_MODULE_METADATA)
        - export DOMAIN_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainId")
        - echo $DOMAIN_ID
        # Update SageMaker domain lifecycle config
        - envsubst < "update-domain-input.template.json" > "update-domain-input.json"
        - aws sagemaker update-domain --cli-input-json file://update-domain-input.json
destroy:
  phases:
    install:
      commands:
        - npm install -g [email protected]
        - pip install -r requirements.txt
    build:
      commands:
        - cdk destroy --force --app "python app.py"
        - export EFS_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainEFSId")
        - export DOMAIN_ID=$(echo ${SEEDFARMER_MODULE_METADATA} | jq -r ".StudioDomainId")
        - RETAIN_EFS=$(echo $SEEDFARMER_PARAMETER_RETAIN_EFS | tr '[:lower:]' '[:upper:]')
        - echo $RETAIN_EFS
        - echo $EFS_ID
        - echo $DOMAIN_ID
        - >
          if [[ $RETAIN_EFS == "FALSE" ]]; then
            echo "DELETING EFS"
            python scripts/delete_efs.py ${EFS_ID} ${DOMAIN_ID} || true
          fi;
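
Two shell steps in this spec carry logic that can be mirrored in Python for local checks. First, the lifecycle script is passed to `create-studio-lifecycle-config` as single-line base64 (`openssl base64 -A`). Second, the EFS volume is deleted only when `RETAIN_EFS` upper-cases to exactly `FALSE`, so an unset parameter keeps it. A sketch; the function names are hypothetical.

```python
import base64
from typing import Optional


def lcc_content(script: bytes) -> str:
    """Single-line base64 text, equivalent to `openssl base64 -A -in <script>`."""
    return base64.b64encode(script).decode("ascii")


def should_delete_efs(retain_efs: Optional[str]) -> bool:
    """Mirror the destroy phase: delete only if the value upper-cases to "FALSE".

    An unset or empty parameter therefore keeps the EFS volume.
    """
    # strip() approximates the unquoted `echo`, which drops surrounding whitespace
    return (retain_efs or "").strip().upper() == "FALSE"


if __name__ == "__main__":
    print(lcc_content(b"hi"))  # aGk=
    print(should_delete_efs("false"), should_delete_efs(None))  # True False
```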
@@ -0,0 +1 @@
[draw.io diagram source (compressed `<mxfile>` data, modified 2024-02-20); not rendered]
@@ -0,0 +1,55 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

import boto3
import cfnresponse

sm_client = boto3.client("sagemaker")
sc_client = boto3.client("servicecatalog")


def handler(event, context):
    try:
        if "RequestType" in event and event["RequestType"] in {"Create", "Update"}:
            properties = event["ResourceProperties"]
            roles = properties.get("ExecutionRoles", [])

            for role in roles:
                enable_sm_projects(role)

        # Delete (and any other request type) needs no action, so report success
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, "")
    except Exception as exception:
        # Report every failure, so CloudFormation does not wait for the custom resource to time out
        print(exception)
        cfnresponse.send(
            event,
            context,
            cfnresponse.FAILED,
            {},
            physicalResourceId=event.get("PhysicalResourceId"),
        )


def enable_sm_projects(studio_role_arn):
    # enable Projects at the account level (accepts the portfolio share)
    response = sm_client.enable_sagemaker_servicecatalog_portfolio()

    print(response)

    # associate the studio role with the SageMaker portfolio
    response = sc_client.list_accepted_portfolio_shares()

    print(response)

    portfolio_id = ""

    for portfolio in response["PortfolioDetails"]:
        if portfolio["ProviderName"] == "Amazon SageMaker":
            portfolio_id = portfolio["Id"]
            break

    if not portfolio_id:
        raise RuntimeError("Accepted 'Amazon SageMaker' Service Catalog portfolio share not found")

    response = sc_client.associate_principal_with_portfolio(
        PortfolioId=portfolio_id, PrincipalARN=studio_role_arn, PrincipalType="IAM"
    )

    print(response)
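
`enable_sm_projects` reads only the first page of `list_accepted_portfolio_shares`. If an account has enough accepted shares that the SageMaker portfolio lands on a later page, the lookup would miss it. A sketch of a paginated lookup, assuming botocore's `list_accepted_portfolio_shares` paginator; the function names are hypothetical, and the pure lookup is split out so it can be unit-tested without AWS access.

```python
from typing import Any, Dict, Iterable, Optional


def find_sagemaker_portfolio_id(pages: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Return the Id of the first accepted portfolio whose ProviderName is
    "Amazon SageMaker", scanning every page of list_accepted_portfolio_shares output."""
    for page in pages:
        for portfolio in page.get("PortfolioDetails", []):
            if portfolio.get("ProviderName") == "Amazon SageMaker":
                return portfolio["Id"]
    return None


def sagemaker_portfolio_id_from_account() -> Optional[str]:
    import boto3  # deferred so the pure lookup above can be tested offline

    paginator = boto3.client("servicecatalog").get_paginator("list_accepted_portfolio_shares")
    return find_sagemaker_portfolio_id(paginator.paginate())
```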
@@ -0,0 +1,2 @@
cfnresponse
urllib3<2 # Lock to a version before the breaking changes in urllib3 2.0
Empty file.