Merge pull request aws-samples#8 from ViktorMalesevic/feature/improve…
…-readmes

improved readmes and added dev guide
ViktorMalesevic authored Jun 21, 2023
2 parents 35b9478 + b83a892 commit f4a9e5f
Showing 7 changed files with 346 additions and 50 deletions.
36 changes: 29 additions & 7 deletions mlops-multi-account-cdk/README.md
@@ -4,20 +4,34 @@ As enterprise businesses embrace Machine Learning (ML) across their organisation

In this repository, we have created a baseline infrastructure for a secure MLOps environment based on CDK. Our solution consists of two parts:

- [mlops-infra](mlops-infra/): The necessary secure infrastructure for the multiple accounts of MLOps including VPCs and entpoints, SSM, IAM user roles, etc.
- [mlops-infra](mlops-infra/): The necessary secure infrastructure for the multiple accounts of MLOps including VPCs and VPC endpoints, SSM, IAM user roles, etc.

- [mlops-sm-project-template](mlops-sm-project-template/): A custom Amazon SageMaker Project template that enable the multi account model promotion.
- [mlops-sm-project-template](mlops-sm-project-template/): A Service Catalog portfolio that contains custom Amazon SageMaker Project templates that enable multi-account model promotion.

If you have any comments or questions, please contact:
## How to use:

Sokratis Kartakis <[email protected]>
First deploy [mlops-infra](mlops-infra/):

Fatema Alkhanaizi <[email protected]>
[mlops-infra](mlops-infra/) deploys a secure data science exploration environment in which your data scientists can explore data and train models inside SageMaker Studio.
It also prepares your dev/preprod/prod accounts with the networking setup needed either to run SageMaker Studio in a VPC or to create SageMaker endpoints and other infrastructure inside VPCs.
Note that the networking created by [mlops_infra](mlops-infra/mlops_infra) is a kick-start example; the repository is also designed to import existing VPCs created by your organization instead of creating its own.
The repository also creates example SageMaker users (Lead Data Scientist and Data Scientist) and their associated roles and policies.

Georgios Schinas <[email protected]>
Once you have deployed [mlops-infra](mlops-infra/), deploy [mlops-sm-project-template](mlops-sm-project-template/):

[mlops-sm-project-template](mlops-sm-project-template/) will create a Service Catalog portfolio that contains SageMaker project templates as Service Catalog products.
To do so, the [service_catalog](mlops-sm-project-template/mlops_sm_project_template/service_catalog.py) stack iterates over the [templates](mlops-sm-project-template/mlops_sm_project_template/templates/) folder which contains your different organization SageMaker project templates in the form of CDK stacks.
The general idea of what those templates create is explained in the [mlops-sm-project-template README](mlops-sm-project-template/README) and in this [SageMaker Projects general architecture diagram](mlops-sm-project-template/diagrams/mlops-sm-project-general-architecture.jpg).
These example SageMaker project templates can be customized to the needs of your organization.
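
The "iterate over the templates folder" idea can be sketched in plain Python. This is only an illustration of the discovery pattern, not the code in `service_catalog.py`: the one-file-per-template layout, the function `discover_template_modules`, the `TEMPLATE_NAME` attribute and the `example_a`/`example_b` files are all assumptions made for the demo.

```python
import importlib.util
import tempfile
from pathlib import Path


def discover_template_modules(templates_dir: Path) -> dict:
    """Import every non-private .py file in templates_dir, keyed by file stem."""
    modules = {}
    for path in sorted(templates_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        modules[path.stem] = module
    return modules


def build_demo_templates(root: Path) -> Path:
    """Create a throwaway templates folder with two hypothetical template files."""
    templates = root / "templates"
    templates.mkdir()
    (templates / "__init__.py").write_text("")
    for name in ("example_a", "example_b"):
        (templates / f"{name}.py").write_text(f"TEMPLATE_NAME = {name!r}\n")
    return templates


with tempfile.TemporaryDirectory() as tmp:
    found = discover_template_modules(build_demo_templates(Path(tmp)))
    discovered_names = sorted(found)
    template_labels = [found[name].TEMPLATE_NAME for name in discovered_names]
```

With a pattern like this, adding a new project template would only require dropping a new file into the folder; the real stack would then register each discovered template as a Service Catalog product.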

**Note:** Both of those folders are CDK applications that come with their own CI/CD pipelines, hosted in a central governance account, to deploy and maintain the infrastructure they define in the target accounts. This is why both also contain a `pipeline_stack` and a `codecommit_stack`.
However, if you are not interested in a centralized governance account and CI/CD mechanism, or if you already have an internal mechanism in place for those ([AWS Control Tower](https://docs.aws.amazon.com/controltower/index.html), [ADF](https://github.com/awslabs/aws-deployment-framework), etc.), you can simply use the `CoreStage` of each of those CDK applications. See the READMEs of each subfolder for more details.

## Contacts

If you have any comments or questions, please contact:

Maintaining Team:
The maintaining team:

Viktor Malesevic <[email protected]>

@@ -26,3 +40,11 @@ Fotinos Kyriakides <[email protected]>
Gabija Pasiunaite <[email protected]>

Selena Tabbara <[email protected]>

Sokratis Kartakis <[email protected]>

Georgios Schinas <[email protected]>

## Special thanks

Fatema Alkhanaizi, who is no longer at AWS but was the major initial contributor to the project.
40 changes: 40 additions & 0 deletions mlops-multi-account-cdk/mlops-infra/DEVELOPER_GUIDE.md
@@ -0,0 +1,40 @@
# Developer Guide
While the solution presented in the [README](README.md) can be used as is, this repository is intended to be customized to the needs of your organization.

[mlops_infra](mlops_infra/) will:
- Create or import VPCs via [networking_stack](mlops_infra/networking_stack.py). If created, the stack will also create the VPC endpoints required for SageMaker Studio and for deploying SageMaker endpoints and pipelines. If imported, ensure that the VPC you import contains at least the VPC endpoints listed in the [README](README.md).
- Create networking SSM parameters via [networking_stack](mlops_infra/networking_stack.py) that will be used in the respective account to deploy either SageMaker Studio or SageMaker endpoints.
- Create a SageMaker Studio domain via [sagemaker_studio_stack](mlops_infra/sagemaker_studio_stack.py), alongside SageMaker Studio users and the required roles. The list of roles and policies is defined in [sm_roles](mlops_infra/constructs/sm_roles.py).

This means, for example, that:
If you want to modify the policies associated with your SageMaker Studio users (what a user can do from SageMaker Studio in the account), you should modify [sm_roles](mlops_infra/constructs/sm_roles.py).

If you would like to give data scientists access to EMR for exploration from SageMaker Studio, you would modify the role as follows (repeat the operation for the lead data scientist role a few lines below):

```
# role for Data Scientist persona
# (fragment from a construct's __init__; assumes `from aws_cdk import aws_iam as iam`)
self.data_scientist_role = iam.Role(
    self,
    "data-scientist-role",
    assumed_by=iam.CompositePrincipal(
        iam.ServicePrincipal("lambda.amazonaws.com"),
        iam.ServicePrincipal("sagemaker.amazonaws.com"),
    ),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSSMReadOnlyAccess"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AWSLambda_ReadOnlyAccess"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AWSCodeCommitReadOnly"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEC2ContainerRegistryReadOnly"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSageMakerFullAccess"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEMRFullAccessPolicy_v2"),  # <--- added EMR managed policy
    ],
)
```
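
Because the same policy has to be added to both personas, one option is to keep the managed policy names in a single shared list and derive each role's list from it. The sketch below is plain Python and does not come from `sm_roles.py`: `BASE_POLICY_NAMES` and `policy_names` are hypothetical, and it assumes both roles start from the same base policies, which may not hold in the repository.

```python
# Hypothetical shared list of AWS managed policy names (taken from the role above).
BASE_POLICY_NAMES = [
    "AmazonSSMReadOnlyAccess",
    "AWSLambda_ReadOnlyAccess",
    "AWSCodeCommitReadOnly",
    "AmazonEC2ContainerRegistryReadOnly",
    "AmazonSageMakerFullAccess",
]


def policy_names(extra=()):
    """Return the base policy names plus any extras, without duplicates."""
    names = list(BASE_POLICY_NAMES)
    for name in extra:
        if name not in names:
            names.append(name)
    return names


# One place to add EMR access for every persona that uses this helper.
studio_policy_names = policy_names(["AmazonEMRFullAccessPolicy_v2"])
```

Inside the construct, each name would then be passed to `iam.ManagedPolicy.from_aws_managed_policy_name(name)` when building `managed_policies` for both roles.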

Similarly, if you want SageMaker Studio inside the VPC to communicate with AWS Glue, you would add the following to [networking_stack](mlops_infra/networking_stack.py):

```
# GLUE VPC Endpoint
self.primary_vpc.add_interface_endpoint("GLUEEndpoint", service=ec2.InterfaceVpcEndpointAwsService.GLUE)
```