
Commit 323fb75

docs: Move examples that are more like test cases to the new tests/ directory; add better example configurations (#3069)

* chore: Move examples that are more like test cases to the new `tests/` directory

* chore: Stash

* feat: Add better examples for EKS managed node groups

* chore: Add better examples for self-managed node groups

* chore: Update docs and correct `nodegroup` to `node group`
bryantbiggs committed Jun 13, 2024
1 parent 73b752a commit 323fb75
Showing 85 changed files with 509 additions and 109 deletions.
README.md: 61 changes (24 additions, 37 deletions)
@@ -1,6 +1,6 @@
# AWS EKS Terraform module

- Terraform module which creates AWS EKS (Kubernetes) resources
+ Terraform module which creates Amazon EKS (Kubernetes) resources

[![SWUbanner](https://raw.githubusercontent.com/vshymanskyy/StandWithUkraine/main/banner2-direct.svg)](https://github.com/vshymanskyy/StandWithUkraine/blob/main/docs/README.md)

@@ -23,13 +23,6 @@ Please note that we strive to provide a comprehensive suite of documentation for
- [AWS EKS Documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html)
- [Kubernetes Documentation](https://kubernetes.io/docs/home/)

- #### Reference Architecture
-
- The examples provided under `examples/` provide a comprehensive suite of configurations that demonstrate nearly all of the possible different configurations and settings that can be used with this module. However, these examples are not representative of clusters that you would normally find in use for production workloads. For reference architectures that utilize this module, please see the following:
-
- - [EKS Reference Architecture](https://github.com/clowdhaus/eks-reference-architecture)
- - [EKS Blueprints](https://github.com/aws-ia/terraform-aws-eks-blueprints)
-
## Usage

```hcl
@@ -38,20 +31,15 @@ module "eks" {
version = "~> 20.0"
cluster_name = "my-cluster"
cluster_version = "1.29"
cluster_version = "1.30"
cluster_endpoint_public_access = true
cluster_addons = {
coredns = {
most_recent = true
}
kube-proxy = {
most_recent = true
}
vpc-cni = {
most_recent = true
}
coredns = {}
eks-pod-identity-agent = {}
kube-proxy = {}
vpc-cni = {}
}
vpc_id = "vpc-1234556abcdef"
@@ -65,12 +53,13 @@ module "eks" {
    eks_managed_node_groups = {
      example = {
-       min_size     = 1
-       max_size     = 10
-       desired_size = 1
+       # Starting on 1.30, AL2023 is the default AMI type for EKS managed node groups
+       ami_type       = "AL2023_x86_64_STANDARD"
+       instance_types = ["m5.xlarge"]

-       instance_types = ["t3.large"]
-       capacity_type  = "SPOT"
+       min_size     = 2
+       max_size     = 10
+       desired_size = 2
      }
    }
@@ -105,7 +94,7 @@ module "eks" {

### Cluster Access Entry

- When enabling `authentication_mode = "API_AND_CONFIG_MAP"`, EKS will automatically create an access entry for the IAM role(s) used by managed nodegroup(s) and Fargate profile(s). There are no additional actions required by users. For self-managed nodegroups and the Karpenter sub-module, this project automatically adds the access entry on behalf of users so there are no additional actions required by users.
+ When enabling `authentication_mode = "API_AND_CONFIG_MAP"`, EKS will automatically create an access entry for the IAM role(s) used by managed node group(s) and Fargate profile(s). There are no additional actions required by users. For self-managed node groups and the Karpenter sub-module, this project automatically adds the access entry on behalf of users so there are no additional actions required by users.

On clusters that were created prior to CAM support, there will be an existing access entry for the cluster creator. This was previously not visible when using `aws-auth` ConfigMap, but will become visible when access entry is enabled.

@@ -115,22 +104,22 @@ Setting the `bootstrap_cluster_creator_admin_permissions` is a one time operation
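
For illustration, a minimal, hedged sketch of what granting access through the module's `access_entries` and `enable_cluster_creator_admin_permissions` inputs might look like; the principal ARN and the policy association shown here are placeholder values, not prescribed by the module:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  # Truncated for brevity ...

  authentication_mode = "API_AND_CONFIG_MAP"

  # Adds the identity running Terraform as an administrator via a cluster access entry
  enable_cluster_creator_admin_permissions = true

  access_entries = {
    # One access entry with a policy associated (placeholder values)
    example = {
      principal_arn = "arn:aws:iam::123456789012:role/something"

      policy_associations = {
        example = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
          access_scope = {
            namespaces = ["default"]
            type       = "namespace"
          }
        }
      }
    }
  }
}
```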

### Enabling EFA Support

- When enabling EFA support via `enable_efa_support = true`, there are two locations this can be specified - one at the cluster level, and one at the nodegroup level. Enabling at the cluster level will add the EFA required ingress/egress rules to the shared security group created for the nodegroup(s). Enabling at the nodegroup level will do the following (per nodegroup where enabled):
+ When enabling EFA support via `enable_efa_support = true`, there are two locations this can be specified - one at the cluster level, and one at the node group level. Enabling at the cluster level will add the EFA required ingress/egress rules to the shared security group created for the node group(s). Enabling at the node group level will do the following (per node group where enabled):

- 1. All EFA interfaces supported by the instance will be exposed on the launch template used by the nodegroup
- 2. A placement group with `strategy = "clustered"` per EFA requirements is created and passed to the launch template used by the nodegroup
+ 1. All EFA interfaces supported by the instance will be exposed on the launch template used by the node group
+ 2. A placement group with `strategy = "clustered"` per EFA requirements is created and passed to the launch template used by the node group
3. Data sources will reverse lookup the availability zones that support the instance type selected based on the subnets provided, ensuring that only the associated subnets are passed to the launch template and therefore used by the placement group. This avoids the placement group being created in an availability zone that does not support the instance type selected.

> [!TIP]
> Use the [aws-efa-k8s-device-plugin](https://github.com/aws/eks-charts/tree/master/stable/aws-efa-k8s-device-plugin) Helm chart to expose the EFA interfaces on the nodes as an extended resource, and allow pods to request the interfaces be mounted to their containers.
>
> The EKS AL2 GPU AMI comes with the necessary EFA components pre-installed - you just need to expose the EFA devices on the nodes via their launch templates, ensure the required EFA security group rules are in place, and deploy the `aws-efa-k8s-device-plugin` in order to start utilizing EFA within your cluster. Your application container will need to have the necessary libraries and runtime in order to utilize communication over the EFA interfaces (NCCL, aws-ofi-nccl, hwloc, libfabric, aws-neuronx-collectives, CUDA, etc.).
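
The tip above points at the `aws-efa-k8s-device-plugin` Helm chart; as a hedged sketch, it could be installed with the Terraform Helm provider roughly as follows. The repository URL, release name, and namespace here are assumptions based on the linked eks-charts repository, not settings provided by this module:

```hcl
# Illustrative only: installs the aws-efa-k8s-device-plugin chart so EFA
# interfaces are advertised as an extended resource on the nodes
resource "helm_release" "aws_efa_k8s_device_plugin" {
  name       = "aws-efa-k8s-device-plugin"
  repository = "https://aws.github.io/eks-charts" # assumed eks-charts Helm repository
  chart      = "aws-efa-k8s-device-plugin"
  namespace  = "kube-system"
}
```
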
- If you disable the creation and use of the managed nodegroup custom launch template (`create_launch_template = false` and/or `use_custom_launch_template = false`), this will interfere with the EFA functionality provided. In addition, if you do not supply an `instance_type` for self-managed nodegroup(s), or `instance_types` for the managed nodegroup(s), this will also interfere with the functionality. In order to support the EFA functionality provided by `enable_efa_support = true`, you must utilize the custom launch template created/provided by this module, and supply an `instance_type`/`instance_types` for the respective nodegroup.
+ If you disable the creation and use of the managed node group custom launch template (`create_launch_template = false` and/or `use_custom_launch_template = false`), this will interfere with the EFA functionality provided. In addition, if you do not supply an `instance_type` for self-managed node group(s), or `instance_types` for the managed node group(s), this will also interfere with the functionality. In order to support the EFA functionality provided by `enable_efa_support = true`, you must utilize the custom launch template created/provided by this module, and supply an `instance_type`/`instance_types` for the respective node group.

- The logic behind supporting EFA uses a data source to lookup the instance type to retrieve the number of interfaces that the instance supports in order to enumerate and expose those interfaces on the launch template created. For managed nodegroups where a list of instance types are supported, the first instance type in the list is used to calculate the number of EFA interfaces supported. Mixing instance types with varying number of interfaces is not recommended for EFA (or in some cases, mixing instance types is not supported - i.e. - p5.48xlarge and p4d.24xlarge). In addition to exposing the EFA interfaces and updating the security group rules, a placement group is created per the EFA requirements and only the availability zones that support the instance type selected are used in the subnets provided to the nodegroup.
+ The logic behind supporting EFA uses a data source to lookup the instance type to retrieve the number of interfaces that the instance supports in order to enumerate and expose those interfaces on the launch template created. For managed node groups where a list of instance types are supported, the first instance type in the list is used to calculate the number of EFA interfaces supported. Mixing instance types with varying number of interfaces is not recommended for EFA (or in some cases, mixing instance types is not supported - i.e. - p5.48xlarge and p4d.24xlarge). In addition to exposing the EFA interfaces and updating the security group rules, a placement group is created per the EFA requirements and only the availability zones that support the instance type selected are used in the subnets provided to the node group.

- In order to enable EFA support, you will have to specify `enable_efa_support = true` on both the cluster and each nodegroup that you wish to enable EFA support for:
+ In order to enable EFA support, you will have to specify `enable_efa_support = true` on both the cluster and each node group that you wish to enable EFA support for:

```hcl
module "eks" {
@@ -140,14 +129,14 @@ module "eks" {
    # Truncated for brevity ...

    # Adds the EFA required security group rules to the shared
-   # security group created for the nodegroup(s)
+   # security group created for the node group(s)
    enable_efa_support = true

    eks_managed_node_groups = {
      example = {
        instance_types = ["p5.48xlarge"]

-       # Exposes all EFA interfaces on the launch template created by the nodegroup(s)
+       # Exposes all EFA interfaces on the launch template created by the node group(s)
        # This would expose all 32 EFA interfaces for the p5.48xlarge instance type
        enable_efa_support = true
@@ -169,12 +158,10 @@ module "eks" {

## Examples

- - [EKS Managed Node Group](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/eks_managed_node_group): EKS Cluster using EKS managed node groups
- - [Fargate Profile](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/fargate_profile): EKS cluster using [Fargate Profiles](https://docs.aws.amazon.com/eks/latest/userguide/fargate.html)
+ - [EKS Managed Node Group](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/eks-managed-node-group): EKS Cluster using EKS managed node groups
  - [Karpenter](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/karpenter): EKS Cluster with [Karpenter](https://karpenter.sh/) provisioned for intelligent data plane management
  - [Outposts](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/outposts): EKS local cluster provisioned on [AWS Outposts](https://docs.aws.amazon.com/eks/latest/userguide/eks-outposts.html)
- - [Self Managed Node Group](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/self_managed_node_group): EKS Cluster using self-managed node groups
- - [User Data](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/user_data): Various supported methods of providing necessary bootstrap scripts and configuration settings via user data
+ - [Self Managed Node Group](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/self-managed-node-group): EKS Cluster using self-managed node groups

## Contributing

@@ -290,7 +277,7 @@ We are grateful to the community for contributing bugfixes and improvements! Please
| <a name="input_create_kms_key"></a> [create\_kms\_key](#input\_create\_kms\_key) | Controls if a KMS key for cluster encryption should be created | `bool` | `true` | no |
| <a name="input_create_node_security_group"></a> [create\_node\_security\_group](#input\_create\_node\_security\_group) | Determines whether to create a security group for the node groups or use the existing `node_security_group_id` | `bool` | `true` | no |
| <a name="input_custom_oidc_thumbprints"></a> [custom\_oidc\_thumbprints](#input\_custom\_oidc\_thumbprints) | Additional list of server certificate thumbprints for the OpenID Connect (OIDC) identity provider's server certificate(s) | `list(string)` | `[]` | no |
| <a name="input_dataplane_wait_duration"></a> [dataplane\_wait\_duration](#input\_dataplane\_wait\_duration) | Duration to wait after the EKS cluster has become active before creating the dataplane components (EKS managed nodegroup(s), self-managed nodegroup(s), Fargate profile(s)) | `string` | `"30s"` | no |
| <a name="input_dataplane_wait_duration"></a> [dataplane\_wait\_duration](#input\_dataplane\_wait\_duration) | Duration to wait after the EKS cluster has become active before creating the dataplane components (EKS managed node group(s), self-managed node group(s), Fargate profile(s)) | `string` | `"30s"` | no |
| <a name="input_eks_managed_node_group_defaults"></a> [eks\_managed\_node\_group\_defaults](#input\_eks\_managed\_node\_group\_defaults) | Map of EKS managed node group default configurations | `any` | `{}` | no |
| <a name="input_eks_managed_node_groups"></a> [eks\_managed\_node\_groups](#input\_eks\_managed\_node\_groups) | Map of EKS managed node group definitions to create | `any` | `{}` | no |
| <a name="input_enable_cluster_creator_admin_permissions"></a> [enable\_cluster\_creator\_admin\_permissions](#input\_enable\_cluster\_creator\_admin\_permissions) | Indicates whether or not to add the cluster creator (the identity used by Terraform) as an administrator via access entry | `bool` | `false` | no |
