
ECS agent on Windows does not work for more than 10 CPUs despite setting 'ECS_ENABLE_TASK_CPU_MEM_LIMIT' to true #4197

Open
RomaricKanyamibwa opened this issue May 31, 2024 · 4 comments

Comments

@RomaricKanyamibwa

RomaricKanyamibwa commented May 31, 2024

Description

I have been trying to launch Windows ECS tasks with 16 vCPUs, but they only fail with TaskFailedToStart: ATTRIBUTE. After some research, I found that because I am requesting more than 10 vCPUs, the task definition requires the ecs.capability.increased-task-cpu-limit capability. To enable that capability, I need to set ECS_ENABLE_TASK_CPU_MEM_LIMIT to true. I do set it, but that does not seem to be enough, and I still get the error.
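To make the threshold concrete: ECS expresses CPU in CPU units, where 1024 units equal one vCPU, so the 16384 units in this task definition are 16 vCPUs. The Go sketch below assumes, as described above, that any task above 10 vCPUs (10240 units) carries the ecs.capability.increased-task-cpu-limit requirement. The helper names and the exact cutoff are illustrative assumptions, not the ECS scheduler's code.

```go
package main

import "fmt"

// cpuUnitsPerVCPU is the ECS convention: 1024 CPU units equal one vCPU.
const cpuUnitsPerVCPU = 1024

// legacyTaskCPULimitVCPU is the 10-vCPU ceiling described in this issue
// (assumption: tasks above it require the increased-task-cpu-limit capability).
const legacyTaskCPULimitVCPU = 10

// increasedCPULimitAttr is the attribute listed in the task definition's requiresAttributes.
const increasedCPULimitAttr = "ecs.capability.increased-task-cpu-limit"

// vCPUs converts ECS CPU units into (possibly fractional) vCPUs.
func vCPUs(cpuUnits int) float64 {
	return float64(cpuUnits) / cpuUnitsPerVCPU
}

// requiresIncreasedCPULimit is a hypothetical helper that reports whether a
// task-level CPU value exceeds the 10-vCPU ceiling.
func requiresIncreasedCPULimit(cpuUnits int) bool {
	return cpuUnits > legacyTaskCPULimitVCPU*cpuUnitsPerVCPU
}

func main() {
	for _, units := range []int{10240, 16384} {
		fmt.Printf("%d units = %.1f vCPUs, needs %s: %v\n",
			units, vCPUs(units), increasedCPULimitAttr, requiresIncreasedCPULimit(units))
	}
}
```

With these assumptions, 10240 units (exactly 10 vCPUs) stays under the ceiling, while the 16384 units used here do not.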

Expected Behavior

I expected my tasks to run just as they do when I set the CPU to 10 vCPUs or fewer.

Observed Behavior

Instead, my ECS task fails with TaskFailedToStart: ATTRIBUTE, and the ECS agent log contains this line:

level=warn time=2024-05-31T10:15:35Z msg="Increased Task CPU Limit capability is disabled since the Task CPU + Mem Limit capability is disabled." module=agent_capability.go
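The ATTRIBUTE reason indicates that no container instance advertised every attribute in the task definition's requiresAttributes list. Below is a simplified Go model of that check. It uses a subset of the attributes from the task definition in this issue and a hypothetical attribute set for a Windows instance whose agent did not advertise the increased CPU limit capability. The helper name and the instance attributes are assumptions for illustration, not the ECS scheduler's implementation.

```go
package main

import "fmt"

// missingAttributes returns the required attribute names the instance does not advertise.
// In this simplified model, a task can be placed on an instance only if the result is empty.
func missingAttributes(required []string, advertised map[string]bool) []string {
	var missing []string
	for _, name := range required {
		if !advertised[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Subset of requiresAttributes from the task definition in this issue.
	required := []string{
		"com.amazonaws.ecs.capability.logging-driver.awslogs",
		"ecs.capability.task-eni",
		"ecs.capability.increased-task-cpu-limit",
	}
	// Hypothetical attributes registered by a Windows instance on which the
	// increased CPU limit capability is disabled.
	advertised := map[string]bool{
		"com.amazonaws.ecs.capability.logging-driver.awslogs": true,
		"ecs.capability.task-eni":                             true,
	}
	if missing := missingAttributes(required, advertised); len(missing) > 0 {
		fmt.Println("ATTRIBUTE: instance is missing", missing)
	}
}
```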

Environment Details

  • OS: windows-server-2022
  • AMI: Windows_Server-2022-English-Full-ECS_Optimized-2024.04.09 (ami-06bdecd2e519349ce)
  • ECS configuration:
# configure ecs cluster

[Environment]::SetEnvironmentVariable("ECS_CLUSTER", "x86_64-windows-2022-cluster","Machine")
[Environment]::SetEnvironmentVariable("ECS_IMAGE_PULL_BEHAVIOR","once","Machine")
[Environment]::SetEnvironmentVariable("ECS_AWSVPC_BLOCK_IMDS","true ","Machine")
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE","true","Machine")
[Environment]::SetEnvironmentVariable("ECS_ENABLE_TASK_CPU_MEM_LIMIT","true","Machine")
# init ecs agent
Import-Module ECSTools
Initialize-ECSAgent -EnableTaskIAMRole -EnableTaskENI -LoggingDrivers "['json-file','awslogs']"

ECS task definition:

{
    "taskDefinitionArn": "arn:aws:ecs:eu-west-1:123456789:task-definition/e3-windows-core-2022-c16384-m16768-d100-rDefaultTaskRole-cgitlab-runner-cluster-x86_64-windows-2022-task-v4:1",
    "containerDefinitions": [
        {
            "name": "ci-coordinator",
            "image": "e3-windows-core-2022:latest",
            "repositoryCredentials": {
                "credentialsParameter": "arn:aws:secretsmanager:eu-west-1:123456789:secret:gitlab/read-registry-token-3d3tcn"
            },
            "cpu": 16384,
            "memory": 16768,
            "portMappings": [
                {
                    "containerPort": 22,
                    "hostPort": 22,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "environment": [],
            "mountPoints": [],
            "volumesFrom": [],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "gitlab-runner-cluster-x86_64-windows-2022",
                    "awslogs-create-group": "true",
                    "awslogs-region": "eu-west-1",
                    "awslogs-stream-prefix": "kanya/sandbox-kanya"
                }
            },
            "systemControls": []
        }
    ],
    "family": "e3-windows-core-2022-c16384-m16768-d100-rDefaultTaskRole-cgitlab-runner-cluster-x86_64-windows-2022-task-v4",
    "taskRoleArn": "arn:aws:iam::123456789:role/gitlab-runner-task-roles/DefaultTaskRole",
    "executionRoleArn": "arn:aws:iam::123456789:role/gitlab-runner/GitLabRunnerTasksExecRole",
    "networkMode": "awsvpc",
    "revision": 1,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
        },
        {
            "name": "ecs.capability.execution-role-awslogs"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
        },
        {
            "name": "ecs.capability.private-registry-authentication.secretsmanager"
        },
        {
            "name": "ecs.capability.increased-task-cpu-limit"
        },
        {
            "name": "com.amazonaws.ecs.capability.task-iam-role"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        },
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2"
    ],
    "requiresCompatibilities": [],
    "cpu": "16384",
    "memory": "16768",
    "registeredAt": "2024-05-23T10:56:19.169Z",
    "registeredBy": "arn:aws:sts::123456789:assumed-role/GitlabRunnerInstanceRole/i-0372d6e1e2a19ad1e",
    "tags": []
}

Supporting Log Snippets

> cat C:\ProgramData\Amazon\ECS\log\ecs-agent.log
level=error time=2024-05-31T10:15:26Z msg="Driver log level mapping not found" module=log.go
level=error time=2024-05-31T10:15:26Z msg="Instance log level mapping not found" module=log.go
level=info time=2024-05-31T10:15:26Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider"
level=info time=2024-05-31T10:15:26Z msg="Starting Amazon ECS Agent" version="1.82.2" commit="ff5590b7"
level=info time=2024-05-31T10:15:26Z msg="Loading configuration"
level=warn time=2024-05-31T10:15:26Z msg="Invalid format for \"ECS_AVAILABLE_LOGGING_DRIVERS\" environment variable; expected a JSON array like [\"json-file\",\"syslog\"]. err invalid character '\\'' looking for beginning of value" module=parse.go
level=info time=2024-05-31T10:15:35Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider"
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.17: Error response from daemon: client version 1.17 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.18: Error response from daemon: client version 1.18 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.19: Error response from daemon: client version 1.19 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.20: Error response from daemon: client version 1.20 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.21: Error response from daemon: client version 1.21 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.22: Error response from daemon: client version 1.22 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Unable to get Docker client for version 1.23: Error response from daemon: client version 1.23 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" module=sdkclientfactory.go
level=info time=2024-05-31T10:15:35Z msg="Setting minimum docker API version" previousMinAPIVersion=1.21 newMinAPIVersion=1.24
level=info time=2024-05-31T10:15:35Z msg="Registered transformation function with threshold 1.0.0."
level=info time=2024-05-31T10:15:35Z msg="Successfully got ECS instance credentials from provider: EC2RoleProvider"
level=info time=2024-05-31T10:15:35Z msg="Starting Windows service" module=agent_windows.go
level=info time=2024-05-31T10:15:35Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2024-05-31T10:15:35Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2024-05-31T10:15:35Z msg="Image excluded from cleanup" image="amazon/amazon-ecs-agent:latest"
level=info time=2024-05-31T10:15:35Z msg="Remaining memory" remainingMemory=64135
level=info time=2024-05-31T10:15:35Z msg="Event stream ContainerChange start listening..." module=eventstream.go
level=info time=2024-05-31T10:15:35Z msg="Initializing host resource manager, initialHostResource" initialHostResource=map[CPU:{
  IntegerValue: 32768,
  Name: "CPU",
  Type: "INTEGER"
} GPU:{
  Name: "GPU",
  StringSetValue: [],
  Type: "STRINGSET"
} MEMORY:{
  IntegerValue: 64135,
  Name: "MEMORY",
  Type: "INTEGER"
} PORTS_TCP:{
  Name: "PORTS_TCP",
  StringSetValue: [
    "2375",
    "2376",
    "51678",
    "51679",
    "3389",
    "135",
    "445",
    "5985",
    "5986",
    "53",
    "139",
    "80"
  ],
  Type: "STRINGSET"
} PORTS_UDP:{
  Name: "PORTS_UDP",
  StringSetValue: [],
  Type: "STRINGSET"
}]
level=info time=2024-05-31T10:15:35Z msg="Initializing host resource manager, consumed resource" consumedResource=map[CPU:{
  IntegerValue: 0,
  Name: "CPU",
  Type: "INTEGER"
} GPU:{
  Name: "GPU",
  StringSetValue: [],
  Type: "STRINGSET"
} MEMORY:{
  IntegerValue: 0,
  Name: "MEMORY",
  Type: "INTEGER"
} PORTS_TCP:{
  Name: "PORTS_TCP",
  StringSetValue: [
    "2375",
    "2376",
    "51678",
    "51679",
    "3389",
    "135",
    "445",
    "5985",
    "5986",
    "53",
    "139",
    "80"
  ],
  Type: "STRINGSET"
} PORTS_UDP:{
  Name: "PORTS_UDP",
  StringSetValue: [],
  Type: "STRINGSET"
}]
level=info time=2024-05-31T10:15:35Z msg="Loading state!" module=state_manager.go
level=info time=2024-05-31T10:15:35Z msg="windows eni watcher has been initialized" module=watcher_windows.go
level=warn time=2024-05-31T10:15:35Z msg="Increased Task CPU Limit capability is disabled since the Task CPU + Mem Limit capability is disabled." module=agent_capability.go
level=error time=2024-05-31T10:15:38Z msg="ServiceConnect Capability: Failed to load appnet Agent container. This container instance will not be able to support ServiceConnect tasks" error="appnetAgent container load: unsupported platform: windows/amd64"
level=warn time=2024-05-31T10:15:38Z msg="daemonDefinitions is empty/nil after import"
level=info time=2024-05-31T10:15:38Z msg="Registering Instance with ECS"
level=info time=2024-05-31T10:15:38Z msg="Remaining memory" remainingMemory=64135
level=info time=2024-05-31T10:15:38Z msg="Registered container instance with cluster!"
level=info time=2024-05-31T10:15:38Z msg="Instance registration completed successfully" instanceArn="***" cluster="x86_64-windows-2022-cluster"
level=info time=2024-05-31T10:15:38Z msg="Reconciling host resources"
level=info time=2024-05-31T10:15:38Z msg="Monitoring Task Queue started"
level=info time=2024-05-31T10:15:38Z msg="Event stream DeregisterContainerInstance start listening..." module=eventstream.go
level=info time=2024-05-31T10:15:38Z msg="Initializing stats engine"
level=info time=2024-05-31T10:15:38Z msg="Beginning Polling for updates"
level=info time=2024-05-31T10:15:38Z msg="Establishing a Websocket connection" url="****"
level=info time=2024-05-31T10:15:38Z msg="NO_PROXY is set: 169.254.169.254,169.254.170.2,//./pipe/docker_engine"
level=info time=2024-05-31T10:15:38Z msg="Establishing a Websocket connection" url="****"
level=info time=2024-05-31T10:15:38Z msg="Websocket connection established." URL="****" ConnectTime="2024-05-31 10:15:38" ExpectedDisconnectTime="2024-05-31 10:45:38"
level=info time=2024-05-31T10:15:38Z msg="Connected to TCS endpoint"
level=info time=2024-05-31T10:15:38Z msg="Websocket connection established." URL="****" ConnectTime="2024-05-31 10:15:38" ExpectedDisconnectTime="2024-05-31 10:45:38"
level=info time=2024-05-31T10:15:38Z msg="Connected to ACS endpoint"
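The host resource manager log above reports 32768 CPU units (32 vCPUs) and 64135 MiB of memory on this instance, while the task asks for 16384 CPU units and 16768 MiB. The task therefore fits the instance's raw capacity, which is consistent with the failure being an ATTRIBUTE problem rather than a resource shortage. A small Go check of that arithmetic, with numbers copied from the log and the task definition (the struct and function names are hypothetical):

```go
package main

import "fmt"

// resources holds the integer CPU (units) and memory (MiB) values ECS tracks.
type resources struct {
	CPUUnits  int
	MemoryMiB int
}

// fits reports whether the requested resources fit within the available ones.
func fits(request, available resources) bool {
	return request.CPUUnits <= available.CPUUnits && request.MemoryMiB <= available.MemoryMiB
}

func main() {
	host := resources{CPUUnits: 32768, MemoryMiB: 64135} // initialHostResource in the agent log
	task := resources{CPUUnits: 16384, MemoryMiB: 16768} // cpu and memory from the task definition
	fmt.Printf("host: %d vCPUs, task: %d vCPUs, fits: %v\n",
		host.CPUUnits/1024, task.CPUUnits/1024, fits(task, host))
}
```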
@SabrineMihni

I have a similar issue.

@Yiyuanzzz
Contributor

Yiyuanzzz commented Jun 21, 2024

Hi @SabrineMihni, since you are experiencing a similar issue, could you share more details about your environment? Thanks!

@Yiyuanzzz
Contributor

Hi @RomaricKanyamibwa

// ensure TaskResourceLimit is disabled
	cfg.TaskCPUMemLimit.Value = ExplicitlyDisabled

After looking into the agent code: the snippet above shows that the Task CPU and Memory Limit feature is explicitly disabled on the Windows platform, so setting ECS_ENABLE_TASK_CPU_MEM_LIMIT to true has no effect on Windows.
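A minimal Go model of why the environment variable has no effect: the user's value is parsed first, then a Windows-specific step overwrites it with ExplicitlyDisabled, and (as the agent's warning log states) the increased CPU limit capability depends on the Task CPU + Mem Limit setting being enabled. The Conditional type, Config struct, and all functions below are hypothetical stand-ins loosely named after the snippet above; they are not the agent's actual source.

```go
package main

import (
	"fmt"
	"strconv"
)

// Conditional is a hypothetical tri-state setting.
type Conditional int

const (
	NotSet Conditional = iota
	ExplicitlyEnabled
	ExplicitlyDisabled
)

// Config is a hypothetical, trimmed-down agent configuration.
type Config struct {
	TaskCPUMemLimit Conditional
}

// fromEnv parses the ECS_ENABLE_TASK_CPU_MEM_LIMIT value the user set.
func fromEnv(value string) Config {
	enabled, err := strconv.ParseBool(value)
	switch {
	case err != nil:
		return Config{TaskCPUMemLimit: NotSet}
	case enabled:
		return Config{TaskCPUMemLimit: ExplicitlyEnabled}
	default:
		return Config{TaskCPUMemLimit: ExplicitlyDisabled}
	}
}

// applyWindowsOverrides models the Windows-only step from the snippet above:
// whatever the user set, the feature ends up explicitly disabled.
func applyWindowsOverrides(cfg *Config) {
	cfg.TaskCPUMemLimit = ExplicitlyDisabled
}

// advertisesIncreasedCPULimit models the dependency reported in the warning log:
// in this model, only an enabled Task CPU + Mem Limit setting advertises the capability.
func advertisesIncreasedCPULimit(cfg Config) bool {
	return cfg.TaskCPUMemLimit == ExplicitlyEnabled
}

func main() {
	cfg := fromEnv("true")
	applyWindowsOverrides(&cfg)
	fmt.Println("advertise ecs.capability.increased-task-cpu-limit:", advertisesIncreasedCPULimit(cfg))
}
```

Because the override runs after the environment is parsed, the value set via [Environment]::SetEnvironmentVariable never reaches the capability check in this model.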

@RomaricKanyamibwa
Author

Hello @Yiyuanzzz,

Thanks for taking the time to look into the code and answer. Is there any way to get more than 10 vCPUs on Windows, or does this explicit deactivation mean that no task can have more?
