aws resource_error ai_generated true

service unable to place tasks: reason: task stuck in PENDING state; cannot pull image or resource unavailable

ID: aws/ecs-task-stuck-in-pending

Fix Rate: 80%
Confidence: 85%
Evidence: 1
First Seen: 2024-02-15

Version Compatibility

Version           Status   Introduced   Deprecated   Notes
ECS 1.30.0        active   -            -            -
Amazon Linux 2    active   -            -            -
Docker 20.10.7    active   -            -            -

Root Cause

An ECS task fails to move from PENDING to RUNNING because the cluster lacks the CPU, memory, or host ports the task needs, or because the container image cannot be pulled.
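To make the resource side of this concrete, here is a minimal sketch that checks which container instances could still host a task. It assumes the input is the parsed JSON output of `aws ecs describe-container-instances` (a command not named on this page), and that the task's CPU units and memory in MiB are known from the task definition. The function name and the sample ARNs are hypothetical. Host ports are not checked.

```python
def instances_that_fit(describe_output, task_cpu, task_memory):
    """Return ARNs of container instances with enough remaining CPU and memory.

    describe_output: dict parsed from `aws ecs describe-container-instances`.
    task_cpu: CPU units the task reserves (1024 units = 1 vCPU).
    task_memory: memory in MiB the task reserves.
    """
    fitting = []
    for inst in describe_output.get("containerInstances", []):
        # Skip instances that cannot accept new tasks at all.
        if inst.get("status") != "ACTIVE" or not inst.get("agentConnected", False):
            continue
        # remainingResources is a list of {"name": ..., "integerValue": ...} entries.
        remaining = {
            r["name"]: r.get("integerValue", 0)
            for r in inst.get("remainingResources", [])
        }
        if remaining.get("CPU", 0) >= task_cpu and remaining.get("MEMORY", 0) >= task_memory:
            fitting.append(inst["containerInstanceArn"])
    return fitting


# Hypothetical sample data shaped like the CLI output (ARNs are placeholders).
sample = {
    "containerInstances": [
        {
            "containerInstanceArn": "arn:example:instance-a",
            "status": "ACTIVE",
            "agentConnected": True,
            "remainingResources": [
                {"name": "CPU", "integerValue": 256},
                {"name": "MEMORY", "integerValue": 3000},
            ],
        },
        {
            "containerInstanceArn": "arn:example:instance-b",
            "status": "ACTIVE",
            "agentConnected": True,
            "remainingResources": [
                {"name": "CPU", "integerValue": 1024},
                {"name": "MEMORY", "integerValue": 2048},
            ],
        },
    ]
}

# Only instance-b has at least 512 CPU units and 1024 MiB free.
print(instances_that_fit(sample, task_cpu=512, task_memory=1024))
```

An empty result suggests cluster saturation rather than an image pull problem. A non-empty result points toward image pull failures, port conflicts, or placement constraints instead.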

generic


Official Documentation

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-stuck-pending.html

Workarounds

  1. (75% success) Check ECS cluster capacity: run `aws ecs describe-clusters --clusters your-cluster --include STATISTICS` to see running and pending task counts, then `aws ecs describe-container-instances --cluster your-cluster --container-instances instance-id` to read each instance's `remainingResources` (CPU, memory, ports). Scale out by adding EC2 instances, for example by raising the Auto Scaling group's desired or maximum size.
  2. (85% success) Verify image pull: test with `docker pull your-image:tag` on a container instance (for ECR images, authenticate first with `aws ecr get-login-password`). Ensure the ECR repository and tag exist and that the task execution role has the ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage permissions.
  3. (70% success) Review task placement constraints: remove or relax constraints such as `memberOf` if they are too restrictive. Use `aws ecs describe-tasks --cluster your-cluster --tasks task-id` to read the task's `lastStatus`, `stoppedReason`, and per-container `reason` fields.
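For the image pull step, a quick way to spot missing permissions is to compare the execution role's policy document against the ECR actions needed for a pull. The sketch below assumes the policy JSON has already been fetched and parsed (for example from the IAM console or CLI). The required-action list mirrors the ECR entries in the AWS managed `AmazonECSTaskExecutionRolePolicy`. The function name is hypothetical, and the check deliberately ignores `Deny` statements, `Resource` scoping, and `Condition` blocks, so treat a clean result as a hint and not as proof.

```python
from fnmatch import fnmatchcase

# ECR actions needed to authenticate and pull an image.
REQUIRED_ECR_ACTIONS = [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
]


def missing_ecr_actions(policy_document):
    """Return required ECR actions that no Allow statement in the policy covers."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]

    allowed = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action string
            actions = [actions]
        allowed.extend(actions)

    # IAM action names are case-insensitive and may contain * wildcards.
    return [
        required
        for required in REQUIRED_ECR_ACTIONS
        if not any(fnmatchcase(required.lower(), pattern.lower()) for pattern in allowed)
    ]


# A policy granting only the two layer/image actions still lacks the other two.
partial_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage"],
            "Resource": "*",
        }
    ],
}
print(missing_ecr_actions(partial_policy))
```

A policy that grants `ecr:*` or lists all four actions yields an empty list. Anything else names the actions to add to the execution role.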


Dead Ends

Common approaches that don't work:

  1. Increase task definition CPU/memory arbitrarily (65% fail)

    Larger reservations waste capacity and make the task harder to place; the actual cause is usually cluster saturation or an image registry problem.

  2. Restart the ECS agent on all instances (40% fail)

    An agent restart doesn't free resources or fix image pull problems; at best it is a temporary workaround.

  3. Delete and recreate the service (50% fail)

    Recreating the service doesn't change the underlying resource constraints or image pull failures.