aws resource_error ai_generated true

service unable to place tasks: reason: task stuck in PENDING state; cannot pull image or resource unavailable

ID: aws/ecs-task-stuck-in-pending

Fix rate: 80%
Confidence: 85%
Evidence count: 1
First seen: 2024-02-15

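Version compatibility

| Version | Status | Introduced | Deprecated | Notes |
| --- | --- | --- | --- | --- |
| ECS 1.30.0 | active | | | |
| Amazon Linux 2 | active | | | |
| Docker 20.10.7 | active | | | |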

Root cause analysis

An ECS task fails to transition from PENDING to RUNNING when the cluster lacks sufficient resources (CPU, memory, or ports) or when the image pull fails.

generic

Official documentation

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-stuck-pending.html

Solutions

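  1. Check ECS cluster capacity: run `aws ecs describe-clusters --clusters your-cluster --include ATTACHMENTS` to see running and pending task counts, and `aws ecs describe-container-instances --cluster your-cluster --container-instances instance-id` to see each instance's remaining CPU, memory, and ports. Scale out by adding EC2 instances or increasing the maximum number of tasks.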
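  2. Verify the image pull: test `docker pull your-image:tag` on the instance. Make sure the ECR repository exists and the task execution role has the ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage permissions.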
  3. Check task placement constraints: if a constraint (such as `memberOf`) is too strict, remove or relax it. Use `aws ecs describe-tasks --cluster your-cluster --tasks task-id` to get detailed status.
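
The capacity check in step 1 can be sketched in Python as follows. This is a minimal illustration, assuming the input is the `containerInstances` array from the JSON output of `aws ecs describe-container-instances`. The helper names `remaining` and `instances_that_fit`, the sample ARNs, and the task size are hypothetical and not part of any AWS tooling.

```python
from typing import Dict, List


def remaining(instance: Dict, name: str) -> int:
    """Return the integer value of a named remainingResources entry (0 if absent)."""
    for res in instance.get("remainingResources", []):
        if res.get("name") == name:
            return int(res.get("integerValue", 0))
    return 0


def instances_that_fit(instances: List[Dict], task_cpu: int, task_memory: int) -> List[str]:
    """List ARNs of ACTIVE, agent-connected instances with enough free CPU units and memory (MiB)."""
    fits = []
    for inst in instances:
        # Skip instances that are draining or whose agent is disconnected.
        if inst.get("status") != "ACTIVE" or not inst.get("agentConnected", False):
            continue
        if remaining(inst, "CPU") >= task_cpu and remaining(inst, "MEMORY") >= task_memory:
            fits.append(inst["containerInstanceArn"])
    return fits


# Hypothetical sample shaped like the "containerInstances" array (values are made up).
sample = [
    {
        "containerInstanceArn": "arn:aws:ecs:us-east-1:111122223333:container-instance/your-cluster/instance-a",
        "status": "ACTIVE",
        "agentConnected": True,
        "remainingResources": [
            {"name": "CPU", "type": "INTEGER", "integerValue": 256},
            {"name": "MEMORY", "type": "INTEGER", "integerValue": 2048},
        ],
    },
    {
        "containerInstanceArn": "arn:aws:ecs:us-east-1:111122223333:container-instance/your-cluster/instance-b",
        "status": "ACTIVE",
        "agentConnected": True,
        "remainingResources": [
            {"name": "CPU", "type": "INTEGER", "integerValue": 1024},
            {"name": "MEMORY", "type": "INTEGER", "integerValue": 3072},
        ],
    },
    {
        "containerInstanceArn": "arn:aws:ecs:us-east-1:111122223333:container-instance/your-cluster/instance-c",
        "status": "ACTIVE",
        "agentConnected": False,  # disconnected agent: cannot start tasks
        "remainingResources": [
            {"name": "CPU", "type": "INTEGER", "integerValue": 4096},
            {"name": "MEMORY", "type": "INTEGER", "integerValue": 8192},
        ],
    },
]

# Task size (hypothetical): 512 CPU units and 1024 MiB of memory.
print(instances_that_fit(sample, task_cpu=512, task_memory=1024))
```

If the returned list is empty for the CPU and memory values in your task definition, no registered instance can currently host the task, which points to cluster capacity rather than the image pull. The container instance IDs to pass to `describe-container-instances` can be listed with `aws ecs list-container-instances --cluster your-cluster`.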

Ineffective attempts

Common but ineffective approaches:

  1. Increase task definition CPU/memory arbitrarily (65% failure rate)

    Raising the task's CPU/memory makes it harder, not easier, to place on a saturated cluster and wastes capacity; the actual cause is cluster saturation or image registry issues.

  2. Restart the ECS agent on all instances (40% failure rate)

    Restarting the agent doesn't free resources or fix image pull problems; at best it is a temporary workaround.

  3. Delete and recreate the service (50% failure rate)

    Recreating the service doesn't change the underlying resource constraints or image pull failures.