aws
resource_error
ai_generated
true
Stopped reason: Task stopped due to OOM (Out of Memory) kill
ID: aws/ecs-task-stopped-oom-kill
Fix rate: 78%
Confidence: 85%
Evidence count: 1
First seen: 2023-06-15
Version compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| AWS CLI 2.x | active | — | — | — |
| ECS API 2014-11-13 | active | — | — | — |
| Amazon ECS Agent 1.66.0 | active | — | — | — |
Root cause analysis
The ECS task exceeded its memory allocation, so the Linux kernel OOM killer terminated the container.
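As a rough illustration of how this condition can be spotted, the sketch below scans a task description shaped like one entry of the ECS `DescribeTasks` response and lists containers that look OOM-killed. It assumes the container's `reason` field mentions `OutOfMemoryError` or that the exit code is 137 (128 + SIGKILL). Both checks are heuristics: exit code 137 can also come from other SIGKILLs. The function name `find_oom_containers` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def find_oom_containers(task: Dict[str, Any]) -> List[str]:
    """Return names of containers in a DescribeTasks-style dict that look OOM-killed.

    Heuristic (an assumption, not an official rule): the container's "reason"
    mentions OutOfMemoryError, or its exit code is 137 (128 + SIGKILL).
    """
    suspects = []
    for container in task.get("containers", []):
        reason = container.get("reason") or ""
        exit_code = container.get("exitCode")
        if "OutOfMemoryError" in reason or exit_code == 137:
            suspects.append(container.get("name", "<unnamed>"))
    return suspects


# Hypothetical sample shaped like one entry of describe_tasks()["tasks"].
sample_task = {
    "stoppedReason": "Essential container in task exited",
    "containers": [
        {
            "name": "web",
            "exitCode": 137,
            "reason": "OutOfMemoryError: Container killed due to memory usage",
        },
        {"name": "sidecar", "exitCode": 0},
    ],
}

print(find_oom_containers(sample_task))  # ['web']
```

With boto3, a real task dict would typically come from the `tasks` list returned by `describe_tasks` on an ECS client.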
Official documentation
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-stopped-reason.html
Solutions
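- Increase the hard memory limit in the ECS task definition, for example by setting 'memory' to 2048 MiB, make sure 'memoryReservation' is at least 512 MiB, then redeploy the service.
- Add memory usage monitoring with CloudWatch Container Insights: enable it in the ECS cluster settings, then create a CloudWatch alarm on the 'MemoryUtilized' metric. If usage spikes, consider optimizing the application code to reduce its memory footprint.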
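The two steps above could be sketched as follows. The first helper builds the memory fields of a container definition using the example values from the text; the second builds keyword arguments in the shape accepted by boto3's CloudWatch `put_metric_alarm`. The helper names, the alarm name, the 80% threshold, the period settings, and the cluster and service names are illustrative assumptions, and the `ECS/ContainerInsights` namespace with `ClusterName` and `ServiceName` dimensions assumes Container Insights is enabled on the cluster.

```python
from typing import Any, Dict


def memory_settings(hard_mib: int = 2048, soft_mib: int = 512) -> Dict[str, int]:
    """Memory fields for a container definition (values in MiB)."""
    # When both are set, the hard limit must be greater than the soft limit.
    if soft_mib <= 0 or hard_mib <= soft_mib:
        raise ValueError("memory must be greater than memoryReservation")
    return {"memory": hard_mib, "memoryReservation": soft_mib}


def memory_alarm_kwargs(cluster: str, service: str, hard_mib: int,
                        fraction: float = 0.8) -> Dict[str, Any]:
    """Keyword arguments for cloudwatch.put_metric_alarm on MemoryUtilized.

    The threshold (a fraction of the hard limit) is an assumed starting
    point, not a recommended value.
    """
    return {
        "AlarmName": f"{service}-memory-high",  # hypothetical naming scheme
        "Namespace": "ECS/ContainerInsights",
        "MetricName": "MemoryUtilized",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": hard_mib * fraction,
        "ComparisonOperator": "GreaterThanThreshold",
    }


settings = memory_settings()
alarm = memory_alarm_kwargs("demo-cluster", "demo-service", settings["memory"])
print(settings)
print(alarm["MetricName"], alarm["Threshold"])
```

The alarm arguments could then be passed as `put_metric_alarm(**alarm)` on a boto3 CloudWatch client. Because the hard limit applies per container while the metric is reported per task or service, treat the threshold as something to tune rather than a fixed rule.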
Failed attempts
Common but ineffective approaches:
- 65% failure rate: Raising the hard memory limit without also adjusting the soft limit ('memoryReservation') or the task-level memory can still end in an OOM kill if the container grows past the new hard limit.
- 50% failure rate: Rebuilding the container on a different base image without monitoring memory usage does not address the root cause, whether that is a memory leak or simply high consumption.
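To tell a leak from a limit that is simply too small, it can help to look at how memory usage evolves before the kill. The sketch below is one rough heuristic, not an AWS feature: it fits a least-squares line to evenly spaced 'MemoryUtilized' samples (in MiB) and checks the slope. The function names, the sample values, and the cutoff of 1 MiB per sample interval are all hypothetical.

```python
from typing import Sequence


def growth_per_sample(samples: Sequence[float]) -> float:
    """Least-squares slope of evenly spaced samples (MiB per sample interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples: Sequence[float], cutoff: float = 1.0) -> bool:
    """Flag steady growth above an arbitrary cutoff (an assumption; tune per workload)."""
    return growth_per_sample(samples) > cutoff


leaky = [400, 450, 500, 550, 600]    # grows 50 MiB every interval
steady = [900, 905, 898, 902, 900]   # high but flat

print(looks_like_leak(leaky), looks_like_leak(steady))  # True False
```

A steadily rising series suggests that raising limits only delays the next OOM kill, while a flat but high series suggests the limit itself is undersized.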