Lambda.SQS.PartialBatchFailure cloud config_error ai_generated true

AWS Lambda SQS trigger: partial batch failure not handled, all messages become visible again after processing failure

ID: cloud/aws-lambda-sqs-batch-partial-failure

Fix Rate: 90%
Confidence: 88%
Evidence: 1
First Seen: 2023-08-10

Version Compatibility

Version                              Status  Introduced  Deprecated  Notes
AWS Lambda: runtime >= Node.js 18.x  active  -           -           -
AWS SDK: >= 3.300.0                  active  -           -           -
SQS: standard queue                  active  -           -           -

Root Cause

When SQS is a Lambda trigger with batch processing and the function fails on some messages without using 'ReportBatchItemFailures' to report which ones failed, Lambda treats the whole batch as failed. Every message in the batch becomes visible again and is retried, so messages that were already processed successfully are processed a second time.
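
The failure mode can be illustrated with a minimal sketch of a handler that does not report per-message failures. The names processMessage, naiveHandler, and the shouldFail field are hypothetical and exist only for illustration; the event shape (Records, messageId, body) follows the standard SQS event passed to Lambda.

```javascript
// Hypothetical per-message logic (not from the original entry).
// Throws when the body is malformed JSON or carries a made-up "shouldFail" flag.
async function processMessage(record) {
  const payload = JSON.parse(record.body);
  if (payload.shouldFail) {
    throw new Error(`cannot process message ${record.messageId}`);
  }
  return payload;
}

// Anti-pattern: one failing message rejects the whole invocation.
// Lambda then sees a failed batch, and every message in it is retried,
// including the ones that were processed successfully.
async function naiveHandler(event) {
  await Promise.all(event.Records.map((record) => processMessage(record)));
}
```

With a batch of ten messages where one throws, the other nine are delivered again on the retry, so their side effects run twice unless the processing is idempotent.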

Official Documentation

https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailures

Workarounds

  1. 95% success: Implement 'ReportBatchItemFailures' in the Lambda function response. In Node.js, return { batchItemFailures: [ { itemIdentifier: failedMessage.messageId } ] } with one entry per failed message, and configure the event source mapping with 'FunctionResponseTypes: ["ReportBatchItemFailures"]'. Both parts are required.
  2. 85% success: Configure a dead-letter queue (DLQ) on the source SQS queue to capture messages that still fail after the maximum number of retries, and process them separately.
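
The first workaround can be sketched as a handler that catches errors per message and returns only the failed message IDs. This is a minimal sketch, assuming a standard queue and sequential processing; processMessage and the shouldFail field are hypothetical placeholders for real business logic, and the function still has to be exported as the Lambda handler in whatever module system the project uses.

```javascript
// Hypothetical per-message logic (not from the original entry).
// Throws when the body is malformed JSON or carries a made-up "shouldFail" flag.
async function processMessage(record) {
  const payload = JSON.parse(record.body);
  if (payload.shouldFail) {
    throw new Error(`cannot process message ${record.messageId}`);
  }
  return payload;
}

// Reports only the failed messages back to Lambda.
// Assumes the event source mapping has
// FunctionResponseTypes: ["ReportBatchItemFailures"] enabled.
async function handler(event) {
  const batchItemFailures = [];

  for (const record of event.Records) {
    try {
      await processMessage(record);
    } catch (err) {
      // Log the failure and mark just this message for retry.
      console.error(`Message ${record.messageId} failed:`, err);
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  // An empty list tells Lambda the whole batch succeeded.
  return { batchItemFailures };
}
```

Catching errors inside the loop keeps one bad message from failing the invocation, while the returned list makes sure that message is retried rather than silently dropped. Combined with the DLQ from the second workaround, a message that keeps failing eventually leaves the source queue instead of being retried forever.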

Dead Ends

Common approaches that don't work:

  1. 70% fail

    Reduces throughput significantly; doesn't solve the root cause of failure reporting, and if any message fails, the single message is still retried indefinitely.

  2. 90% fail

    Silently swallows errors, leading to data loss and no visibility into processing failures.

  3. 50% fail

    Doesn't address the partial failure reporting; successful messages may still be reprocessed after the timeout expires.