# AWS Lambda SQS trigger: partial batch failure not handled, all messages become visible again after processing failure

- **ID:** `cloud/aws-lambda-sqs-batch-partial-failure`
- **Domain:** cloud
- **Category:** config_error
- **Error Code:** `Lambda.SQS.PartialBatchFailure`
- **Verification:** ai_generated
- **Fix Rate:** 90%

## Root Cause

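When an SQS queue triggers a Lambda function with batching enabled and the function fails on only some of the messages, Lambda treats the whole batch as failed unless the function reports the individual failures through `ReportBatchItemFailures`. No message in the batch is deleted, so every message, including those already processed successfully, becomes visible again once the visibility timeout expires and is redelivered. The result is duplicate processing of the successful messages.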
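
The failure mode can be sketched with a handler that lets the first error propagate. This is a minimal illustration rather than code from a real deployment: `processRecord`, the `processed` array, and the "bad" body check are hypothetical stand-ins for real per-message work.

```javascript
// Tracks which messages were handled, purely for illustration.
const processed = [];

// Hypothetical per-message work: fails for any body containing "bad".
async function processRecord(record) {
  if (record.body.includes("bad")) {
    throw new Error(`cannot process ${record.messageId}`);
  }
  processed.push(record.messageId);
}

// Anti-pattern: the first failure rejects the whole invocation, so Lambda
// treats every record in the batch as failed, including records that were
// already processed successfully before the error.
async function naiveHandler(event) {
  for (const record of event.Records) {
    await processRecord(record);
  }
}
```

With a batch of three messages where the second one fails, the first message has already been processed when the handler rejects, yet it is redelivered together with the other two.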

## Version Compatibility

| Version | Status | Introduced | Deprecated |
|---------|--------|------------|------------|
| AWS Lambda: runtime >= Node.js 18.x | active | — | — |
| AWS SDK: >= 3.300.0 | active | — | — |
| SQS: standard queue | active | — | — |

## Workarounds

1. **Return the IDs of the failed messages in the Lambda function response and enable `ReportBatchItemFailures` on the event source mapping, so that only the failed messages are retried.** (95% success)
   ```
   // Node.js handler response: list only the messages that failed
   return { batchItemFailures: [ { itemIdentifier: failedMessage.messageId } ] };

   // Event source mapping setting
   FunctionResponseTypes: ["ReportBatchItemFailures"]
   ```
2. **Attach a dead-letter queue (DLQ) to the source SQS queue through a redrive policy, so that messages that still fail after the maximum number of receives are moved to the DLQ and can be inspected or reprocessed separately.** (85% success)
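
The two workarounds above can be sketched together as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `processRecord`, the `fail` field in the message body, the function name, and both ARNs are hypothetical placeholders. The property names `FunctionResponseTypes`, `RedrivePolicy`, `deadLetterTargetArn`, and `maxReceiveCount` follow the Lambda event source mapping and SQS queue attribute APIs.

```javascript
// Hypothetical per-message work; replace with real business logic.
async function processRecord(record) {
  const payload = JSON.parse(record.body); // throws on malformed JSON
  if (payload.fail) {
    throw new Error(`cannot process ${record.messageId}`);
  }
}

// Handler that reports only the failed messages back to Lambda.
// Messages not listed in batchItemFailures are deleted from the queue.
async function handler(event) {
  const batchItemFailures = [];
  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (err) {
      // Log the error so the failure stays visible, then report the message.
      console.error(`Failed to process ${record.messageId}:`, err);
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}

// Event source mapping settings. Without FunctionResponseTypes, Lambda
// ignores the batchItemFailures field in the response.
const eventSourceMappingParams = {
  FunctionName: "my-function", // placeholder
  EventSourceArn: "arn:aws:sqs:us-east-1:123456789012:my-queue", // placeholder
  BatchSize: 10,
  FunctionResponseTypes: ["ReportBatchItemFailures"],
};

// Builds the SQS queue attribute that attaches a DLQ to the source queue.
// RedrivePolicy must be passed as a JSON string.
function redriveAttributes(dlqArn, maxReceiveCount) {
  return {
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn: dlqArn,
      maxReceiveCount,
    }),
  };
}
```

The handler still has to be exported in whatever form the deployment's module system expects. The two parameter objects are meant to be passed to the event source mapping and queue attribute calls of the SDK or infrastructure tool in use.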

## Dead Ends

- **Set the batch size to 1**: Reduces throughput significantly and doesn't solve the root cause of failure reporting. If a message fails, that single message is still retried until it expires or is moved to a DLQ. (70% fail)
- **Catch all exceptions and return success**: Silently swallows errors, leading to data loss and no visibility into processing failures. (90% fail)
- **Increase the visibility timeout**: Doesn't address partial failure reporting; successful messages may still be reprocessed after the timeout expires. (50% fail)
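
The error-swallowing dead end can be sketched as follows, assuming a hypothetical `processRecord` that always fails. Because the response reports no failures, Lambda treats the batch as fully successful and deletes every message.

```javascript
// Hypothetical worker that rejects every record, to make the loss visible.
async function processRecord(record) {
  throw new Error(`cannot process ${record.messageId}`);
}

// Dead-end pattern: every error is caught and dropped, so the response
// always reports zero failures and the failed messages are never retried.
async function swallowingHandler(event) {
  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (err) {
      // Error discarded: no log entry and no failure report.
    }
  }
  return { batchItemFailures: [] };
}
```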
