kafka protocol_error ai_generated true

org.apache.kafka.common.errors.TransactionCoordinatorFencedException: The transactional coordinator with epoch 5 has been fenced by a newer epoch 6

ID: kafka/transactional-coordinator-fenced

Fix Rate: 82%
Confidence: 84%
Evidence: 1
First Seen: 2023-06-05

Version Compatibility

Version       Status   Introduced   Deprecated   Notes
Kafka 2.8.0   active
Kafka 3.0.0   active
Kafka 3.4.0   active
Kafka 3.6.0   active

Root Cause

A transaction coordinator was replaced by a new coordinator with a higher epoch (e.g., after a leader change for the transaction state log partition it manages). Requests that the old coordinator still sends carry the stale epoch and are rejected as fenced.
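
To illustrate the epoch check described above, here is a minimal, self-contained Java sketch. It is not Kafka's implementation: EpochGuardedLog, FencedEpochException, and EpochFencingDemo are hypothetical names, and the rule shown (reject any request whose epoch is lower than the highest epoch already seen) is a simplification of the fencing idea.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception: the sender's epoch is older than one already seen. */
class FencedEpochException extends RuntimeException {
    FencedEpochException(int requestEpoch, int currentEpoch) {
        super("epoch " + requestEpoch + " has been fenced by a newer epoch " + currentEpoch);
    }
}

/** Toy stand-in for a partition that accepts writes from a coordinator. */
class EpochGuardedLog {
    private int highestEpochSeen = 0;
    private final List<String> entries = new ArrayList<>();

    /** Accepts the write only if the sender's epoch is not older than the highest seen. */
    synchronized void append(int coordinatorEpoch, String entry) {
        if (coordinatorEpoch < highestEpochSeen) {
            throw new FencedEpochException(coordinatorEpoch, highestEpochSeen);
        }
        highestEpochSeen = coordinatorEpoch;
        entries.add(entry);
    }

    synchronized List<String> entries() {
        return new ArrayList<>(entries);
    }

    synchronized int highestEpochSeen() {
        return highestEpochSeen;
    }
}

class EpochFencingDemo {
    public static void main(String[] args) {
        EpochGuardedLog log = new EpochGuardedLog();
        log.append(5, "marker from coordinator epoch 5");
        log.append(6, "marker from coordinator epoch 6"); // a new coordinator takes over
        try {
            log.append(5, "late marker from the old coordinator");
        } catch (FencedEpochException e) {
            // prints "rejected: epoch 5 has been fenced by a newer epoch 6"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Once the log has seen epoch 6, the late write from epoch 5 can never succeed, which is why retrying through the old coordinator does not help.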

generic

Official Documentation

https://kafka.apache.org/documentation/#transactional_id

Workarounds

  1. 85% success
    Handle the exception in the producer by retrying the transaction with a new transactional ID or by resetting the producer:

    import org.apache.kafka.common.errors.TransactionCoordinatorFencedException;

    producer.initTransactions();
    try {
        producer.beginTransaction();
        // Send messages
        producer.commitTransaction();
    } catch (TransactionCoordinatorFencedException e) {
        // The old coordinator was fenced: close this producer and create a
        // new one so that it looks up the current coordinator
        producer.close();
        producer = createNewProducer(); // application-specific factory method
        producer.initTransactions();
        // Retry the transaction
    }
  2. 80% success
    Ensure the transactional.id is unique per producer instance and that the broker's transaction.state.log.replication.factor is sufficient to avoid coordinator failures:

    transaction.state.log.replication.factor=3

    Also monitor the broker logs for coordinator changes and consider increasing the number of transaction coordinator threads.
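
The two workarounds above can be combined into one sketch: give each producer instance its own transactional.id, and recreate the producer a bounded number of times when a transaction attempt fails. This is a hedged illustration that uses only the JDK so it runs without a broker. TransactionalRetrySketch, producerProps, runWithRecreate, and the "orders-tx-" prefix are hypothetical names; only the configuration keys are real Kafka producer settings.

```java
import java.util.Properties;
import java.util.function.Consumer;
import java.util.function.Supplier;

class TransactionalRetrySketch {

    /** Builds producer settings with a transactional.id derived from a per-instance name. */
    static Properties producerProps(String bootstrapServers, String instanceName) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("enable.idempotence", "true");
        // One transactional.id per producer instance; the prefix is illustrative.
        props.put("transactional.id", "orders-tx-" + instanceName);
        return props;
    }

    /**
     * Makes up to maxAttempts attempts, each with a freshly created producer.
     * P is the producer type; factory, attempt, and closer come from the application.
     */
    static <P> P runWithRecreate(Supplier<P> factory, Consumer<P> attempt,
                                 Consumer<P> closer, int maxAttempts) {
        RuntimeException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            P producer = factory.get();
            try {
                attempt.accept(producer);   // initTransactions/begin/send/commit go here
                return producer;            // success: hand the live producer back
            } catch (RuntimeException e) {  // narrow this to the fencing exception in real code
                last = e;
                closer.accept(producer);    // discard the failed producer before retrying
            }
        }
        throw new IllegalStateException(
                "transaction failed after " + maxAttempts + " attempts", last);
    }
}
```

With the Kafka client on the classpath, the factory would typically be something like `() -> new KafkaProducer<>(props)` and the closer `KafkaProducer::close`. Bounding the attempts matters because a producer that keeps being fenced usually points to a duplicate transactional.id or an unstable coordinator, which retries alone will not fix.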

Dead Ends

Common approaches that don't work:

  1. Increase the transaction.timeout.ms to a very high value 85% fail

    Transaction timeout controls how long a transaction can remain open, not coordinator fencing; it does not prevent epoch conflicts.

  2. Disable idempotent producer and transactions 70% fail

    This avoids the error but disables exactly-once semantics, which the application may require; it changes delivery behavior rather than addressing the cause.