database system_error ai_generated true

ERROR: archive command failed with exit code 1

ID: database/postgresql-wal-archive-timeout

Fix rate: 85%
Confidence: 88%
Evidence count: 1
First seen: 2024-02-10

Version compatibility

Version            Status   Introduced   Deprecated   Notes
PostgreSQL 14.11   active   -            -            -
PostgreSQL 15.6    active   -            -            -
PostgreSQL 16.2    active   -            -            -

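Root cause analysis

The PostgreSQL archive_command (e.g., cp or rsync) exits with a non-zero status, typically because the archive destination's disk is full, the postgres user lacks permission to write there, or the network target is unreachable. PostgreSQL keeps retrying the same segment, so WAL archiving stalls: unarchived segments accumulate in pg_wal, standbys that restore from the archive can fall behind, and point-in-time recovery cannot reach past the last archived segment.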
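
To confirm that archiving has actually stalled, the backlog can be inspected directly. The following is a minimal sketch, not an official tool: the function names are hypothetical, the data directory path is supplied by the caller, and archiver_failures assumes psql can connect with default connection settings as a role that may read pg_stat_archiver.

```shell
#!/usr/bin/env bash

# Count WAL segments that are still waiting to be archived.
# $1: path to the PostgreSQL data directory (caller-supplied).
# Segments awaiting archiving are marked by *.ready files in pg_wal/archive_status.
pending_wal_count() {
  local status_dir="$1/pg_wal/archive_status"
  find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' '
}

# Print failed_count|last_failed_wal|last_failed_time from pg_stat_archiver.
# Assumption: default psql connection settings are sufficient.
archiver_failures() {
  psql -X -At -c "SELECT failed_count, last_failed_wal, last_failed_time FROM pg_stat_archiver;"
}
```

A count that keeps growing between two calls of pending_wal_count, together with a rising failed_count, suggests the archive command is still failing rather than merely slow.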


Official documentation

https://www.postgresql.org/docs/16/continuous-archiving.html

Solutions

  1. Check the archive destination for free disk space (df -h /archive/path) and permissions (ls -ld /archive/path). If the disk is full, free space or move the archive to a different location. If the location changes, update archive_command in postgresql.conf and reload the configuration: SELECT pg_reload_conf();
  2. Test the archive command manually as the postgres OS user, substituting real paths for %p and %f, e.g. su - postgres -c 'cp /path/to/test.wal /archive/'. If it fails, fix the command (e.g., create missing directories with mkdir -p) or consider streaming WAL with pg_receivewal instead.
  3. If the archive destination is temporarily unavailable and pg_wal is at risk of filling up, archiving can be switched off as a stopgap: ALTER SYSTEM SET archive_mode = off; then restart the server. A change to archive_mode only takes effect after a restart; SELECT pg_reload_conf(); is not sufficient. After fixing the destination, set archive_mode = on and restart again. WAL generated while archiving was off is not archived, so take a new base backup afterwards to restore PITR coverage.
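
The checks in steps 1 and 2 can be sketched as a small wrapper script. This is an illustration under stated assumptions, not a recommended production archiver: the function names, the MIN_FREE_KB variable, and its 32 MiB default are hypothetical choices made for this example.

```shell
#!/usr/bin/env bash

# Succeed only if $1 is an existing, writable directory with at least $2 KiB free.
archive_dest_ok() {
  local dir="$1" min_free_kb="$2" free_kb
  [ -d "$dir" ] && [ -w "$dir" ] || return 1
  # df -P prints one line per filesystem; with -k, column 4 is available KiB.
  free_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
  [ "$free_kb" -ge "$min_free_kb" ]
}

# Copy one WAL segment into the archive directory.
# $1: source path (%p), $2: file name (%f), $3: archive directory.
archive_wal() {
  local src="$1" name="$2" dir="$3"
  local dest="$dir/$name"

  # Fail with a clear message instead of a bare "exit code 1".
  if ! archive_dest_ok "$dir" "${MIN_FREE_KB:-32768}"; then
    echo "archive destination $dir is missing, not writable, or low on space" >&2
    return 1
  fi

  # Never overwrite a segment that is already archived.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite $dest" >&2
    return 1
  fi

  # Copy under a temporary name, then rename, so a partial copy
  # never appears under the final segment name.
  if ! cp "$src" "$dest.tmp"; then
    rm -f "$dest.tmp"
    return 1
  fi
  mv "$dest.tmp" "$dest"
}

# When invoked as a script with three arguments, archive that segment.
if [ "$#" -eq 3 ]; then
  archive_wal "$1" "$2" "$3"
fi
```

Saved under a hypothetical path such as /path/to/archive_wal.sh, it could be wired in as archive_command = '/path/to/archive_wal.sh %p %f /archive/path'. Running the same command by hand as the postgres user is then equivalent to the manual test in step 2.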

Ineffective attempts

Common but ineffective approaches:

  1. Increasing archive_timeout to reduce archiving frequency (90% failure rate)

    This only delays the failure; the archive command will still fail if the underlying issue (e.g., disk space) is not resolved.

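  2. Setting archive_mode = off to stop archiving entirely (85% failure rate)

    This silences the error without fixing its cause. It disables WAL archiving, which may be required for PITR or for standbys that restore from the archive, and it leaves the system without continuous WAL backup, risking data loss. It is acceptable only as a short-lived stopgap while the destination is being repaired.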

  3. Restarting PostgreSQL without fixing the archive destination (100% failure rate)

    Restarting does not resolve the root cause; the archive command will fail again immediately after the restart.