ERROR: archive command failed with exit code 1
ID: database/postgresql-wal-archive-timeout
Version Compatibility
| Version | Status | Introduced | Deprecated | Notes |
|---|---|---|---|---|
| PostgreSQL 15.6 | active | — | — | — |
| PostgreSQL 14.11 | active | — | — | — |
| PostgreSQL 16.2 | active | — | — | — |
Root Cause Analysis
The PostgreSQL archive_command (for example, cp or rsync) fails because the archive destination is out of disk space, has incorrect permissions, or is unreachable over the network. PostgreSQL keeps retrying the same segment, so WAL archiving stalls and unarchived segments accumulate in pg_wal. This can cause replication lag on standbys that restore from the archive, and recent transactions can be lost if the primary fails before the backlog is archived.
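To make the failure mode concrete, the following is a minimal sketch of a copy-based archive helper, not a recommended production script. The function name archive_wal and the paths in the comments are assumptions for illustration. It returns a nonzero status for each of the causes listed above, and that nonzero status is what PostgreSQL reports as "archive command failed".

```shell
# Hypothetical helper: archive_wal SRC DEST_DIR
# If saved as a script ending with: archive_wal "$@"
# it could be referenced as (script path and /archive/path are assumptions):
#   archive_command = '/usr/local/bin/archive_wal.sh %p /archive/path'
archive_wal() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")

    # Missing mounts and permission problems surface here.
    if [ ! -d "$dest_dir" ] || [ ! -w "$dest_dir" ]; then
        echo "archive_wal: $dest_dir is missing or not writable" >&2
        return 1
    fi

    # Never overwrite a segment that is already in the archive.
    if [ -e "$dest_dir/$name" ]; then
        echo "archive_wal: $name already exists in $dest_dir" >&2
        return 1
    fi

    # Copy under a temporary name, then rename, so a full disk or a
    # dropped connection never leaves a truncated file under the final name.
    tmp="$dest_dir/.$name.tmp"
    if ! cp "$src" "$tmp"; then
        echo "archive_wal: copy of $name failed" >&2
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$dest_dir/$name"
}
```

The messages written to stderr end up in the PostgreSQL server log, which makes it easier to tell a full disk from a permission problem.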
Official Documentation
https://www.postgresql.org/docs/16/continuous-archiving.html
Solutions
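- Check the archive destination for free disk space (df -h /archive/path) and for permissions (ls -ld /archive/path); the directory must be writable by the OS user that runs PostgreSQL. If the disk is full, free space or move the archive to a different location. If the location changes, update archive_command in postgresql.conf and reload the configuration: SELECT pg_reload_conf();
- Test the archive command manually as the postgres user, substituting real paths for %p and %f, for example: su - postgres -c 'cp /path/to/test.wal /archive/'. If it fails, fix the command (for example, create a missing destination directory with mkdir -p) or consider a different archiving method such as pg_receivewal.
- If the archive destination is temporarily unavailable and archiving has to be paused, disable it with ALTER SYSTEM SET archive_mode = off; and restart the server. archive_mode cannot be changed with a reload, so SELECT pg_reload_conf(); alone is not enough. After fixing the destination, set archive_mode = on and restart again. WAL generated while archiving is off is never archived, so take a new base backup afterwards if the archive is used for PITR.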
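The destination checks from the first step can be wrapped in a small function. This is only a sketch: the name check_archive_dest, the return codes, and the free-space threshold are assumptions, not part of PostgreSQL.

```shell
# Hypothetical helper: check_archive_dest DIR MIN_FREE_KB
# Return codes (arbitrary choices for this sketch):
#   0 ok, 1 missing, 2 not writable, 3 low space, 4 df output unreadable
check_archive_dest() {
    dir=$1
    min_kb=$2

    [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 2; }

    # df -P prints one POSIX-format line per filesystem;
    # with -k, column 4 is the available space in KB.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    case $free_kb in
        ''|*[!0-9]*) echo "could not read free space for $dir" >&2; return 4 ;;
    esac

    if [ "$free_kb" -lt "$min_kb" ]; then
        echo "low space: ${free_kb} KB free in $dir" >&2
        return 3
    fi
    echo "ok: ${free_kb} KB free in $dir"
}
```

Run it as the postgres OS user, for example check_archive_dest /archive/path 1048576, because that is the account that executes archive_command; a check run as root can pass even when the server itself cannot write to the directory.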
Ineffective Attempts
Common but ineffective approaches:
- Increasing archive_timeout to reduce archiving frequency (90% failure rate). This only delays the failure; the archive command will still fail if the underlying issue (for example, disk space) is not resolved.
- Setting archive_mode = off to stop archiving entirely (85% failure rate). This disables WAL archiving, which may be required for PITR or replication, and it leaves the system without continuous WAL backups, risking data loss.
- Restarting PostgreSQL without fixing the archive destination (100% failure rate). Restarting does not resolve the root cause; the archive command will fail again immediately after the restart.
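None of these workarounds reduces the backlog of unarchived segments, so one way to tell a real fix from a workaround is to watch that backlog shrink. PostgreSQL marks each segment waiting for archiving with a .ready file under pg_wal/archive_status. The sketch below counts those files; the function name count_ready_wal is an assumption, and the data directory path must be supplied by the caller.

```shell
# Hypothetical helper: count_ready_wal PGDATA
# Prints the number of WAL segments still waiting to be archived.
count_ready_wal() {
    status_dir="$1/pg_wal/archive_status"
    if [ ! -d "$status_dir" ]; then
        echo "no such directory: $status_dir" >&2
        return 1
    fi
    # Each pending segment has a <segment>.ready marker; archived ones have .done.
    find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' '
}
```

A count that keeps growing after a change means the archive command is still failing. The pg_stat_archiver view gives the same picture from SQL: SELECT failed_count, last_failed_wal FROM pg_stat_archiver;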