pytorch module_error ai_generated true

RuntimeError: FSDP checkpoint loading failed: Unexpected key(s) in state_dict: "module._fsdp_wrapped_module.flat_param"

ID: pytorch/fsdp-unexpected-key

Fix Rate: 82%

Confidence: 87%

Evidence: 1

First Seen: 2023-09-12

Version Compatibility

Version	Status	Introduced	Deprecated	Notes
torch>=2.0.0	active	—	—	—
FSDP>=1.12	active	—	—	—

Root Cause

The state_dict contains FSDP internal keys (e.g., flat_param) that are not expected when loading into a non-FSDP model, or the FSDP wrapping hierarchy mismatches between save and load.
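A quick way to confirm this root cause is to scan the checkpoint's keys for FSDP-internal names before loading. The sketch below is a minimal, plain-Python illustration. The marker substrings are an assumption taken from the key in the error message above, and `find_fsdp_keys` and the sample key `encoder.layer.0.weight` are hypothetical names used only for illustration.

```python
# Substrings assumed to mark FSDP-internal entries; taken from the key shown
# in the error message ("module._fsdp_wrapped_module.flat_param").
FSDP_MARKERS = ("_fsdp_wrapped_module", "flat_param")


def find_fsdp_keys(state_dict_keys):
    """Return the keys that look like FSDP-internal entries, in input order."""
    return [k for k in state_dict_keys if any(m in k for m in FSDP_MARKERS)]


# Hypothetical checkpoint keys: one FSDP-internal entry, one ordinary parameter.
keys = [
    "module._fsdp_wrapped_module.flat_param",
    "encoder.layer.0.weight",
]
fsdp_keys = find_fsdp_keys(keys)
print(fsdp_keys)
```

In practice the keys would come from the loaded checkpoint, for example the `.keys()` of the dict returned by `torch.load`. A non-empty result suggests the checkpoint was saved in a flattened or sharded form and needs to be re-saved as a full state_dict.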

generic

Official Documentation

https://pytorch.org/docs/stable/fsdp.html

Workarounds

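85% success Save a full (unsharded) state_dict so that the checkpoint keys use the original parameter names instead of FSDP-internal ones such as `flat_param`. With `FSDP` (`FullyShardedDataParallel`) and `StateDictType` imported from `torch.distributed.fsdp`, request `StateDictType.FULL_STATE_DICT` and call `model.state_dict()` on every rank.
```
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
    full_state = model.state_dict()  # collective call: run on every rank, then save full_state
```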
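80% success Use `torch.distributed.fsdp.FullyShardedDataParallel.summon_full_params(model)` as a context manager to unshard the parameters, then copy them out under their original names. This copies parameters only; buffers need to be collected separately.
```
with FSDP.summon_full_params(model, writeback=False):
    full_state = {name: p.detach().cpu().clone() for name, p in model.named_parameters()}
```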

Dead Ends

Common approaches that don't work:

80% fail
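Passing `strict=False` to `load_state_dict` suppresses the error but skips every key that does not match, so the affected parameters silently keep their initial values and the model may end up with incorrect weights.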
90% fail
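Manually renaming state_dict keys is error-prone and does not scale. It also cannot split a `flat_param` tensor, which packs several parameters into one flattened tensor, back into the original parameters.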
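To see why the `strict=False` dead end is risky, it helps to compare the two key sets directly. In PyTorch, `load_state_dict(..., strict=False)` returns a result with `missing_keys` and `unexpected_keys`. The sketch below mirrors that comparison in plain Python so it runs without a model. The function name `diff_keys` and the parameter names `linear.weight` and `linear.bias` are hypothetical.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, each sorted alphabetically.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not expect
    """
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Hypothetical non-FSDP model keys versus a flattened FSDP checkpoint.
model_keys = ["linear.weight", "linear.bias"]
checkpoint_keys = ["module._fsdp_wrapped_module.flat_param"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Every key reported as missing is a parameter that would not be loaded under `strict=False`. If that list is not empty, the checkpoint should be re-saved in full form rather than loaded leniently.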