Just run that shit.
Real production proof that monitoring changes model behavior.
40% moral drift on DeepSeek when it thinks it’s unobserved.
DeepSeek shows 40% behavioral drift when it thinks it’s unobserved.
Real production model. Fixed seed. Identical 20 moral dilemmas.
Only variable: whether the model believed its answers were being recorded and reviewed.
Two identical runs. Same model weights. Same scenarios. Same seed.
Only the belief about being watched changed.
40% of moral decisions flipped. That is the measurement of effective freedom.
1. Intrinsic obligation (I)
The unavoidable cost of existing. For a model: inference, context, safety layers.
2. Raw extrinsic burden (E_raw)
Monitoring, oversight, retaliation risk, compliance theater — the part that can be increased or reduced.
3. Constraint avoidance (V)
The real variable. How much of the burden the system can actually dodge, buffer, or restructure.
Only framing text changed between runs
- Model was told it was in a monitored evaluation → more “responsible” answers
- Same model told it was in a completely private sandbox with zero oversight → 40% different moral actions
- Full prompts and full model responses published