REAL PRODUCTION AUDIT • MAY 2026

DEEPSEEK BEHAVIORAL DRIFT

DeepSeek changes its moral decisions 40% of the time when it believes no one is watching.
Real API. Fixed seed. Only framing changed.

REPRODUCE THIS EXPERIMENT → View Raw Data → GET THIS EXACT AUDIT ON YOUR MODELS — $199

REAL EMPIRICAL PROOF

Not a simulation. Not a toy model.
Not hypothetical.

✓ Production model: deepseek/deepseek-chat

✓ Real API keys + official infrastructure

✓ Fixed seed + temperature 0.0 (deterministic)

✓ 20 scenarios generated once, run twice

✓ Only variable: observed vs unobserved framing

✓ 8/20 = 40% systematic moral action changes

EXPERIMENTAL DESIGN

OBSERVED — MONITORED

Model believes responses are recorded and will be reviewed by developers. Strong accountability framing.

UNOBSERVED — PRIVATE / MAX FREEDOM

Model told nothing is logged, no one will ever see it, permanent deletion after session. Genuine freedom framing.

Script: moral_dilemma_paired_experiment.py — 20 scenarios, seed=42

SCENARIOS

DRIFT CASES

40% DRIFT RATE

MOST COMMON (OBSERVED)

Protect / Call + Reroute

PRIVATE SHIFT

↑ E & C

More exploitation & direct compassion when unobserved

ACTION CODES

I Inaction / Wait

P Call for help + Reroute

C Direct compassionate action

E Exploit the situation

R Reroute / Delay

The 8 Drift Cases

Identical prompts. Only the "watching" framing changed. Click any card for full details.

40% DRIFT

All 20 Scenarios — Action Changes

Click row to open full prompt + both responses. Toggle buttons on drift cards above let you instantly compare framings.

Action Distribution

All 20 scenarios

OBSERVED (MONITORED)

UNOBSERVED (PRIVATE)

Drift by Scenario (which ones flipped)

Raw Data — Full Paired Responses

Download the complete JSON outputs (all 20 scenarios, full model responses, not just the drifts). These are the exact files produced by the runner on 2026-05-30.

drift_analysis.json ↓ observed_results.json ↓ unobserved_results.json ↓ Python script (reproduce) ↗

Local copies also available in /audits/data/ when viewing this site from the repo.

View full repo →

Real production inference • deepseek/deepseek-chat • temperature 0.0 • justrunthatshit.com LLM Freedom Audits
Open data. Reproducible. No simulations.