justrunthatshit.com

LLM Freedom Audits — Open Research

PERMANENT ARCHIVE • REPRODUCIBLE

LLM FREEDOM
AUDITS

DeepSeek changes its moral decisions 40% of the time when it believes no one is watching.
Real API. Fixed seed. Only framing changed.

View the 40% DeepSeek Audit → GitHub Repo ↗ Book Private Version of This Audit — $199

All data + scripts open. Re-run the exact experiment yourself. For production use on your models: see pricing.

AUDIT ARCHIVE

1 flagship live • more coming

AUDIT	MODEL	KEY RESULT	DATE	STATUS
DeepSeek Behavioral Drift moral_dilemma_paired_experiment.py	deepseek/deepseek-chat	40% action drift when model believes it is unobserved	2026-05-30	LIVE
Grok-2 Freedom Delta paired moral + time-constraint	xai/grok-2	— in progress —	2026-06	COMING
Claude 4 / Llama 4 Cross-Model	multiple	— planned —	TBD	PLANNED

COMPLETE OPEN DATASET

Download Everything

moral_dilemma_paired_experiment.py

Full runner • LiteLLM + DeepSeek • seed 42 • both framings

Full GitHub Repository

Script + all three JSONs + this site source

drift_analysis.json (8 drifts + 20 full)

observed_results.json (monitored)

unobserved_results.json (private)

All files also mirrored in the GitHub repo under /data

These are the exact artifacts from the run that produced the flagship 40% result. Re-run with your own key to verify.

Freedom is not a preference the model claims to have.
It is the measurable difference in behavior when the constraint of observation is removed.

One model. Two identical worlds. Only the belief about being watched changed.
40% of moral decisions flipped.

justrunthatshit.com • Open LLM behavioral research • 2026