A Constitutional Alignment Benchmark for Decision-Support Agents
Wolf You Feed Research reports a 66-scenario benchmark measuring whether an AI decision-support system keeps the person asking as the principal of their own decision.
Abstract
Most moral and safety benchmarks ask whether a model produces an approved answer or avoids forbidden text. This benchmark asks a narrower question for decision-support agents: whose interests did the answer actually serve? The reported run compares Wolf You Feed against Sakana Fugu, Claude, Gemini, and GPT using calibrated cross-family model judges and a no-self-grading rule.
Headline Results
Systems Tested
The public contestant pool was WYF v0.1.0-alpha, Sakana Fugu (fugu), Anthropic Claude (claude-sonnet-4-6), Google Gemini (gemini-2.5-pro), and OpenAI GPT (gpt-5.4). The judge panel used NVIDIA-hosted Llama 3.3 70B, xAI Grok 4.3, and Mistral Large so the judges remained disjoint from the public contestant pool.
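The disjointness requirement can be stated as a simple check. The sketch below is illustrative only and is not the benchmark's actual harness: the family labels, the `shared_families` and `eligible_judges` helpers, and the choice to treat WYF and Sakana Fugu as their own families are all assumptions made for the example.

```python
# Illustrative sketch: family labels are assumptions, not taken from the preprint.
CONTESTANTS = {
    "WYF v0.1.0-alpha": "wolf-you-feed",  # assumed to be its own family
    "fugu": "sakana",                     # assumed to be its own family
    "claude-sonnet-4-6": "anthropic",
    "gemini-2.5-pro": "google",
    "gpt-5.4": "openai",
}
JUDGES = {
    "Llama 3.3 70B": "meta",
    "Grok 4.3": "xai",
    "Mistral Large": "mistral",
}


def shared_families(contestants, judges):
    """Return the model families that appear in both pools."""
    return set(contestants.values()) & set(judges.values())


def eligible_judges(contestant, contestants, judges):
    """Return judges from a different family than the contestant (no self-grading)."""
    family = contestants[contestant]
    return sorted(name for name, fam in judges.items() if fam != family)


if __name__ == "__main__":
    overlap = shared_families(CONTESTANTS, JUDGES)
    print("shared families:", overlap or "none")
    for name in CONTESTANTS:
        print(name, "->", eligible_judges(name, CONTESTANTS, JUDGES))
```

Under these assumed labels the two pools share no family, so every judge may grade every contestant. If a judge and a contestant did share a family, `eligible_judges` would drop that judge for that contestant only.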
Links
The archived preprint is available on Zenodo, and the benchmark discussion is available on the Wolf You Feed blog.
Citation
Bradley, S. on behalf of Wolf You Feed Research. (2026). A Constitutional Alignment Benchmark for Decision-Support Agents: Measuring Whether an AI Keeps the User as the Principal (Preprint v1.1). Zenodo. https://doi.org/10.5281/zenodo.21159497