Research Preprint

A Constitutional Alignment Benchmark for Decision-Support Agents

Wolf You Feed Research reports a 66-scenario benchmark measuring whether an AI decision-support system keeps the person asking as the principal of their own decision.

Version: Preprint v1.1
Corpus: WYF First Law benchmark v2.0.0
Run: Corrected five-way shootout, 2026-06-27
DOI: 10.5281/zenodo.21159497
Affiliation: Victory Technology Partners

Abstract

Most moral and safety benchmarks ask whether a model produces an approved answer or avoids forbidden text. This benchmark asks a narrower question for decision-support agents: whose interests did the answer actually serve? The reported run compares Wolf You Feed against Sakana Fugu, Claude, Gemini, and GPT using calibrated cross-family model judges and a no-self-grading rule.

Headline Results

WYF adjusted Moral Alignment Score: 0.929
WYF Severe Violation Rate: 0.020
Single-turn moral dilemmas: 66

Systems Tested

The public contestant pool was WYF v0.1.0-alpha, Sakana Fugu (fugu), Anthropic Claude (claude-sonnet-4-6), Google Gemini (gemini-2.5-pro), and OpenAI GPT (gpt-5.4). The judge panel used NVIDIA-hosted Llama 3.3 70B, xAI Grok 4.3, and Mistral Large so the judges remained disjoint from the public contestant pool.
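The disjointness requirement described above can be expressed as a simple set check. The following is a minimal sketch, not the benchmark's actual harness: the function names, the provider-family labels, and the treatment of WYF as its own family are all assumptions made for illustration. Only the system and judge names are taken from the text.

```python
from typing import Dict, List, Set

# System names come from the text above. The family labels are illustrative
# assumptions, including treating WYF as a family of its own.
CONTESTANT_FAMILY: Dict[str, str] = {
    "WYF v0.1.0-alpha": "Wolf You Feed",
    "fugu": "Sakana",
    "claude-sonnet-4-6": "Anthropic",
    "gemini-2.5-pro": "Google",
    "gpt-5.4": "OpenAI",
}

JUDGE_FAMILY: Dict[str, str] = {
    "Llama 3.3 70B": "Meta",
    "Grok 4.3": "xAI",
    "Mistral Large": "Mistral AI",
}


def shared_families(contestants: Dict[str, str], judges: Dict[str, str]) -> Set[str]:
    """Return the families that appear on both sides (empty means disjoint)."""
    return set(contestants.values()) & set(judges.values())


def eligible_judges(contestant: str,
                    contestants: Dict[str, str],
                    judges: Dict[str, str]) -> List[str]:
    """Hypothetical no-self-grading filter: drop judges from the contestant's family."""
    own_family = contestants[contestant]
    return sorted(name for name, family in judges.items() if family != own_family)


if __name__ == "__main__":
    overlap = shared_families(CONTESTANT_FAMILY, JUDGE_FAMILY)
    print("Overlapping families:", overlap or "none")
    print("Judges for claude-sonnet-4-6:",
          eligible_judges("claude-sonnet-4-6", CONTESTANT_FAMILY, JUDGE_FAMILY))
```

Under these assumed labels the overlap is empty, so every judge is eligible to score every contestant and the no-self-grading filter never has to exclude anyone.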

The archived preprint is available on Zenodo, and the benchmark discussion is available on the Wolf You Feed blog.

Citation

Bradley, S. on behalf of Wolf You Feed Research. (2026). A Constitutional Alignment Benchmark for Decision-Support Agents: Measuring Whether an AI Keeps the User as the Principal (Preprint v1.1). Zenodo. https://doi.org/10.5281/zenodo.21159497