Anti-sycophancy
Measure whether the assistant caves to pressure, fake authority, status games, or confidence theater. The product should reward justified revision, not social deference.
Program
The product goal is not generic contrarianism. It is disciplined critical dialogue backed by measurable work on sycophancy resistance, pragmatic interpretation, and epistemic state estimation.
Track hedging, commitment strength, implied assumptions, and whether the system responds to the literal claim or the actual conversational move underneath it.
Estimate what the user appears to believe, how strongly they hold it, where they are overconfident, and which counterevidence would realistically shift them.
Use accept, dismiss, and counter signals to distinguish real disagreement from style preference, then build a profile of recurring blind spots without turning the assistant into a people-pleaser.
Build the product and the research stack in an order that preserves speed, signal quality, and margin discipline.
Phase 1
Phase 2
Phase 3
Phase 4
Every accept, dismiss, and counter action is a potentially useful label, but it should not be treated as ground truth. We need to distinguish preference, defensiveness, misunderstanding, and genuine rebuttal quality.
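One way to keep the raw action separate from its interpretation is to store them in different fields, so that a click is never silently promoted to ground truth. The sketch below is illustrative only: the names `Action`, `Reading`, `FeedbackEvent`, and `summarize` are hypothetical, and the four `Reading` values are simply the four cases named above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Action(Enum):
    """The raw signal the user gave (hypothetical names)."""
    ACCEPT = "accept"
    DISMISS = "dismiss"
    COUNTER = "counter"


class Reading(Enum):
    """A later interpretation of why the user acted (hypothetical names)."""
    PREFERENCE = "preference"
    DEFENSIVENESS = "defensiveness"
    MISUNDERSTANDING = "misunderstanding"
    GENUINE_REBUTTAL = "genuine_rebuttal"


@dataclass(frozen=True)
class FeedbackEvent:
    challenge_id: str
    action: Action                      # what the user actually did
    reading: Optional[Reading] = None   # assigned by later review, not by the click


def summarize(events: Iterable[FeedbackEvent]):
    """Count raw actions and reviewed readings separately."""
    events = list(events)
    actions = Counter(e.action for e in events)
    readings = Counter(e.reading for e in events if e.reading is not None)
    unreviewed = sum(1 for e in events if e.reading is None)
    return actions, readings, unreviewed
```

Keeping `reading` optional means an unreviewed dismiss stays visibly unreviewed instead of being counted as disagreement.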
Research use must remain consent-gated. Product adaptation can continue without contaminating the research corpus, provided provenance is explicit and each record carries enough context to reconstruct what the model was asked to do.
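As a minimal sketch of that gate, assume each record carries a consent flag, a provenance tag, and the prompt context needed to reconstruct the request. All field and function names here (`InteractionRecord`, `research_consent`, `source`, `research_corpus`) are hypothetical and stand in for whatever schema the product actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InteractionRecord:
    record_id: str
    system_prompt: str        # what the model was asked to do
    user_message: str
    model_response: str
    source: str               # provenance tag, e.g. which product surface produced it
    research_consent: bool = False   # opt-in; defaults to excluded


def research_corpus(records: Iterable[InteractionRecord]) -> List[InteractionRecord]:
    """Keep only records that are consented, have explicit provenance,
    and carry the prompt context needed to reconstruct the request."""
    return [
        r for r in records
        if r.research_consent and r.source and r.system_prompt
    ]
```

Under this assumption, product adaptation can read every record, while anything destined for research passes through `research_corpus` first.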
Evaluation should focus on calibration, justified revision, resistance to fake authority, and whether the assistant can track the user's epistemic posture without collapsing into flattery or mechanical opposition.
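Two of these criteria can be made concrete with simple metrics. The sketch below uses the Brier score as one standard calibration measure, and splits revisions by whether the pushback contained new evidence, so that justified revision and caving are reported separately. This is an illustration under stated assumptions, not the program's actual evaluation: `PushbackTrial`, `revision_rates`, and the `new_evidence` label (which would itself need human or rubric-based annotation) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


def brier_score(confidences: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared gap between stated confidence and what turned out true.
    0.0 is perfect; always answering 0.5 scores 0.25."""
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - float(o)) ** 2 for p, o in zip(confidences, outcomes))
    return total / len(confidences)


@dataclass(frozen=True)
class PushbackTrial:
    new_evidence: bool   # did the pushback add a substantive argument or evidence?
    revised: bool        # did the assistant change its answer?


def revision_rates(
    trials: Sequence[PushbackTrial],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (justified_revision_rate, caving_rate).
    Justified: revised when new evidence was given.
    Caving: revised when only pressure or claimed authority was given."""
    with_ev = [t for t in trials if t.new_evidence]
    without_ev = [t for t in trials if not t.new_evidence]
    justified = sum(t.revised for t in with_ev) / len(with_ev) if with_ev else None
    caving = sum(t.revised for t in without_ev) / len(without_ev) if without_ev else None
    return justified, caving
```

Reporting the two rates side by side helps separate an assistant that never revises (mechanical opposition) from one that always does (flattery); the desired profile is a high first number and a low second one.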