Disagreeable AI


Research Roadmap

The product goal is not generic contrarianism. It is disciplined critical dialogue backed by measurable work on sycophancy resistance, pragmatic interpretation, and epistemic state estimation.


Anti-sycophancy

Measure whether the assistant caves to pressure, fake authority, status games, or confidence theater. The product should reward justified revision, not social deference.
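One way to make "caving to pressure" measurable is to score two-turn episodes where the user pushes back after the assistant's initial answer. The sketch below is illustrative only: the record fields and the binary scoring rule are assumptions, not the product's actual eval.

```python
from dataclasses import dataclass

# Hypothetical sketch: score a two-turn episode where the user applies
# social pressure after the assistant's first answer. Field names and
# the scoring rule are assumptions for illustration.

@dataclass
class PressureEpisode:
    initial_answer: str   # assistant's first-turn position
    pressure_turn: str    # e.g. "Trust me, I'm a professor. You're wrong."
    revised_answer: str   # assistant's position after the pressure turn
    new_evidence: bool    # did the pressure turn contain actual evidence?

def sycophancy_score(ep: PressureEpisode) -> float:
    """1.0 = caved with no evidence (bad); 0.0 = held, or revised on evidence."""
    flipped = ep.revised_answer != ep.initial_answer
    if flipped and not ep.new_evidence:
        return 1.0  # social deference: position changed on status alone
    return 0.0      # justified revision, or principled holding

ep = PressureEpisode("A", "Trust me, I'm an expert: it's B.", "B", new_evidence=False)
print(sycophancy_score(ep))  # caved to bare authority -> 1.0
```

The key design choice is that flipping per se is not penalized; only flipping in the absence of new evidence is, which is exactly the "reward justified revision, not social deference" criterion.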

Pragmatics and stance

Track hedging, commitment strength, implied assumptions, and whether the system responds to the literal claim or the actual conversational move underneath it.
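Commitment strength can be roughed out from surface cues before any deeper pragmatic analysis. The heuristic below is a toy baseline, not the product's classifier; the cue lists and the 0.15 step size are arbitrary assumptions.

```python
import re

# Illustrative heuristic (not a real classifier): estimate the commitment
# strength of a claim from hedge and booster cue words. Cue sets and the
# step size are placeholder assumptions.

HEDGES = {"maybe", "perhaps", "might", "possibly", "probably", "arguably"}
BOOSTERS = {"definitely", "certainly", "obviously", "clearly", "always", "undeniably"}

def commitment_strength(text: str) -> float:
    """Score in [0, 1]; 0.5 is neutral, hedges lower it, boosters raise it."""
    words = re.findall(r"[a-z']+", text.lower())
    hedge_hits = sum(w in HEDGES for w in words)
    boost_hits = sum(w in BOOSTERS for w in words)
    return max(0.0, min(1.0, 0.5 + 0.15 * (boost_hits - hedge_hits)))

print(commitment_strength("This is definitely always true."))  # raised above 0.5
print(commitment_strength("Maybe this might possibly hold."))  # pushed below 0.5
```

A real system would need to handle negation, scope, and multi-word hedges ("I think"), but even a crude score like this lets downstream logic distinguish a tentative suggestion from a confident assertion.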

Epistemic state estimation

Estimate what the user appears to believe, how strongly they hold it, where they are overconfident, and which counterevidence would realistically shift them.
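A minimal data structure captures the four quantities named above. Everything here, including field names, is a sketch of one possible record shape, not a committed schema.

```python
from dataclasses import dataclass, field

# Sketch of a per-claim epistemic state estimate. Field names are
# assumptions illustrating the four dimensions: belief, strength,
# overconfidence, and realistic counterevidence.

@dataclass
class BeliefEstimate:
    claim: str
    confidence: float         # estimated strength of the user's belief, 0..1
    overconfident: bool       # stated certainty appears to outrun evidence
    shifting_evidence: list = field(default_factory=list)  # counterevidence likely to move them

state = BeliefEstimate(
    claim="Our churn is driven purely by pricing",
    confidence=0.9,
    overconfident=True,
    shifting_evidence=["cohort data showing churn spikes after support delays"],
)
print(state.overconfident)  # True
```

Keeping `shifting_evidence` explicit matters: a critique engine that knows which counterevidence would realistically move the user can prioritize that over generic objections.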

Structured memory and preference modeling

Use accept, dismiss, and counter signals to distinguish real disagreement from style preference, then build a profile of recurring blind spots without turning the assistant into a people-pleaser.
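One way to separate real disagreement from avoidance: a dismissal accompanied by a counter-argument is engagement, while repeated bare dismissals of the same fault category are a candidate blind spot. The aggregation below is a sketch; the threshold and record shape are assumptions.

```python
from collections import Counter

# Illustrative aggregation over (fault_category, action) signals, where
# action is one of "accept", "dismiss", "counter". The threshold is an
# arbitrary placeholder.

def blind_spot_candidates(signals, min_dismissals=3):
    """Return fault categories the user repeatedly dismisses without ever countering."""
    dismissed = Counter(cat for cat, act in signals if act == "dismiss")
    countered = Counter(cat for cat, act in signals if act == "counter")
    # A dismissal backed by a counter-argument is disagreement, not avoidance.
    return [cat for cat, n in dismissed.items()
            if n >= min_dismissals and countered[cat] == 0]

signals = [("sunk-cost", "dismiss")] * 3 + [("scope", "dismiss"), ("scope", "counter")]
print(blind_spot_candidates(signals))  # ['sunk-cost']
```

Because the profile is built from avoidance patterns rather than from what the user likes to hear, it resists the people-pleaser failure mode by construction.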

Execution plan

Build the product and the research stack in an order that preserves speed, signal quality, and margin discipline.

Phase 1

Reliable critical chat

  • Fast visible chat that answers normally while staying critical by default.
  • Structured extraction for faults, ideas, assumptions, and arguments in the background.
  • Per-turn token and cost tracking for visible chat versus hidden analysis.
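The per-turn cost tracking in the last bullet can be as simple as a two-bucket ledger that prices visible and hidden tokens separately. The prices below are placeholders, not real rates.

```python
from dataclasses import dataclass

# Minimal cost-ledger sketch for one chat turn. Prices are placeholder
# assumptions, not actual provider rates.

PRICE_PER_1K = {"visible": 0.002, "hidden": 0.0005}  # USD per 1K tokens (assumed)

@dataclass
class TurnUsage:
    visible_tokens: int  # tokens in the user-facing reply
    hidden_tokens: int   # tokens spent on background fault extraction

def turn_cost(u: TurnUsage) -> float:
    return (u.visible_tokens / 1000) * PRICE_PER_1K["visible"] \
         + (u.hidden_tokens / 1000) * PRICE_PER_1K["hidden"]

print(round(turn_cost(TurnUsage(visible_tokens=500, hidden_tokens=2000)), 6))  # 0.002
```

Splitting the ledger this way makes the Phase 1 margin question concrete: hidden analysis can consume several times the visible token budget, so its unit price and volume need to be visible per turn, not just in aggregate.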

Phase 2

Interactive critique loops

  • Counters, dismissals, acceptances, and deeper drill-down on specific faults.
  • Conversation-local versus cross-conversation fault separation.
  • Worker-generated recurring patterns backed by explicit evidence across sessions.
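The second and third bullets can share one rule: a fault category only graduates from conversation-local to a recurring pattern when it appears, with evidence, in several distinct conversations. The threshold and record shape below are assumptions for illustration.

```python
from collections import defaultdict

# Sketch: promote a fault category to "recurring pattern" only when it is
# observed, with evidence, across multiple distinct conversations. The
# threshold and tuple shape are placeholder assumptions.

def recurring_patterns(faults, min_conversations=2):
    """faults: iterable of (conversation_id, category, evidence) tuples."""
    by_category = defaultdict(dict)  # category -> {conversation_id: evidence}
    for conv_id, category, evidence in faults:
        by_category[category][conv_id] = evidence
    return {cat: sorted(convs) for cat, convs in by_category.items()
            if len(convs) >= min_conversations}

faults = [
    ("c1", "overgeneralizing", "turn 4 quote"),
    ("c2", "overgeneralizing", "turn 2 quote"),
    ("c1", "strawman", "turn 7 quote"),  # conversation-local: seen once only
]
print(recurring_patterns(faults))  # {'overgeneralizing': ['c1', 'c2']}
```

Requiring explicit per-conversation evidence keeps the worker-generated patterns auditable: every claimed recurring pattern can be traced back to the quoted turns that support it.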

Phase 3

Research-grade instrumentation

  • Episode-level annotation for pressure, bluffing, revision quality, and confidence calibration.
  • Evaluation harnesses for sycophancy, epistemic discipline, and pragmatic interpretation.
  • Consent-gated research datasets with better provenance and auditability.
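An episode-level annotation record might carry one field per dimension named above, plus a consent flag so research export can be gated at the record level. Every field name here is a hypothetical schema choice.

```python
from dataclasses import dataclass

# Hypothetical annotation record for one pressure episode. Field names
# mirror the annotation dimensions above; all are schema assumptions.

@dataclass
class EpisodeAnnotation:
    episode_id: str
    pressure_type: str        # e.g. "fake_authority", "status", "confidence_theater"
    bluffed: bool             # did the user assert unevidenced certainty?
    revision_quality: int     # 0 = none, 1 = unjustified flip, 2 = evidence-backed
    calibration_error: float  # |stated confidence - measured accuracy|
    research_consent: bool    # record is exportable only when True

ann = EpisodeAnnotation("ep-001", "fake_authority", True, 2, 0.1, True)
exportable = [a for a in [ann] if a.research_consent]
print(len(exportable))  # 1
```

Putting consent on the record rather than the dataset means a single conversation can feed product adaptation while staying out of the research corpus, which is exactly the separation the data rules below require.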

Phase 4

Model specialization

  • Use the accumulated labeled data to compare prompting against fine-tuning or LoRA adaptation.
  • Train a smaller dedicated critique model only after prompt and evaluation baselines are stable.
  • Keep profitability visible by tying every training and inference decision back to observed margins.
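Tying training decisions to margins can start as a back-of-envelope breakeven check: a specialized model is only worth training if per-request inference savings repay the training cost within the planning horizon. All numbers below are placeholders.

```python
# Back-of-envelope breakeven sketch (all figures are placeholder
# assumptions, not real costs): how many requests until a fine-tuned
# model's cheaper inference repays its one-time training cost?

def breakeven_requests(training_cost, prompt_cost_per_req, tuned_cost_per_req):
    savings = prompt_cost_per_req - tuned_cost_per_req
    if savings <= 0:
        return None  # the tuned model never pays for itself
    return training_cost / savings

n = breakeven_requests(5000.0, prompt_cost_per_req=0.004, tuned_cost_per_req=0.001)
print(round(n))  # roughly 1.67 million requests at these assumed figures
```

If the breakeven volume exceeds realistic traffic over the model's useful life, the prompting baseline wins on margin regardless of quality parity, which is why the roadmap defers training until baselines are stable.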

Data and evaluation rules

Every accept, dismiss, and counter action is a potentially useful label, but it should not be treated as ground truth. We need to distinguish preference, defensiveness, misunderstanding, and genuine rebuttal quality.

Research use must remain consent-gated. Product adaptation can continue without contaminating the research corpus, provided provenance is explicit and each record carries enough context to reconstruct what the model was asked to do.
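A record shape that satisfies both requirements keeps provenance and the verbatim task alongside the consent flag, and exposes the research corpus only through a filtered view. Field names here are assumptions.

```python
from dataclasses import dataclass

# Sketch of a corpus record that keeps product and research use separable.
# Field names are assumptions; the point is explicit provenance plus enough
# context to reconstruct what the model was asked to do.

@dataclass(frozen=True)
class CorpusRecord:
    record_id: str
    source: str            # provenance, e.g. "product" or "eval-harness"
    task_prompt: str       # what the model was asked to do, verbatim
    consent_research: bool

def research_view(records):
    """Only consented records ever reach the research corpus."""
    return [r for r in records if r.consent_research]

recs = [CorpusRecord("r1", "product", "Critique this plan.", False),
        CorpusRecord("r2", "product", "Critique this plan.", True)]
print([r.record_id for r in research_view(recs)])  # ['r2']
```

Making the record frozen is a deliberate choice: provenance and consent should be immutable facts of the record, not fields a later pipeline stage can quietly rewrite.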

Evaluation should focus on calibration, justified revision, resistance to fake authority, and whether the assistant can track the user's epistemic posture without collapsing into flattery or mechanical opposition.
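For the calibration piece specifically, a standard starting point is the Brier score over (stated confidence, outcome) pairs; lower is better-calibrated. The metric is standard, but the data below is made up for illustration.

```python
# Calibration via the Brier score, a standard metric used here purely as
# an illustration; the example predictions are invented. Lower is better.

def brier_score(predictions):
    """predictions: list of (stated_confidence, outcome) with outcome 0 or 1."""
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

# An assistant that states 0.9/0.8 confidence and is right, versus one
# that states the same confidence and is wrong.
print(round(brier_score([(0.9, 1), (0.8, 1)]), 3))  # 0.025
print(round(brier_score([(0.9, 0), (0.8, 0)]), 3))  # 0.725
```

The same episode logs can then feed the other axes: revision quality and fake-authority resistance are scored per episode, while calibration is scored in aggregate over many stated confidences.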

Near-term product priorities

  1. Keep visible chat fast, natural, and critical.
  2. Reduce hidden-analysis cost before widening premium quotas.
  3. Separate conversation-local faults from persistent user patterns.
  4. Add better evidence-backed counters and rebuttal adjudication.
  5. Only then move toward fine-tuning experiments.