Sensie for AI / RLHF — Verify the human, not the label.

Why output-only QA fails

A disengaged human produces labels that look exactly like engaged ones. A human with a frontier model open in the next tab produces labels that are, by construction, indistinguishable from human judgment. Output review cannot catch this — the output is the disguise. The only place to verify quality is upstream of the label, in the human, at the moment of judgment.

How it's different

Vs attention checks: attention checks inspect the label and are gameable. Sensie inspects the human and cannot be defeated by going through the motions.

Vs inter-rater agreement: inter-rater agreement (IAA) is retrospective. It is computed after a batch is labeled, and it can't tell you why a rater was off. Sensie is per-session and prospective.

Vs LLM-as-judge: model graders compare outputs to outputs. At the bottom of the eval stack is a human whose judgment is ground truth, and Sensie verifies that this human actually gave it.
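
To make the inter-rater agreement comparison concrete, here is a minimal sketch of Cohen's kappa, one common agreement statistic for two raters. It is illustrative only and is not part of Sensie or its eval harness; the function name and the sample labels are hypothetical. Note that it needs the finished labels from both raters, and it returns a single aggregate number with no information about what happened during any one judgment.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability the raters agree if each labels at random
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical preference labels from two raters on the same eight comparisons.
rater_a = ["A", "A", "B", "B", "A", "B", "A", "A"]
rater_b = ["A", "B", "B", "B", "A", "A", "A", "A"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

A low kappa flags disagreement across the batch, but it cannot say which rater was disengaged or on which items.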

What partners get

Per-rater, per-session alignment score
Pre-registered baseline study on your domain
Open eval harness for reproducibility: github.com/sensie-app/sensie-eval-harness

Evidence

9 PhD-led research trials · 18,000+ sessions · 83.6% post-calibration accuracy · 2 granted US patents + 1 filing.

Request a Pilot

Book intro call · mike@joinsensie.com