StructuredOutputJudge

StructuredOutputJudge() compares the harness output with expected values from metadata or matcher options. Use it for deterministic checks when the harness returns a parsed domain object instead of raw provider text.

evals/capital.eval.ts
import { describeEval, StructuredOutputJudge } from "vitest-evals";

// structuredQaHarness is assumed to be defined or imported elsewhere.
describeEval("capital questions", {
  harness: structuredQaHarness,
  judges: [
    StructuredOutputJudge(),
  ],
});

Use explicit assertions when a case needs an extra check or when you want to record a score separately from the suite defaults.

evals/capital.eval.ts
it("records a structured output score", async ({ run }) => {
  const result = await run("What is the capital of France?", {
    metadata: {
      expected: { answer: "Paris", country: "France" },
    },
  });
  await expect(result).toSatisfyJudge(StructuredOutputJudge, {
    threshold: null,
  });
});

Setting threshold: null records the score without turning a low score into a failure.

Put the expected value in run metadata.

evals/capital.eval.ts
const result = await run("What is the capital of France?", {
  metadata: {
    expected: {
      answer: "Paris",
      country: "France",
    },
  },
});

Structured output checks work best when the harness returns a domain object instead of raw provider text.
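
As an illustration of returning a domain object, a harness might parse the provider's raw text before handing it back. The sketch below assumes the provider replies with a JSON string. The CapitalAnswer type and the parseCapitalAnswer helper are hypothetical names introduced for this example and are not part of vitest-evals.

```typescript
// Hypothetical domain type: the shape the harness would return to the judge.
interface CapitalAnswer {
  answer: string;
  country: string;
}

// Hypothetical helper: turns raw provider text (assumed to be JSON) into a
// validated domain object, dropping any fields the domain type does not define.
function parseCapitalAnswer(rawText: string): CapitalAnswer {
  const parsed: unknown = JSON.parse(rawText); // throws on malformed JSON
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Provider text is not a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  if (typeof record.answer !== "string" || typeof record.country !== "string") {
    throw new Error("Provider text is missing answer or country");
  }
  return { answer: record.answer, country: record.country };
}
```

Validating at the harness boundary means a malformed provider reply surfaces as a parse error rather than as a confusing field mismatch later on.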

The judge fails when the configured expected fields do not match the harness output and the score falls below the active threshold. Keep harness output as a domain object so reports can show field-level differences instead of raw text comparisons.
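
The interaction between field matching, the score, and the threshold can be sketched as follows. This is not the vitest-evals implementation: scoreFields, FieldDiff, and FieldScore are hypothetical names, and the scoring rule (fraction of expected fields that match) and the default threshold of 1 are assumptions chosen for illustration.

```typescript
// Hypothetical sketch of field-level scoring; not the library's actual logic.
type FieldDiff = { field: string; expected: unknown; actual: unknown };

interface FieldScore {
  score: number;      // fraction of expected fields that matched, from 0 to 1
  diffs: FieldDiff[]; // one entry per mismatched field
  passed: boolean;    // always true when threshold is null
}

function scoreFields(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
  threshold: number | null = 1, // assumed default: every field must match
): FieldScore {
  const fields = Object.keys(expected);
  const diffs: FieldDiff[] = [];
  for (const field of fields) {
    // Compare serialized values so nested arrays and objects are compared
    // by value. Note that this simple check is sensitive to key order.
    if (JSON.stringify(expected[field]) !== JSON.stringify(actual[field])) {
      diffs.push({ field, expected: expected[field], actual: actual[field] });
    }
  }
  const score =
    fields.length === 0 ? 1 : (fields.length - diffs.length) / fields.length;
  // A null threshold records the score but never fails the check.
  const passed = threshold === null || score >= threshold;
  return { score, diffs, passed };
}

// Usage: one of two expected fields matches, so the score is 0.5.
const recorded = scoreFields(
  { answer: "Paris", country: "France" },
  { answer: "Paris", country: "Germany" },
  null,
);
console.log(recorded.score, recorded.passed, recorded.diffs[0].field);
```

Because the comparison runs per field on an object, the diff list can name exactly which field was wrong, which a raw text comparison cannot do.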