StructuredOutputJudge

StructuredOutputJudge() compares the harness output against expected values supplied in run metadata or in matcher options. Use it for deterministic checks when the harness returns a parsed domain object instead of raw provider text.
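The TypeScript sketch below illustrates what comparing a structured output against expected values can look like. It assumes a simple scoring rule: the fraction of expected top-level fields whose values structurally equal the harness output. That rule and the names scoreStructuredOutput and deepEqual are hypothetical. The actual StructuredOutputJudge may score differently.

```typescript
// Hypothetical sketch: score = matched expected fields / total expected fields.
// This is an assumed rule for illustration, not the vitest-evals implementation.

// Structural equality for plain JSON-like data (objects, arrays, primitives).
function deepEqual(a: unknown, b: unknown): boolean {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aRecord = a as Record<string, unknown>;
  const bRecord = b as Record<string, unknown>;
  const aKeys = Object.keys(aRecord);
  if (aKeys.length !== Object.keys(bRecord).length) return false;
  return aKeys.every((key) => key in bRecord && deepEqual(aRecord[key], bRecord[key]));
}

function scoreStructuredOutput(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
): number {
  const fields = Object.keys(expected);
  // With no expected fields there is nothing that can mismatch.
  if (fields.length === 0) return 1;
  const matched = fields.filter((field) => deepEqual(expected[field], actual[field]));
  return matched.length / fields.length;
}

const score = scoreStructuredOutput(
  { answer: "Paris", country: "France" },
  { answer: "Paris", country: "Germany", confidence: 0.9 },
);
console.log(score); // 0.5
```

Under this assumed rule, fields that appear only in the output, such as confidence above, do not affect the score.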

Usage

import { describeEval, StructuredOutputJudge } from "vitest-evals";

describeEval("capital questions", {
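  // structuredQaHarness is defined elsewhere and returns a parsed object,
  // such as { answer: "Paris", country: "France" }.
  harness: structuredQaHarness,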
  judges: [
    StructuredOutputJudge(),
  ],
});

Explicit Assertion

Use explicit assertions when a case needs an extra check or when you want to record a score separately from the suite defaults.

it("records a structured output score", async ({ run }) => {
  const result = await run("What is the capital of France?", {
    metadata: {
      expected: { answer: "Paris", country: "France" },
    },
  });

  await expect(result).toSatisfyJudge(StructuredOutputJudge, {
    threshold: null,
  });
});

Setting threshold: null records the score without turning a low score into a test failure.
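To make the threshold semantics concrete, here is a small model of how a score and a threshold could combine into a result. applyThreshold and JudgeOutcome are hypothetical names, not vitest-evals APIs. Treating a score equal to the threshold as passing is an assumption, consistent with failures occurring only when the score falls below the threshold.

```typescript
// Hypothetical model of threshold handling; not the vitest-evals API.
interface JudgeOutcome {
  score: number; // always recorded
  failed: boolean; // whether the assertion should fail the test
}

function applyThreshold(score: number, threshold: number | null): JudgeOutcome {
  // threshold: null means "record only": the score is kept but never fails.
  if (threshold === null) {
    return { score, failed: false };
  }
  // Assumed: only a score strictly below the threshold fails.
  return { score, failed: score < threshold };
}

console.log(applyThreshold(0.5, null)); // { score: 0.5, failed: false }
console.log(applyThreshold(0.5, 0.75)); // { score: 0.5, failed: true }
console.log(applyThreshold(0.75, 0.75)); // { score: 0.75, failed: false }
```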

Expected Output

Put the expected value under metadata.expected in the options passed to run().

const result = await run("What is the capital of France?", {
  metadata: {
    expected: {
      answer: "Paris",
      country: "France",
    },
  },
});

Structured output checks work best when the harness returns a domain object instead of raw provider text.
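As a sketch of what returning a domain object can mean, the function below turns raw provider text into a typed object before any comparison happens. It assumes the provider replies with a JSON string. CapitalAnswer and parseCapitalAnswer are hypothetical names. How the parser plugs into a harness depends on the harness API, which this page does not show.

```typescript
// Hypothetical domain type for the capital-questions example.
interface CapitalAnswer {
  answer: string;
  country: string;
}

// Parse raw provider text (assumed to be JSON) into a CapitalAnswer,
// throwing if the reply is not an object with string answer and country.
function parseCapitalAnswer(rawText: string): CapitalAnswer {
  const data: unknown = JSON.parse(rawText);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error(`Expected a JSON object, got: ${rawText}`);
  }
  const { answer, country } = data as Record<string, unknown>;
  if (typeof answer !== "string" || typeof country !== "string") {
    throw new Error(`Missing string answer/country fields in: ${rawText}`);
  }
  return { answer, country };
}

// Different formatting and key order, same domain object.
const compact = parseCapitalAnswer('{"answer":"Paris","country":"France"}');
const spaced = parseCapitalAnswer('{ "country": "France",\n  "answer": "Paris" }');
console.log(compact.answer === spaced.answer && compact.country === spaced.country); // true
```

Two replies that differ only in key order or whitespace parse to the same object. A field-level comparison therefore treats them identically, whereas a raw text comparison would not.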

Failure Behavior

The judge fails when one or more expected fields do not match the harness output and the resulting score falls below the active threshold. Keep harness output as a domain object so reports can show field-level differences instead of a raw text comparison.
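The field-level differences a report can show might be computed along these lines. This is a sketch under stated assumptions: only top-level expected fields are checked, and values are compared by their JSON serialization. FieldDiff and diffFields are hypothetical names, not part of vitest-evals.

```typescript
// Hypothetical record of one mismatched field.
interface FieldDiff {
  field: string;
  expected: unknown;
  actual: unknown;
}

// Return every expected top-level field whose value differs in the output.
// JSON.stringify comparison suits flat values like the capital example but is
// sensitive to key order inside nested objects.
function diffFields(
  expected: Record<string, unknown>,
  actual: Record<string, unknown>,
): FieldDiff[] {
  return Object.keys(expected)
    .filter((field) => JSON.stringify(expected[field]) !== JSON.stringify(actual[field]))
    .map((field) => ({ field, expected: expected[field], actual: actual[field] }));
}

const diffs = diffFields(
  { answer: "Paris", country: "France" },
  { answer: "Lyon", country: "France" },
);
for (const d of diffs) {
  console.log(`${d.field}: expected ${JSON.stringify(d.expected)}, got ${JSON.stringify(d.actual)}`);
}
// answer: expected "Paris", got "Lyon"
```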