Pi Harness
Use the Pi harness when your app exposes a Pi agent, toolset, or
runtime-compatible run(input, runtime) entrypoint. The same Paris example
applies here: the app answers the question, the harness normalizes the run, and
the judge scores whether the output names Paris.
Install
    pnpm add -D vitest-evals @vitest-evals/harness-pi-ai

App Shape
The agent factory is the app boundary. It can close over prompts, metadata, and
services, then expose the runtime-compatible run(input, runtime) shape the
harness expects.

    type PiRuntime = {
      events?: {
        message(message: { role: "assistant"; content: string }): void;
      };
    };

    export function createQuestionAgent({
      instructions,
      metadata,
    }: {
      instructions: string;
      metadata: Record<string, unknown>;
    }) {
      return {
        async run(input: string, runtime: PiRuntime) {
          // answerQuestion is the app's own function; it is not defined on this page.
          const output = await answerQuestion(input, { instructions, metadata });

          // Report the assistant message to the runtime, if it listens for events.
          runtime.events?.message({
            role: "assistant",
            content: output,
          });

          return {
            output,
          };
        },
      };
    }

Configure Harness
The harness creates the agent for each run and forwards per-case metadata from the eval into the app boundary.

    import { piAiHarness } from "@vitest-evals/harness-pi-ai";
    import { createQuestionAgent } from "../src/questionAgent";

    export const qaHarness = piAiHarness({
      agent: ({ input, context }) =>
        createQuestionAgent({
          // buildInstructions is app code; it is not defined on this page.
          instructions: buildInstructions(input),
          metadata: context.metadata,
        }),
    });

Return { output } from the Pi entrypoint when tests and judges should assert
on a parsed domain value.
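As a rough illustration of returning a parsed domain value, the sketch below parses the raw answer into a small object before handing it back as output. The CapitalAnswer type, parseCapitalAnswer, and createParsedQuestionAgent are hypothetical names invented for this example, and the regex parser is deliberately naive. Only the run(input, runtime) shape and the { output } return value come from this guide.

```typescript
// Hypothetical domain type: not part of vitest-evals or Pi.
type CapitalAnswer = { capital: string; raw: string };

// Minimal runtime shape, mirroring the PiRuntime type used in this guide.
type PiRuntime = {
  events?: {
    message(message: { role: "assistant"; content: string }): void;
  };
};

// Hypothetical, deliberately naive parser: picks the capitalized word
// that follows "is", or an empty string when no such word is found.
function parseCapitalAnswer(raw: string): CapitalAnswer {
  const match = raw.match(/\bis\s+([A-Z][A-Za-z-]+)/);
  return { capital: match?.[1] ?? "", raw };
}

// Hypothetical agent factory. `answer` stands in for the app's own
// question-answering function.
function createParsedQuestionAgent(
  answer: (input: string) => Promise<string>,
) {
  return {
    async run(input: string, runtime: PiRuntime) {
      const raw = await answer(input);

      // Still report the raw assistant text to the runtime.
      runtime.events?.message({ role: "assistant", content: raw });

      // Tests and judges now assert on the parsed value, not the raw text.
      return { output: parseCapitalAnswer(raw) };
    },
  };
}
```

With an agent like this, a test could assert on the parsed capital field rather than searching the raw string, assuming the harness passes the output through unchanged.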
Write the eval against the harness, not the Pi runtime directly. The judge reads the normalized output, so the scoring code stays independent of Pi internals.

    import { expect } from "vitest";
    import {
      createJudge,
      describeEval,
      type JudgeContext,
    } from "vitest-evals";
    import { qaHarness } from "./qaHarness";

    const CapitalJudge = createJudge(
      "CapitalJudge",
      async ({ output }: JudgeContext<string, string>) => {
        const passed = output.toLowerCase().includes("paris");

        return {
          score: passed ? 1 : 0,
          metadata: {
            rationale: passed
              ? "The answer names Paris."
              : `Expected Paris, got: ${output}`,
          },
        };
      },
    );

    describeEval("capital questions", { harness: qaHarness }, (it) => {
      it("knows the capital of France", async ({ run }) => {
        const result = await run("What is the capital of France?");

        expect(result.output).toContain("Paris");
        await expect(result).toSatisfyJudge(CapitalJudge);
      });
    });
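Because the judge only reads the normalized output, its scoring logic can be kept in a plain function that knows nothing about Pi or the harness. The sketch below is one possible way to factor it out. JudgeResult and scoreContainsExpected are hypothetical names for this example, and the returned shape simply mirrors what CapitalJudge returns above.

```typescript
// Hypothetical result shape, mirroring what CapitalJudge returns above.
type JudgeResult = {
  score: number;
  metadata: { rationale: string };
};

// Hypothetical helper: case-insensitive check that the output names the
// expected answer. It has no dependency on Pi or on the harness.
function scoreContainsExpected(output: string, expected: string): JudgeResult {
  const passed = output.toLowerCase().includes(expected.toLowerCase());

  return {
    score: passed ? 1 : 0,
    metadata: {
      rationale: passed
        ? `The answer names ${expected}.`
        : `Expected ${expected}, got: ${output}`,
    },
  };
}
```

A judge callback could then delegate to scoreContainsExpected(output, "Paris"), which keeps the scoring rule unit-testable on its own.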