
Pi Harness

Use the Pi harness when your app exposes a Pi agent, toolset, or runtime-compatible run(input, runtime) entrypoint. The same Paris example applies here: the app answers the question, the harness normalizes the run, and the judge scores whether the output names Paris.

```sh
pnpm add -D vitest-evals @vitest-evals/harness-pi-ai
```

The agent factory is the app boundary. It can close over prompts, metadata, and services, then expose the runtime-compatible run(input, runtime) shape the harness expects.

src/questionAgent.ts
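```typescript
// Hypothetical app module: point this at wherever your answering logic lives.
import { answerQuestion } from "./answerQuestion";

// Minimal view of the Pi runtime this agent relies on: an optional event sink.
type PiRuntime = {
  events?: {
    message(message: { role: "assistant"; content: string }): void;
  };
};

export function createQuestionAgent({
  instructions,
  metadata,
}: {
  instructions: string;
  metadata: Record<string, unknown>;
}) {
  return {
    // Runtime-compatible entrypoint the harness calls for each case.
    async run(input: string, runtime: PiRuntime) {
      const output = await answerQuestion(input, { instructions, metadata });
      // Report the answer as an assistant message so the run is observable.
      runtime.events?.message({
        role: "assistant",
        content: output,
      });
      // Return the value that the eval and judges assert on.
      return {
        output,
      };
    },
  };
}
```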
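To exercise the agent outside the harness, for example in a plain unit test, you can hand it a runtime object that records message events. The sketch below is self-contained, so it re-declares a trimmed-down factory with an injected `answer` service standing in for `answerQuestion`; `AnswerService` and `createRecordingRuntime` are hypothetical names for illustration, not part of vitest-evals or Pi.

```typescript
// Mirrors the PiRuntime shape above: an optional sink for assistant messages.
type AssistantMessage = { role: "assistant"; content: string };
type PiRuntime = {
  events?: { message(message: AssistantMessage): void };
};

// Hypothetical stand-in for answerQuestion, injected so tests can stub it.
type AnswerService = (
  input: string,
  options: { instructions: string; metadata: Record<string, unknown> },
) => Promise<string>;

// Trimmed-down factory: closes over instructions, metadata, and the service.
function createQuestionAgent(config: {
  instructions: string;
  metadata: Record<string, unknown>;
  answer: AnswerService;
}) {
  const { instructions, metadata, answer } = config;
  return {
    async run(input: string, runtime: PiRuntime) {
      const output = await answer(input, { instructions, metadata });
      runtime.events?.message({ role: "assistant", content: output });
      return { output };
    },
  };
}

// Hypothetical helper: a runtime that records every assistant message.
function createRecordingRuntime() {
  const messages: AssistantMessage[] = [];
  const runtime: PiRuntime = {
    events: {
      message(message) {
        messages.push(message);
      },
    },
  };
  return { runtime, messages };
}

// Usage: a canned answer keeps the run deterministic, with no model call.
const agent = createQuestionAgent({
  instructions: "Answer with the city name only.",
  metadata: { caseId: "capital-france" },
  answer: async () => "Paris",
});
const recorder = createRecordingRuntime();
void agent.run("What is the capital of France?", recorder.runtime).then((result) => {
  console.log(result.output); // prints "Paris"
  console.log(recorder.messages.length); // prints 1
});
```

Injecting the service rather than importing it directly is one way to keep the factory testable; the harness-facing `run(input, runtime)` shape stays the same either way.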

The harness creates the agent for each run and forwards per-case metadata from the eval into the app boundary.

evals/qaHarness.ts
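```typescript
import { piAiHarness } from "@vitest-evals/harness-pi-ai";
import { createQuestionAgent } from "../src/questionAgent";
// Hypothetical app helper that turns the case input into agent instructions.
import { buildInstructions } from "../src/instructions";

export const qaHarness = piAiHarness({
  // Called for each run: builds a fresh agent and passes the case metadata through.
  agent: ({ input, context }) =>
    createQuestionAgent({
      instructions: buildInstructions(input),
      metadata: context.metadata,
    }),
});
```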
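The per-run behavior described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not `piAiHarness`'s actual implementation: `runCase`, `AgentFactory`, and the `CaseContext` shape (metadata only) are hypothetical. It shows a fresh agent being built for every run, the case metadata reaching the app boundary, and the result being normalized to `{ output, messages }`.

```typescript
type AssistantMessage = { role: "assistant"; content: string };
type PiRuntime = { events?: { message(message: AssistantMessage): void } };
type PiAgent = {
  run(input: string, runtime: PiRuntime): Promise<{ output: string }>;
};
// Assumed context shape: only the per-case metadata is modeled here.
type CaseContext = { metadata: Record<string, unknown> };
type AgentFactory = (args: { input: string; context: CaseContext }) => PiAgent;

// Simplified model of one harness run (not the library's real code):
// build a fresh agent, collect its events, and normalize the result.
async function runCase(factory: AgentFactory, input: string, context: CaseContext) {
  const agent = factory({ input, context }); // a new agent for every run
  const messages: AssistantMessage[] = [];
  const { output } = await agent.run(input, {
    events: { message: (message) => void messages.push(message) },
  });
  return { output, messages };
}

// Usage: a factory that echoes the forwarded metadata and counts its calls.
let agentsBuilt = 0;
const factory: AgentFactory = ({ context }) => {
  agentsBuilt += 1;
  return {
    async run(input, runtime) {
      const output = `${input} [case: ${String(context.metadata["caseId"])}]`;
      runtime.events?.message({ role: "assistant", content: output });
      return { output };
    },
  };
};

void runCase(factory, "What is the capital of France?", {
  metadata: { caseId: "capital-france" },
}).then((result) => {
  console.log(result.output); // prints "What is the capital of France? [case: capital-france]"
});
```

Building a new agent per run means state captured by the factory (instructions, metadata, services) cannot leak from one case into the next.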

Return `{ output }` from the Pi entrypoint when tests and judges should assert on a parsed domain value; in the eval below it surfaces as `result.output` and as `output` in the judge context.
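As an illustration of a parsed domain value, the sketch below assumes the model replies with a JSON object and converts that text into a typed `CapitalAnswer` before returning `{ output }`. `CapitalAnswer`, `parseCapitalAnswer`, `createCapitalAgent`, and the injected `complete` model call are all hypothetical, and the `runtime` parameter is omitted for brevity.

```typescript
// Hypothetical domain type for the capital question.
type CapitalAnswer = { city: string; country: string };

// Parse raw model text (assumed to be a JSON object) into the domain value,
// failing loudly instead of passing malformed text on to the judges.
function parseCapitalAnswer(raw: string): CapitalAnswer {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error(`Expected a JSON object, got: ${raw}`);
  }
  const { city, country } = data as { city?: unknown; country?: unknown };
  if (typeof city !== "string" || typeof country !== "string") {
    throw new Error(`Missing city or country in: ${raw}`);
  }
  return { city, country };
}

// Entrypoint sketch: `complete` is a hypothetical model call injected by the app.
function createCapitalAgent(complete: (input: string) => Promise<string>) {
  return {
    async run(input: string) {
      const raw = await complete(input);
      return { output: parseCapitalAnswer(raw) };
    },
  };
}

const capitalAgent = createCapitalAgent(async () => '{"city":"Paris","country":"France"}');
void capitalAgent.run("What is the capital of France?").then(({ output }) => {
  console.log(output.city); // prints "Paris"
});
```

With a structured output, a judge can compare `output.city` exactly instead of searching free text.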

Write the eval against the harness, not the Pi runtime directly. The judge reads the normalized output, so the scoring code stays independent of Pi internals.

evals/capital.eval.ts
```typescript
import { expect } from "vitest";
import {
  createJudge,
  describeEval,
  type JudgeContext,
} from "vitest-evals";
import { qaHarness } from "./qaHarness";

const CapitalJudge = createJudge(
  "CapitalJudge",
  async ({ output }: JudgeContext<string, string>) => {
    const passed = output.toLowerCase().includes("paris");
    return {
      score: passed ? 1 : 0,
      metadata: {
        rationale: passed
          ? "The answer names Paris."
          : `Expected Paris, got: ${output}`,
      },
    };
  },
);

describeEval("capital questions", { harness: qaHarness }, (it) => {
  it("knows the capital of France", async ({ run }) => {
    const result = await run("What is the capital of France?");
    expect(result.output).toContain("Paris");
    await expect(result).toSatisfyJudge(CapitalJudge);
  });
});
```
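Because the judge only sees the normalized output, its scoring rule can also live in a plain function and be unit-tested with no Pi runtime or harness at all. The sketch below restates `CapitalJudge`'s check; `scoreCapital` and `CapitalScore` are hypothetical names, and the callback passed to `createJudge` could delegate to such a function.

```typescript
type CapitalScore = { score: 0 | 1; metadata: { rationale: string } };

// Same check as CapitalJudge: case-insensitive substring match on "paris".
function scoreCapital(output: string): CapitalScore {
  const passed = output.toLowerCase().includes("paris");
  return {
    score: passed ? 1 : 0,
    metadata: {
      rationale: passed ? "The answer names Paris." : `Expected Paris, got: ${output}`,
    },
  };
}

console.log(scoreCapital("The capital of France is Paris.").score); // prints 1
console.log(scoreCapital("Lyon").metadata.rationale); // prints "Expected Paris, got: Lyon"
```

Note that `includes` is a substring test, so an answer such as "Parisian" would also pass; a word-boundary check like `/\bparis\b/i.test(output)` is stricter if that matters for your cases.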