AI SDK Harness

Use the AI SDK harness when your app already builds prompts, models, and tools with the Vercel AI SDK. This page follows one small end-to-end story: a geography assistant answers “What is the capital of France?”, the harness turns that AI SDK result into eval output, and a judge checks that the answer names Paris.

pnpm add -D vitest-evals @vitest-evals/harness-ai-sdk

Keep the agent factory in app code. The harness should call the same model, prompt, tools, and parser that production uses.

src/questionAgent.ts
import { openai } from "@ai-sdk/openai";

export function createQuestionAgent() {
  return {
    model: openai("gpt-4o-mini"),
    system:
      "Answer geography questions directly. Keep answers short.",
  };
}

export function textAnswer(text: string): string {
  return text.trim();
}

createQuestionAgent() owns the runtime setup. textAnswer() is the small app parser that turns provider text into the value tests and judges should read.
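
As a quick illustration of what the parser does, the sketch below re-declares textAnswer() as defined above and applies it to a sample string. The sample input is made up for illustration and is not real model output.

```typescript
// Same parser as in src/questionAgent.ts, repeated so the sketch is self-contained.
function textAnswer(text: string): string {
  return text.trim();
}

// Hypothetical provider text with stray whitespace (not real model output).
const raw = "  Paris.\n";
const answer = textAnswer(raw);

// JSON.stringify makes the removed whitespace visible in the log.
console.log(JSON.stringify(answer)); // prints "Paris."
```

Keeping this step in app code means tests and judges read the same value production does.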

The harness wraps the agent factory and defines the normalized output. Here the normalized output is just the trimmed answer text.

evals/qaHarness.ts
import { aiSdkHarness } from "@vitest-evals/harness-ai-sdk";
import { createQuestionAgent, textAnswer } from "../src/questionAgent";

export const qaHarness = aiSdkHarness({
  agent: () => createQuestionAgent(),
  output: ({ result }) => textAnswer(result.text),
});

Use output for the value your evals care about. Messages, tool calls, usage, and errors remain available on the normalized harness run for judges, replay, and reports.
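
To see the output callback in isolation, the sketch below applies the same mapping to a hand-written stub. The StubResult type and its values are assumptions made for illustration; the real object comes from the AI SDK and may carry more fields than shown here.

```typescript
// Hypothetical stand-in for the AI SDK result; only `text` is read by the mapper.
type StubResult = {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
};

// Same parser as in src/questionAgent.ts, repeated so the sketch is self-contained.
function textAnswer(text: string): string {
  return text.trim();
}

// Same mapping as the `output` option passed to the harness above.
const output = ({ result }: { result: StubResult }): string =>
  textAnswer(result.text);

const stub: StubResult = {
  text: " Paris\n",
  usage: { inputTokens: 12, outputTokens: 1 }, // made-up numbers
};

console.log(output({ result: stub })); // prints Paris
```

The mapper picks out only the value the evals compare against; it does not need to copy the other fields.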

Call run(input) once, then judge that same normalized result. The judge is deliberately simple so the harness story stays clear.

evals/capital.eval.ts
import { expect } from "vitest";
import {
  createJudge,
  describeEval,
  type JudgeContext,
} from "vitest-evals";
import { qaHarness } from "./qaHarness";

const CapitalJudge = createJudge(
  "CapitalJudge",
  async ({ output }: JudgeContext<string, string>) => {
    const passed = output.toLowerCase().includes("paris");
    return {
      score: passed ? 1 : 0,
      metadata: {
        rationale: passed
          ? "The answer names Paris."
          : `Expected Paris, got: ${output}`,
      },
    };
  },
);

describeEval("capital questions", { harness: qaHarness }, (it) => {
  it("knows the capital of France", async ({ run }) => {
    const result = await run("What is the capital of France?");
    expect(result.output).toContain("Paris");
    await expect(result).toSatisfyJudge(CapitalJudge);
  });
});
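
The pass/fail rule inside CapitalJudge can also be checked without a model call. The sketch below pulls that rule into a plain function; scoreCapitalAnswer and the JudgeResult type are hypothetical names used for illustration and are not part of vitest-evals.

```typescript
// Hypothetical result shape mirroring what CapitalJudge returns above.
type JudgeResult = { score: number; metadata: { rationale: string } };

// Same rule as CapitalJudge: case-insensitive check for "paris".
function scoreCapitalAnswer(output: string): JudgeResult {
  const passed = output.toLowerCase().includes("paris");
  return {
    score: passed ? 1 : 0,
    metadata: {
      rationale: passed
        ? "The answer names Paris."
        : `Expected Paris, got: ${output}`,
    },
  };
}

console.log(scoreCapitalAnswer("The capital is PARIS.").score); // prints 1
console.log(scoreCapitalAnswer("Lyon").metadata.rationale); // prints Expected Paris, got: Lyon
```

Note that this rule ignores case, while the toContain("Paris") assertion in the eval is case-sensitive, so an answer such as "PARIS" would satisfy the judge but fail the assertion.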