AI SDK Harness
Use the AI SDK harness when your app already builds prompts, models, and tools with the Vercel AI SDK. This page follows one small end-to-end story: a geography assistant answers “What is the capital of France?”, the harness turns that AI SDK result into eval output, and a judge checks that the answer names Paris.
Install
pnpm add -D vitest-evals @vitest-evals/harness-ai-sdk
App Shape
Keep the agent factory in app code. The harness should call the same model, prompt, tools, and parser that production uses.
import { openai } from "@ai-sdk/openai";

export function createQuestionAgent() {
  return {
    model: openai("gpt-4o-mini"),
    system: "Answer geography questions directly. Keep answers short.",
  };
}

export function textAnswer(text: string): string {
  return text.trim();
}

createQuestionAgent() owns the runtime setup. textAnswer() is the small app
parser that turns provider text into the value tests and judges should read.
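
To make the parser's role concrete, here is a minimal sketch that repeats textAnswer() as defined above and runs it on a few raw strings. The sample strings are invented for illustration and are not captured provider responses.

```typescript
// Same parser as in the app code above, repeated so this snippet runs on its own.
function textAnswer(text: string): string {
  return text.trim();
}

// Hypothetical raw model outputs (illustrative only, not real provider data).
const samples: string[] = [
  "Paris",
  "  Paris.\n",
  "\n\nThe capital of France is Paris.  ",
];

for (const raw of samples) {
  // JSON.stringify makes any remaining whitespace visible in the log.
  console.log(JSON.stringify(textAnswer(raw)));
}
```

Trimming removes only leading and trailing whitespace. Punctuation and casing pass through unchanged, so any check on the parsed value still has to tolerate them.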
Configure Harness
The harness wraps the app shape and defines the normalized output. In this
case the normalized output is just the trimmed answer text.
import { aiSdkHarness } from "@vitest-evals/harness-ai-sdk";
import { createQuestionAgent, textAnswer } from "../src/questionAgent";

export const qaHarness = aiSdkHarness({
  agent: () => createQuestionAgent(),
  output: ({ result }) => textAnswer(result.text),
});

Use output for the value your evals care about. Messages, tool calls, usage,
and errors remain available on the normalized harness run for judges, replay,
and reports.
Call run(input) once, then judge that same normalized result. The judge is
deliberately simple so the harness story stays clear.
import { expect } from "vitest";
import {
  createJudge,
  describeEval,
  type JudgeContext,
} from "vitest-evals";
import { qaHarness } from "./qaHarness";

const CapitalJudge = createJudge(
  "CapitalJudge",
  async ({ output }: JudgeContext<string, string>) => {
    const passed = output.toLowerCase().includes("paris");

    return {
      score: passed ? 1 : 0,
      metadata: {
        rationale: passed
          ? "The answer names Paris."
          : `Expected Paris, got: ${output}`,
      },
    };
  },
);

describeEval("capital questions", { harness: qaHarness }, (it) => {
  it("knows the capital of France", async ({ run }) => {
    const result = await run("What is the capital of France?");

    expect(result.output).toContain("Paris");
    await expect(result).toSatisfyJudge(CapitalJudge);
  });
});
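
The judge's scoring rule can also be exercised without a model call. The sketch below copies that rule into a plain function. The names scoreCapitalAnswer and JudgeResult are hypothetical helpers introduced here for illustration and are not part of vitest-evals.

```typescript
// Hypothetical result shape mirroring what CapitalJudge returns above.
type JudgeResult = { score: number; metadata: { rationale: string } };

// Hypothetical helper: the same rule as CapitalJudge, as a pure function.
function scoreCapitalAnswer(output: string): JudgeResult {
  // Case-insensitive substring check, identical to the judge body.
  const passed = output.toLowerCase().includes("paris");

  return {
    score: passed ? 1 : 0,
    metadata: {
      rationale: passed
        ? "The answer names Paris."
        : `Expected Paris, got: ${output}`,
    },
  };
}

console.log(scoreCapitalAnswer("The capital is PARIS.").score);
console.log(scoreCapitalAnswer("Lyon").metadata.rationale);
```

One design point this surfaces: the judge lowercases the output before matching, while expect(result.output).toContain("Paris") is case-sensitive. An answer of "paris" would satisfy the judge but fail the plain assertion.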