Overview
Start by choosing the harness that matches the runtime your app already uses. Every harness page follows the same small story: an app answers “What is the capital of France?”, the harness normalizes that run, and a judge checks whether the output names Paris.
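The story can be sketched in plain TypeScript without any harness package. Every name here (`NormalizedRun`, `answerQuestion`, `normalize`, `judgeNamesParis`) is a hypothetical stand-in for illustration, not the vitest-evals API, and the app is stubbed so the sketch runs offline.

```typescript
// Hypothetical shapes for illustration only; not the vitest-evals API.
interface NormalizedRun {
  input: string;
  output: string;
}

// Stand-in for the production app; a real app would call a model here.
async function answerQuestion(question: string): Promise<string> {
  return question.includes("France")
    ? "The capital of France is Paris."
    : "I don't know.";
}

// Stand-in for the harness: runs the app and normalizes the result.
async function normalize(input: string): Promise<NormalizedRun> {
  const output = await answerQuestion(input);
  return { input, output: output.trim() };
}

// Stand-in for the judge: passes when the output names Paris as a whole word.
function judgeNamesParis(run: NormalizedRun): boolean {
  return /\bparis\b/i.test(run.output);
}
```

In this sketch only `answerQuestion` belongs to the app; the normalization step and the judge stay the same regardless of how the answer was produced.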
Choose a Harness
- AI SDK: Use when your app calls generateText, streamText, or an AI SDK wrapper.
- OpenAI Agents: Use when your app owns an Agent and runs it with a Runner.
- Pi: Use when your app exposes a Pi agent, toolset, or runtime-compatible entrypoint.
- Custom Harnesses: Use for workflows, service functions, CLIs, RAG pipelines, and custom agents.

Configure Vitest
Keep evals on their own command and Vitest config. The separate config keeps longer provider timeouts, eval-only includes, reporter setup, and replay defaults out of unit tests.

```json
{
  "scripts": {
    "evals": "vitest run --config vitest.evals.config.ts",
    "evals:record": "VITEST_EVALS_REPLAY_MODE=record vitest run --config vitest.evals.config.ts"
  }
}
```

```ts
// vitest.evals.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["evals/**/*.eval.ts"],
    testTimeout: 30_000,
    hookTimeout: 30_000,
    reporters: ["vitest-evals/reporter"],
    env: {
      VITEST_EVALS_REPLAY_MODE: process.env.VITEST_EVALS_REPLAY_MODE ?? "auto",
      VITEST_EVALS_REPLAY_DIR: ".vitest-evals/recordings",
    },
  },
});
```

Follow the Story
The harness pages intentionally use the same question, output, and judge. That keeps the documentation focused on the adapter boundary:
| Step | What changes by runtime |
|---|---|
| App Shape | The production agent, workflow, or service function. |
| Configure Harness | The adapter package and runtime options. |
| Eval | The run("What is the capital of France?") call and CapitalJudge. |
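
As a rough sketch of that boundary, the example below keeps the question and the check fixed while swapping the app shape behind one common signature. The names `Harness`, `fromServiceFunction`, `fromAgent`, and `evalCapital` are hypothetical and do not come from any adapter package; both apps are stubs, and the inline check stands in for CapitalJudge.

```typescript
// Hypothetical types; the real adapter packages may look different.
type Harness = (input: string) => Promise<{ output: string }>;

// App shape 1: a plain service function (stubbed).
const capitalService = async (_question: string): Promise<string> => "Paris";

// App shape 2: an agent-like object with its own result envelope (stubbed).
const capitalAgent = {
  async invoke(_question: string): Promise<{ messages: string[] }> {
    return { messages: ["Looking that up.", "The capital of France is Paris."] };
  },
};

// Configure harness: each adapter maps its runtime onto the same shape.
const fromServiceFunction =
  (fn: (question: string) => Promise<string>): Harness =>
  async (input) => ({ output: await fn(input) });

const fromAgent =
  (agent: typeof capitalAgent): Harness =>
  async (input) => {
    const { messages } = await agent.invoke(input);
    // Treat the final message as the agent's answer.
    return { output: messages[messages.length - 1] ?? "" };
  };

// Eval: the same question and the same check for every runtime.
async function evalCapital(run: Harness): Promise<boolean> {
  const { output } = await run("What is the capital of France?");
  return output.includes("Paris");
}
```

Calling `evalCapital(fromServiceFunction(capitalService))` and `evalCapital(fromAgent(capitalAgent))` exercises two different app shapes through the same eval, which is the only part the adapters need to agree on.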