Overview

Start by choosing the harness that matches the runtime your app already uses. Every harness page follows the same small story: an app answers “What is the capital of France?”, the harness normalizes that run, and a judge checks whether the output names Paris.

Choose a Harness

- AI SDK: Use when your app calls `generateText`, `streamText`, or an AI SDK wrapper.
- OpenAI Agents: Use when your app owns an `Agent` and runs it with a `Runner`.
- Pi: Use when your app exposes a Pi agent, toolset, or runtime-compatible entrypoint.
- Custom Harnesses: Use for workflows, service functions, CLIs, RAG pipelines, and custom agents.

Configure Vitest

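Keep evals on their own command and Vitest config. The separate config keeps longer provider timeouts, eval-only includes, reporter setup, and replay defaults out of unit tests. Add the scripts to package.json and put the config in vitest.evals.config.ts, the file the scripts point to.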

{
  "scripts": {
    "evals": "vitest run --config vitest.evals.config.ts",
    "evals:record": "VITEST_EVALS_REPLAY_MODE=record vitest run --config vitest.evals.config.ts"
  }
}

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["evals/**/*.eval.ts"],
    testTimeout: 30_000,
    hookTimeout: 30_000,
    reporters: ["vitest-evals/reporter"],
    env: {
      VITEST_EVALS_REPLAY_MODE:
        process.env.VITEST_EVALS_REPLAY_MODE ?? "auto",
      VITEST_EVALS_REPLAY_DIR: ".vitest-evals/recordings",
    },
  },
});
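
The `env` block above relies on the `??` operator to default the replay mode to "auto". The sketch below isolates that fallback so its behavior is easy to check. `resolveReplayMode` is a hypothetical helper written for illustration, not part of vitest-evals, and it makes no claim about how the library interprets each mode.

```typescript
// Hypothetical helper for illustration only; not part of vitest-evals.
type Env = Record<string, string | undefined>;

// Mirrors the expression in the config: use the shell value when it is
// defined, otherwise fall back to "auto".
function resolveReplayMode(env: Env): string {
  return env.VITEST_EVALS_REPLAY_MODE ?? "auto";
}

// Variable unset, as with the plain `evals` script.
const unsetMode = resolveReplayMode({});

// Variable set by the `evals:record` script.
const recordMode = resolveReplayMode({ VITEST_EVALS_REPLAY_MODE: "record" });

// `??` only falls back on undefined or null, so an empty string is kept.
const emptyMode = resolveReplayMode({ VITEST_EVALS_REPLAY_MODE: "" });

console.log({ unsetMode, recordMode, emptyMode });
```

Because `??` does not treat an empty string as missing, exporting the variable with an empty value would pass that empty value through rather than select "auto".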

Follow the Story

The harness pages intentionally use the same question, output, and judge. That keeps the documentation focused on the adapter boundary:

| Step | What changes by runtime |
| --- | --- |
| App Shape | The production agent, workflow, or service function. |
| Configure Harness | The adapter package and runtime options. |
| Eval | The `run("What is the capital of France?")` call and `CapitalJudge`. |
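
To make the three steps concrete, here is a minimal sketch of the story in plain TypeScript. Every name in it (`answerQuestion`, `runThroughHarness`, `judgeNamesParis`, `NormalizedRun`, `JudgeResult`) is a hypothetical stand-in chosen for illustration. It does not use or reproduce the vitest-evals API, and the real `run` call and `CapitalJudge` may look quite different.

```typescript
// All names here are hypothetical illustrations, not the vitest-evals API.
interface NormalizedRun {
  input: string;
  output: string;
}

interface JudgeResult {
  score: number; // 1 when the check passes, 0 otherwise
  reason: string;
}

// App Shape: a stand-in for the production agent, workflow, or service.
async function answerQuestion(question: string): Promise<string> {
  if (question.includes("capital of France")) {
    return "The capital of France is Paris.";
  }
  return "I don't know.";
}

// Configure Harness: a stand-in adapter that calls the app and returns
// the run in one common shape.
async function runThroughHarness(input: string): Promise<NormalizedRun> {
  const output = await answerQuestion(input);
  return { input, output };
}

// Eval: a stand-in judge that checks whether the output names Paris.
function judgeNamesParis(run: NormalizedRun): JudgeResult {
  const pass = /\bparis\b/i.test(run.output);
  return {
    score: pass ? 1 : 0,
    reason: pass ? "Output names Paris." : "Output does not name Paris.",
  };
}

runThroughHarness("What is the capital of France?").then((run) => {
  console.log(judgeNamesParis(run));
});
```

In this sketch only `answerQuestion` and the body of `runThroughHarness` depend on the runtime, so the judge can stay the same whichever app sits behind the adapter.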

Next

- Harnesses: Compare the supported runtime adapters before choosing one.
- Judges: Use built-in judges, write custom judges, and set thresholds.
- Tool Replay: Record deterministic tool calls without hiding model behavior.
- GitHub Reporting: Publish eval summaries and checks from workflow JSON output.
