Overview

Start by choosing the harness that matches the runtime your app already uses. Every harness page follows the same small story: an app answers “What is the capital of France?”, the harness normalizes that run, and a judge checks whether the output names Paris.
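The story above can be sketched in plain TypeScript. This is an illustrative sketch only, not the vitest-evals API: `NormalizedRun`, `Verdict`, `answer`, `harnessRun`, and `judgeNamesParis` are hypothetical names, and the app is a stub standing in for a real model call.

```typescript
// Hypothetical shapes for illustration; not the vitest-evals API.
interface NormalizedRun {
  input: string;
  output: string;
}

interface Verdict {
  pass: boolean;
  reason: string;
}

// Stub app: a real app would call a model or workflow here.
function answer(question: string): string {
  return question.includes("capital of France")
    ? "The capital of France is Paris."
    : "I don't know.";
}

// Harness step: run the app and normalize the result into one shape.
function harnessRun(input: string): NormalizedRun {
  return { input, output: answer(input).trim() };
}

// Judge step: check whether the output names Paris
// (case-insensitive, whole-word match).
function judgeNamesParis(run: NormalizedRun): Verdict {
  const pass = /\bparis\b/i.test(run.output);
  return {
    pass,
    reason: pass ? "output names Paris" : "output does not name Paris",
  };
}

const verdict = judgeNamesParis(harnessRun("What is the capital of France?"));
console.log(verdict.pass, verdict.reason);
```

Keeping the judge a pure function of the normalized run is what lets the same check apply no matter which runtime produced the output.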

Keep evals on their own command and Vitest config. The separate config keeps longer provider timeouts, eval-only includes, reporter setup, and replay defaults out of unit tests.

package.json
{
  "scripts": {
    "evals": "vitest run --config vitest.evals.config.ts",
    "evals:record": "VITEST_EVALS_REPLAY_MODE=record vitest run --config vitest.evals.config.ts"
  }
}
vitest.evals.config.ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["evals/**/*.eval.ts"],
    testTimeout: 30_000,
    hookTimeout: 30_000,
    reporters: ["vitest-evals/reporter"],
    env: {
      VITEST_EVALS_REPLAY_MODE:
        process.env.VITEST_EVALS_REPLAY_MODE ?? "auto",
      VITEST_EVALS_REPLAY_DIR: ".vitest-evals/recordings",
    },
  },
});
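
The `??` fallback in the config means the `evals:record` script overrides the replay mode, while a plain `evals` run falls back to `"auto"`. A minimal sketch of that resolution follows. It assumes nothing about how vitest-evals itself reads the variable, and `resolveReplayMode` is a hypothetical helper written only for illustration.

```typescript
// Hypothetical helper mirroring the config expression
// `process.env.VITEST_EVALS_REPLAY_MODE ?? "auto"`.
type Env = Record<string, string | undefined>;

function resolveReplayMode(env: Env): string {
  // `??` falls back only when the value is undefined or null.
  return env.VITEST_EVALS_REPLAY_MODE ?? "auto";
}

const unset = resolveReplayMode({});
const record = resolveReplayMode({ VITEST_EVALS_REPLAY_MODE: "record" });
const empty = resolveReplayMode({ VITEST_EVALS_REPLAY_MODE: "" });
console.log(unset, record, JSON.stringify(empty));
```

Because `??` treats an empty string as a set value, exporting the variable as empty does not fall back to `"auto"`. Using `||` instead would cover that case, at the cost of also overriding any other falsy value.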

The harness pages intentionally use the same question, output, and judge. That keeps the documentation focused on the adapter boundary:

| Step | What changes by runtime |
| --- | --- |
| App Shape | The production agent, workflow, or service function. |
| Configure Harness | The adapter package and runtime options. |
| Eval | The run("What is the capital of France?") call and CapitalJudge. |
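
One way to picture the adapter boundary: two apps with different shapes are wrapped so that the eval sees a single `run(input)` signature. The sketch below is an illustration under stated assumptions, not a vitest-evals adapter. `Harness`, `fromFunction`, `fromService`, and both stub apps are hypothetical, and everything is synchronous for brevity where real apps would typically be async.

```typescript
// Hypothetical boundary for illustration; real adapter packages will differ.
interface Harness {
  run(input: string): { output: string };
}

// App shape 1: a plain function that returns text.
const functionApp = (_question: string): string => "Paris";

// App shape 2: a service object that returns a structured reply.
const serviceApp = {
  ask(_req: { prompt: string }): { reply: { text: string } } {
    return { reply: { text: "The capital is Paris." } };
  },
};

// Each adapter hides its app's shape behind the same `run` signature.
const fromFunction = (app: (q: string) => string): Harness => ({
  run: (input) => ({ output: app(input) }),
});

const fromService = (svc: typeof serviceApp): Harness => ({
  run: (input) => ({ output: svc.ask({ prompt: input }).reply.text }),
});

const question = "What is the capital of France?";
const outputs = [fromFunction(functionApp), fromService(serviceApp)].map(
  (harness) => harness.run(question).output,
);
console.log(outputs);
```

Only the two adapter functions know about the app shapes; the question and any downstream check can stay identical across runtimes.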