Current package: 0.10.0

AI evals that feel like Vitest.

Pick the harness for your runtime, run your real agent or app once, and score the normalized result with ordinary Vitest assertions and judges.

PASS apps/qa/evals/capital.eval.ts

✓ knows the capital of France 142 tok · 1 judge · 1.8s

CapitalJudge 1.00

expected Paris

{ "output": "Paris is the capital of France." }

Choose a Harness

Start with the adapter that matches the code you already run in production. Each harness page uses the same Paris example so the runtime differences are easy to compare.

AI SDK

For apps built around generateText, streamText, or an AI SDK wrapper.

OpenAI Agents

For apps that own an Agent and run it with a Runner.

Pi

For Pi agents, toolsets, or runtime-compatible entrypoints.

Custom Harnesses

For workflows, service functions, CLIs, RAG pipelines, and custom agents.

The Core Loop

Shape the app

Keep the production agent, service, or workflow in app code.

Configure a harness

The harness calls that app once and normalizes its output, session, usage, and artifacts into a single result.

Judge the same run

Built-in and custom judges read output, session, tool calls, usage, and metadata without re-running the app.
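The three steps above can be sketched without any library code. This is a minimal illustration only: `NormalizedRun`, `answerQuestion`, `runOnce`, `scoreOnce`, and both judges are hypothetical names invented for this sketch and are not the vitest-evals types or API. The app is a synchronous stand-in so the example stays short; a real agent would be async.

```typescript
// Hypothetical shapes for illustration; NOT the vitest-evals types.
type ToolCall = { name: string; args: unknown };

interface NormalizedRun {
  output: string;
  toolCalls: ToolCall[];
  usage: { inputTokens: number; outputTokens: number };
}

// Counts app invocations so the "run once" property is observable.
let appCalls = 0;

// 1. Shape the app: a stand-in for the production agent or service.
function answerQuestion(question: string) {
  appCalls += 1;
  const text = question.includes("France")
    ? "Paris is the capital of France."
    : "I don't know.";
  return { text, tokensIn: 12, tokensOut: 8, tools: [] as ToolCall[] };
}

// 2. Configure a harness: call the app and normalize its raw result.
function runOnce(input: string): NormalizedRun {
  const raw = answerQuestion(input);
  return {
    output: raw.text,
    toolCalls: raw.tools,
    usage: { inputTokens: raw.tokensIn, outputTokens: raw.tokensOut },
  };
}

// 3. Judge the same run: each judge reads the normalized result only.
type Judge = (run: NormalizedRun) => { score: number };

const mentionsParis: Judge = (run) => ({
  score: run.output.toLowerCase().includes("paris") ? 1 : 0,
});

const staysUnderBudget: Judge = (run) => ({
  score: run.usage.inputTokens + run.usage.outputTokens <= 200 ? 1 : 0,
});

// Runs the app a single time, then hands that one result to every judge.
function scoreOnce(input: string, judges: Judge[]): number[] {
  const run = runOnce(input);
  return judges.map((judge) => judge(run).score);
}
```

Because every judge receives the same `NormalizedRun`, adding a judge adds a scoring pass but no extra model call.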

Example

capital.eval.ts

import { expect } from "vitest";
import {
  createJudge,
  describeEval,
  type JudgeContext,
} from "vitest-evals";
import { qaHarness } from "./qaHarness";

const CapitalJudge = createJudge(
  "CapitalJudge",
  async ({ output }: JudgeContext<string, string>) => ({
    score: output.toLowerCase().includes("paris") ? 1 : 0,
  }),
);

describeEval("capital questions", { harness: qaHarness }, (it) => {
  it("knows the capital of France", async ({ run }) => {
    const result = await run("What is the capital of France?");

    expect(result.output).toContain("Paris");
    await expect(result).toSatisfyJudge(CapitalJudge);
  });
});

GitHub Actions

The action reads Vitest JSON reports and publishes summaries, workflow annotations, and optional Check Runs.

.github/workflows/evals.yml

- name: Run evals
  run: |
    pnpm exec vitest run --config vitest.evals.config.ts \
      --reporter=vitest-evals/reporter \
      --reporter=json \
      --outputFile.json=vitest-results.json

- uses: getsentry/vitest-evals@v0
  if: always()
  with:
    results: vitest-results.json
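To get a feel for what a summary built from the JSON report might contain, here is a rough sketch that condenses a report into a few lines. It is not how the action is implemented. The interfaces declare only a small subset of the Jest-compatible fields written by Vitest's JSON reporter, and `summarizeReport` is a hypothetical helper named for this example.

```typescript
// Assumed subset of the Vitest JSON report; the real file has more fields.
interface AssertionResult {
  fullName: string;
  status: string; // for example "passed" or "failed"
}

interface FileResult {
  name: string; // path of the test file
  assertionResults: AssertionResult[];
}

interface JsonReport {
  numTotalTests: number;
  numPassedTests: number;
  testResults: FileResult[];
}

// Hypothetical helper: one headline line, then one line per failed test.
function summarizeReport(report: JsonReport): string {
  const lines = [`${report.numPassedTests}/${report.numTotalTests} evals passed`];
  for (const file of report.testResults) {
    for (const test of file.assertionResults) {
      if (test.status === "failed") {
        lines.push(`FAIL ${file.name} > ${test.fullName}`);
      }
    }
  }
  return lines.join("\n");
}
```

In a local script, the input would come from parsing `vitest-results.json` with `JSON.parse` after the Vitest step has written it.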