
createJudge

createJudge<TOptions>(name, assess): Judge<TOptions>

Creates a named judge object from an assessment function.

Type parameters

TOptions extends JudgeContext<any, any, any, any>

Parameters

name: string

Stable judge name shown in assertion messages and reports.

assess: JudgeAssessFn<TOptions>

Function that scores one normalized judge context.

Returns

Judge<TOptions>

import { createJudge, type JudgeContext } from "vitest-evals";

type RefundOutput = { status: "approved" | "denied" };
type RefundMetadata = { expected: { status: RefundOutput["status"] } };

export const RefundStatusJudge = createJudge(
  "RefundStatusJudge",
  async ({ output, metadata }: JudgeContext<string, RefundOutput, RefundMetadata>) => ({
    score: output.status === metadata.expected.status ? 1 : 0,
    metadata: {
      rationale: `Expected ${metadata.expected.status}, got ${output.status}`,
    },
  }),
);
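The scoring rule in the example above can be tried out without the library. The following is a minimal standalone sketch: `RefundContext`, `RefundResult`, `scoreRefund`, and `assessRefund` are hypothetical local stand-ins, not vitest-evals exports, and they mirror only the `output` and `metadata` fields that the example destructures.

```typescript
// Hypothetical local stand-ins; nothing here is imported from vitest-evals.
type RefundOutput = { status: "approved" | "denied" };
type RefundMetadata = { expected: { status: RefundOutput["status"] } };
type RefundContext = { output: RefundOutput; metadata: RefundMetadata };
type RefundResult = { score: number; metadata: { rationale: string } };

// Same rule as the example above: 1 on an exact status match, otherwise 0.
function scoreRefund({ output, metadata }: RefundContext): RefundResult {
  return {
    score: output.status === metadata.expected.status ? 1 : 0,
    metadata: {
      rationale: `Expected ${metadata.expected.status}, got ${output.status}`,
    },
  };
}

// Async wrapper with the same shape as the arrow function passed to createJudge.
async function assessRefund(ctx: RefundContext): Promise<RefundResult> {
  return scoreRefund(ctx);
}
```

With an output of `denied` and an expected status of `approved`, `scoreRefund` returns a score of 0 and the rationale `Expected approved, got denied`. Keeping the rule in a plain function like this is one way to unit-test it separately from the judge wiring.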

For LLM-backed judges, prefer the object form with ctx.runJudge(...) so provider-specific model configuration stays in the judge harness.

createJudge<TOptions>(config): Judge<TOptions>

Creates a named judge object from an assessment function.

Type parameters

TOptions extends JudgeContext<any, any, any, any>

Parameters

config: CreateJudgeConfig<TOptions>

Returns

Judge<TOptions>


createJudge<TOptions, TInput, TOutput>(name, assessor, assess): Judge<TOptions>

Type parameters

TOptions extends JudgeContext<any, any, any, any>

TInput

TOutput

Parameters

name: string

assessor: JudgeAssessor<TInput, TOutput>

assess: JudgeAssessWithAssessorFn<TOptions, TInput, TOutput>

Returns

Judge<TOptions>