JudgeContext
Full normalized context passed to every judge.
Scenario-owned judge criteria should live on input. Use metadata for
per-run expectations or harness configuration that are not part of the
scenario payload.
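The split between input and metadata can be sketched as follows. This is only an illustration under stated assumptions: `SketchContext` is a local stand-in holding three of the documented fields, and `RefundScenario`, `RunMetadata`, `criteria.mustMention`, `minScore`, and `judgeMentions` are hypothetical names that do not come from the library.

```typescript
// Local stand-in for illustration only; the real JudgeContext has more fields.
type SketchContext<TInput, TOutput, TMetadata> = {
  input: TInput;
  output: TOutput;
  metadata: Readonly<TMetadata>;
};

// Hypothetical scenario payload: the judge criteria travel with the scenario.
type RefundScenario = {
  prompt: string;
  criteria: { mustMention: string[] };
};

// Hypothetical per-run metadata: a threshold that can differ between runs.
type RunMetadata = { minScore: number };

// Scores the fraction of required terms found in the output (case-insensitive),
// then compares that score with the per-run threshold.
function judgeMentions(
  ctx: SketchContext<RefundScenario, string, RunMetadata>,
): { score: number; pass: boolean } {
  const required = ctx.input.criteria.mustMention;
  const text = ctx.output.toLowerCase();
  const hits = required.filter((term) => text.includes(term.toLowerCase()));
  const score = required.length === 0 ? 1 : hits.length / required.length;
  return { score, pass: score >= ctx.metadata.minScore };
}

const mentionResult = judgeMentions({
  input: {
    prompt: "Where is my refund for order A-100?",
    criteria: { mustMention: ["refund", "5 business days"] },
  },
  output: "Your refund was approved and should arrive within 5 business days.",
  metadata: { minScore: 1 },
});
```

In this sketch the required terms stay fixed for the scenario, while the pass threshold can change from run to run without touching the scenario payload.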
Example
    type RefundContext = JudgeContext<
      string,
      { status: "approved" | "denied" },
      { expected: { status: "approved" | "denied" } }
    >;

    const RefundStatusJudge = createJudge(
      "RefundStatusJudge",
      ({ output, metadata }: RefundContext) => ({
        score: output.status === metadata.expected.status ? 1 : 0,
      }),
    );
Extended by
Type Parameters
TInput
TInput = unknown
TOutput
TOutput extends JsonValue | undefined = JsonValue | undefined
TMetadata
TMetadata extends HarnessMetadata = HarnessMetadata
THarness
THarness extends Harness<TInput, TOutput, TMetadata> | undefined = Harness<TInput, TOutput, TMetadata> | undefined
Properties
harness
harness: THarness
Harness associated with this judge context.
input
input: TInput
Original eval input passed to the harness.
metadata
metadata: Readonly<TMetadata>
Per-run expectations or configuration passed to run(input, { metadata }).
output
output: TOutput
App-facing output returned by the harness.
run
run: HarnessRun<TOutput>
Complete normalized harness run being judged.
runJudge?
optional runJudge?: RunJudge
Runs the configured matcher, judge, or suite judge harness with run-scoped context.
session
session: HarnessRun<TOutput>["session"]
Normalized transcript associated with the harness run.
toolCalls
toolCalls: ToolCallRecord[]
Flattened tool calls observed in the normalized session.
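A judge that only cares about which tools were invoked could read toolCalls directly. The sketch below is a rough illustration, not the library's API: the fields of `ToolCallRecord` are not listed on this page, so `ToolCallSketch` assumes a record with just a `name` field, and `ToolCallContext` and `scoreRequiredTools` are hypothetical helpers.

```typescript
// Assumed minimal shape; the real ToolCallRecord may differ and carry more fields.
type ToolCallSketch = { name: string };

// Local stand-in exposing only the one context field this judge reads.
type ToolCallContext = { toolCalls: ToolCallSketch[] };

// Returns the fraction of required tool names that appear at least once
// in the flattened tool calls; an empty requirement list scores 1.
function scoreRequiredTools(ctx: ToolCallContext, required: string[]): number {
  if (required.length === 0) return 1;
  const called = new Set(ctx.toolCalls.map((call) => call.name));
  const found = required.filter((toolName) => called.has(toolName));
  return found.length / required.length;
}

const toolScore = scoreRequiredTools(
  {
    toolCalls: [
      { name: "lookupOrder" },
      { name: "issueRefund" },
      { name: "lookupOrder" }, // repeated calls count once
    ],
  },
  ["lookupOrder", "issueRefund", "sendEmail"],
);
```

Because the list is already flattened, the sketch does not need to walk the session transcript to find tool calls.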