
Eval Files

Evaluation files define the test cases, targets, and evaluators for an evaluation run. AgentV supports two formats: YAML and JSONL.

YAML is the primary format: a single file contains metadata, execution config, and tests:

description: Math problem solving evaluation
execution:
  target: default
  evaluators:
    - name: correctness
      type: llm_judge
      prompt: ./judges/correctness.md
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"
Field        Description
description  Human-readable description of the evaluation
dataset      Optional dataset identifier
execution    Default execution config (target, evaluators)
workspace    Suite-level workspace config (setup/teardown scripts, template)
tests        Array of individual tests
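Parsing a suite file with these fields can be sketched in a few lines of Python, assuming PyYAML is available. The `load_eval_suite` helper is hypothetical, not AgentV's actual API; the field names mirror the table above:

```python
import yaml

EXAMPLE = """\
description: Math problem solving evaluation
execution:
  target: default
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"
"""

def load_eval_suite(text):
    """Parse a YAML eval suite and return (description, execution, tests)."""
    suite = yaml.safe_load(text)
    return (
        suite.get("description", ""),   # human-readable summary
        suite.get("execution", {}),     # suite-wide target/evaluator defaults
        suite.get("tests", []),         # individual test cases
    )

desc, execution, tests = load_eval_suite(EXAMPLE)
```

Note that `expected_output` is quoted in the YAML, so it parses as the string `"42"` rather than an integer.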

For large-scale evaluations, AgentV supports JSONL (JSON Lines) format. Each line is a single test:

{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}
{"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"}
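Because each line is independent JSON, a file like this can be read with nothing but the standard library. A minimal sketch, using the two example lines above (the parsing loop is illustrative, not AgentV's loader):

```python
import json

lines = [
    '{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}',
    '{"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"}',
]

# One test per line; blank lines are skipped so trailing newlines are harmless.
tests = [json.loads(line) for line in lines if line.strip()]
```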

An optional YAML sidecar file provides metadata and execution config. Place it alongside the JSONL file with the same base name:

dataset.jsonl + dataset.yaml:

description: Math evaluation dataset
dataset: math-tests
execution:
  target: azure_base
  evaluator: llm_judge
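Conceptually, a loader merges the sidecar metadata with the per-line tests into one suite. A sketch under that assumption, with the sidecar shown as an already-parsed dict (the `load_dataset` helper is hypothetical):

```python
import json

# As it would look after parsing dataset.yaml (assumed, for illustration).
sidecar = {
    "description": "Math evaluation dataset",
    "dataset": "math-tests",
    "execution": {"target": "azure_base", "evaluator": "llm_judge"},
}

jsonl_text = '{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}\n'

def load_dataset(sidecar, jsonl_text):
    """Combine sidecar metadata with per-line tests into one suite dict."""
    suite = dict(sidecar)  # copy so the sidecar itself is left untouched
    suite["tests"] = [json.loads(l) for l in jsonl_text.splitlines() if l.strip()]
    return suite

suite = load_dataset(sidecar, jsonl_text)
```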
JSONL offers several advantages:

  • Streaming-friendly — process line by line
  • Git-friendly — diffs show individual case changes
  • Programmatic generation — easy to create from scripts
  • Industry standard — compatible with DeepEval, LangWatch, Hugging Face datasets

Use the convert command to switch between YAML and JSONL:

agentv convert evals/dataset.yaml --format jsonl
agentv convert evals/dataset.jsonl --format yaml
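In the YAML-to-JSONL direction, the conversion amounts to serializing each entry of `tests` as one compact JSON line. A sketch of that idea, not the convert command's implementation:

```python
import json

# A suite as it would look after parsing the YAML example above.
suite = {
    "description": "Math problem solving evaluation",
    "tests": [
        {"id": "addition", "criteria": "Correctly calculates 15 + 27 = 42",
         "input": "What is 15 + 27?", "expected_output": "42"},
    ],
}

# Each test becomes one line; suite-level fields would go to a sidecar file.
jsonl = "\n".join(json.dumps(t) for t in suite["tests"])
```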