Fine-Tuning JSONL Validator

Fine-Tuning JSONL Validator

Line-by-line schema, role-ordering, token-count, and duplicate detection for OpenAI and Anthropic SFT data.

Rows
3
Valid
3
Errors
0
Warnings
0
Duplicates
0
Total tokens
101
Avg tokens / row
34
Min / Max tokens
22 / 40
LineRolesTokensIssues
L1
systemuserassistant
39ok
L2
systemuserassistant
40ok
L3
userassistant
22ok

What This Tool Does

Fine-Tuning JSONL Validator is built for deterministic developer and agent workflows.

Validate OpenAI and Anthropic supervised fine-tuning JSONL line-by-line: schema, role ordering, token counts, message-format violations, and duplicate detection.

Use How to Use for execution steps and FAQ for constraints, policies, and edge cases.

Last updated:

This tool is provided as-is for convenience. Output should be verified before use in any production or critical context.

Agent Invocation

Best Path For Builders

Browser workflow

Runs instantly in the browser with private local processing and copy/export-ready output.

Browser Workflow

This tool is optimized for instant in-browser execution with local data handling. Run it here and copy/export the output directly.

/fine-tuning-jsonl-validator/

For automation planning, fetch the canonical contract at /api/tool/fine-tuning-jsonl-validator.json.

How to Use Fine-Tuning JSONL Validator

  1. 1

    Pick the target provider

    Switch between OpenAI and Anthropic. The validator applies provider-specific schema rules — Anthropic puts system prompts at the top level while OpenAI inlines them as the first message.

  2. 2

    Paste your JSONL

    Drop one record per line into the input area. Empty lines are skipped. Use the Load sample button to see the expected structure for the selected provider.

  3. 3

    Review the per-row report

    Each row shows its detected roles, token count from gpt-tokenizer, and any schema or ordering issues. Filter the table by errors or warnings to focus on problem rows.

  4. 4

    Resolve duplicates and shape issues

    The summary panel surfaces duplicate groups, token distribution, and total error and warning counts. Fix consecutive-role warnings and missing assistant turns before kicking off a fine-tune job.

Frequently Asked Questions

Which providers are supported?
OpenAI Chat Completions fine-tune format and Anthropic Messages fine-tune format. Each provider has its own schema rules — system at top level for Anthropic, inline system for OpenAI — and the validator switches rule sets accordingly.
How are token counts computed?
Every row is encoded with gpt-tokenizer, the same BPE library used for GPT-4 family models. Counts are a close approximation for OpenAI fine-tunes and a reasonable upper bound for Anthropic; treat them as guidance, not a billing oracle.
How does duplicate detection work?
Each row is hashed against its full normalized JSON. Rows with identical hashes are flagged as duplicates of the first occurrence so you can prune them before training.
Does it send my data to a server?
No. Parsing, schema checks, tokenization, and hashing all run in your browser. The JSONL never leaves your device.
Can it catch role-ordering bugs?
Yes. The validator flags consecutive user messages, consecutive assistant messages, missing assistant targets, system messages out of position, and Anthropic rows that start with assistant instead of user.