Fine-Tuning JSONL Validator

Line-by-line schema, role-ordering, token-count, and duplicate detection for OpenAI and Anthropic SFT data.

JSONL input (3 rows)

Summary

Rows

Valid

Errors

Warnings

Duplicates

Total tokens

104

Avg tokens / row

Min / Max tokens

23 / 43

Line	Roles	Tokens	Issues
L1	systemuserassistant	38	ok
L2	systemuserassistant	43	ok
L3	userassistant	23	ok

What This Tool Does

Fine-Tuning JSONL Validator is built for deterministic developer and agent workflows.

Validate OpenAI and Anthropic supervised fine-tuning JSONL line-by-line: schema, role ordering, token counts, message-format violations, and duplicate detection.

Use How to Use for execution steps and FAQ for constraints, policies, and edge cases.

Last updated: July 19, 2026

This tool is provided as-is for convenience. Output should be verified before use in any production or critical context.

Agent Invocation

Best Path For Builders

Browser workflow

Runs instantly in the browser with private local processing and copy/export-ready output.

Browser Workflow

This tool is optimized for instant in-browser execution with local data handling. Run it here and copy/export the output directly.

/fine-tuning-jsonl-validator/

For automation planning, fetch the canonical contract at /api/tool/fine-tuning-jsonl-validator.json.

How to Use Fine-Tuning JSONL Validator

1

Pick the target provider

Switch between OpenAI and Anthropic. The validator applies provider-specific schema rules — Anthropic puts system prompts at the top level while OpenAI inlines them as the first message.
2

Paste your JSONL

Drop one record per line into the input area. Empty lines are skipped. Use the Load sample button to see the expected structure for the selected provider.
3

Review the per-row report

Each row shows its detected roles, token count from gpt-tokenizer, and any schema or ordering issues. Filter the table by errors or warnings to focus on problem rows.
4

Resolve duplicates and shape issues

The summary panel surfaces duplicate groups, token distribution, and total error and warning counts. Fix consecutive-role warnings and missing assistant turns before kicking off a fine-tune job.

Frequently Asked Questions

Which providers are supported?

OpenAI Chat Completions fine-tune format and Anthropic Messages fine-tune format. Each provider has its own schema rules — system at top level for Anthropic, inline system for OpenAI — and the validator switches rule sets accordingly.

How are token counts computed?

Every row is encoded with gpt-tokenizer, the same BPE library used for GPT-4 family models. Counts are a close approximation for OpenAI fine-tunes and a reasonable upper bound for Anthropic; treat them as guidance, not a billing oracle.

How does duplicate detection work?

Each row is hashed against its full normalized JSON. Rows with identical hashes are flagged as duplicates of the first occurrence so you can prune them before training.

Does it send my data to a server?

No. Parsing, schema checks, tokenization, and hashing all run in your browser. The JSONL never leaves your device.

Can it catch role-ordering bugs?

Yes. The validator flags consecutive user messages, consecutive assistant messages, missing assistant targets, system messages out of position, and Anthropic rows that start with assistant instead of user.

Fine-Tuning JSONL Validator

Fine-Tuning JSONL Validator

What This Tool Does

Agent Invocation

How to Use Fine-Tuning JSONL Validator

Pick the target provider

Paste your JSONL

Review the per-row report

Resolve duplicates and shape issues

Frequently Asked Questions