JSON Driver Masterclass
What is a Driver?
A driver is a JSON file that tells Onset Engine what visual content to assign to each energy tier of your music. Without a driver, the engine uses raw CLIP similarity and motion scores. With a driver, you get precise creative control over what appears at each musical intensity level.
Think of it as a content brief for the AI: “During quiet sections, show landscapes. During medium energy, show dialogue. During the drop, show explosions.”
The v3 Schema

```json
{
  "meta": {
    "name": "Action Anime Driver",
    "version": "3.0",
    "description": "Maps anime content to energy tiers with subject awareness"
  },
  "global": {
    "min_rating": 3,
    "exclude_tags": ["@Filler", "@Recap"],
    "shot_diversity": true
  },
  "tiers": {
    "1_LOW": {
      "descriptions": [
        "character standing in peaceful landscape",
        "calm sky with clouds",
        "characters talking quietly"
      ],
      "subjects": ["@Goku"],
      "moods": ["serene", "melancholic"],
      "scene_types": ["wide", "medium"]
    },
    "2_MED": {
      "descriptions": [
        "character powering up with glowing aura",
        "intense stare between fighters",
        "character flying through sky"
      ],
      "subjects": ["@Goku", "@Vegeta"],
      "moods": ["tense"],
      "scene_types": ["medium", "close-up"]
    },
    "3_HIGH": {
      "descriptions": [
        "fast martial arts combat with punching",
        "energy beam attack",
        "character dodging rapid attacks"
      ],
      "subjects": ["@Goku"],
      "moods": ["aggressive", "epic"],
      "scene_types": ["close-up", "medium"]
    },
    "4_MAX": {
      "descriptions": [
        "massive energy explosion",
        "character transforming with blinding light",
        "devastating beam clash"
      ],
      "subjects": ["@Goku"],
      "moods": ["epic"],
      "min_rating": 4
    }
  }
}
```

Schema Reference
Top-Level Blocks

| Block | Description |
|---|---|
| `meta` | Display name, version, and description |
| `global` | Library-wide filtering rules applied to all tiers |
| `tiers` | Energy tier definitions keyed as `1_LOW`, `2_MED`, `3_HIGH`, `4_MAX` |
Meta Fields

| Field | Type | Description |
|---|---|---|
| `name` | string | Display name for the driver |
| `version` | string | Schema version (use `"3.0"` for the current schema) |
| `description` | string | What this driver is designed for |
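You can check the block structure above before handing a file to the engine. The following is a minimal Python sketch that uses only the standard `json` module. `load_driver` is a hypothetical helper, not part of Onset Engine. The rules it enforces are assumptions drawn from the tables rather than the engine's own validation:

- only `meta`, `global`, and `tiers` are allowed at the top level;
- `tiers` is required;
- `version`, when present, must be `"3.0"`.

```python
import json

# Hypothetical validation rules inferred from the tables; not Onset Engine's API.
TOP_LEVEL_BLOCKS = {"meta", "global", "tiers"}
META_FIELDS = ("name", "version", "description")

def load_driver(text: str) -> dict:
    """Parse driver JSON and check its top-level blocks and meta fields."""
    driver = json.loads(text)
    if not isinstance(driver, dict):
        raise ValueError("a driver must be a JSON object")
    unknown = set(driver) - TOP_LEVEL_BLOCKS
    if unknown:
        raise ValueError(f"unknown top-level blocks: {sorted(unknown)}")
    if "tiers" not in driver:
        # Assumption: a driver without tiers is treated as an error.
        raise ValueError("driver has no 'tiers' block")
    meta = driver.get("meta", {})
    if not isinstance(meta, dict):
        raise ValueError("'meta' must be an object")
    for name in META_FIELDS:
        if name in meta and not isinstance(meta[name], str):
            raise ValueError(f"meta.{name} must be a string")
    # Assumption: only an explicit, different version is rejected.
    if meta.get("version", "3.0") != "3.0":
        raise ValueError(f"expected schema version 3.0, got {meta['version']!r}")
    return driver

driver = load_driver('{"meta": {"name": "Nature Reel", "version": "3.0"}, "tiers": {}}')
print(driver["meta"]["name"])
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, callers can catch malformed JSON and schema problems with a single `except ValueError`.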
Global Filters

| Field | Type | Default | Description |
|---|---|---|---|
| `min_rating` | integer | `0` | Minimum quality rating for all tiers (0–5) |
| `exclude_tags` | string[] | `[]` | Excludes clips with these tags from all tiers |
| `shot_diversity` | boolean | `true` | Enables diversity filtering to prevent visual repetition |
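The sketch below shows roughly how `min_rating` and `exclude_tags` could combine for a single clip. Several parts are hypothetical or assumed:

- The `Clip` record and `passes_global_filters` are invented for illustration.
- Treating any single excluded tag as disqualifying is an assumption.
- `shot_diversity` is left out because this page does not describe how the diversity filter works.

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    # Hypothetical per-clip record; the engine's real clip metadata may differ.
    clip_id: int
    rating: int
    tags: set[str] = field(default_factory=set)  # tag names without the "@" prefix

def passes_global_filters(clip: Clip, global_cfg: dict) -> bool:
    """Apply global min_rating and exclude_tags, using the documented defaults."""
    if clip.rating < global_cfg.get("min_rating", 0):
        return False
    # Driver files write tags as "@Filler"; strip the "@" before comparing.
    excluded = {tag.lstrip("@") for tag in global_cfg.get("exclude_tags", [])}
    # Assumption: carrying any one excluded tag is enough to drop the clip.
    return not (clip.tags & excluded)

cfg = {"min_rating": 3, "exclude_tags": ["@Filler", "@Recap"]}
candidates = (Clip(1, 4, {"Goku"}), Clip(2, 5, {"Recap"}), Clip(3, 2))
kept = [c.clip_id for c in candidates if passes_global_filters(c, cfg)]
print(kept)
```

With the Action Anime driver's `global` block, only clip 1 survives. Clip 2 carries an excluded tag, and clip 3 is rated below 3.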
Tier Fields
Each tier is keyed as `1_LOW`, `2_MED`, `3_HIGH`, or `4_MAX` and accepts these fields:

| Field | Type | Description |
|---|---|---|
| `descriptions` | string[] | CLIP text descriptions; the engine computes cosine similarity between each description and the clip embeddings |
| `subjects` | string[] | Subject tag references using `@TagName` syntax |
| `moods` | string[] | Filter to clips with a matching mood classification |
| `scene_types` | string[] | Filter to clips with a matching scene type |
| `min_rating` | integer | Per-tier minimum rating; replaces the global `min_rating` for this tier |
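The `min_rating` override amounts to a lookup with a fallback chain. The sketch below pairs that lookup with a type check for one tier. Both helper names are hypothetical. The 0–5 range is borrowed from the global `min_rating`, and rejecting unknown keys and fields is an assumption about how strict a validator should be.

```python
TIER_KEYS = ("1_LOW", "2_MED", "3_HIGH", "4_MAX")
LIST_FIELDS = ("descriptions", "subjects", "moods", "scene_types")

def effective_min_rating(driver: dict, tier_key: str) -> int:
    """Per-tier min_rating wins, then global min_rating, then the default of 0."""
    tier = driver["tiers"][tier_key]
    if "min_rating" in tier:
        return tier["min_rating"]
    return driver.get("global", {}).get("min_rating", 0)

def check_tier(tier_key: str, tier: dict) -> list[str]:
    """Return the problems found in one tier definition (empty list if none)."""
    problems = []
    if tier_key not in TIER_KEYS:
        problems.append(f"unknown tier key {tier_key!r}")
    for name, value in tier.items():
        if name in LIST_FIELDS:
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                problems.append(f"{tier_key}.{name} must be a list of strings")
        elif name == "min_rating":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 5:
                problems.append(f"{tier_key}.min_rating must be an integer from 0 to 5")
        else:
            problems.append(f"{tier_key}: unknown field {name!r}")
    return problems

driver = {
    "global": {"min_rating": 3},
    "tiers": {"1_LOW": {"descriptions": ["calm sky with clouds"]}, "4_MAX": {"min_rating": 4}},
}
print(effective_min_rating(driver, "1_LOW"), effective_min_rating(driver, "4_MAX"))
```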
Energy Tiers
The engine maps musical energy (0.0–1.0) to four tiers:

| Tier | Energy Range | Musical Moment |
|---|---|---|
| `1_LOW` | 0.0–0.25 | Intros, breakdowns, quiet sections |
| `2_MED` | 0.25–0.50 | Verses, building tension |
| `3_HIGH` | 0.50–0.75 | Choruses, buildups |
| `4_MAX` | 0.75–1.0 | Drops, climaxes, peak energy |
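Here is a minimal Python sketch of this mapping. The table does not say which tier owns a shared boundary such as 0.25. This version therefore assumes each range includes its lower bound and excludes its upper one, with 1.0 going to `4_MAX`. Clamping out-of-range input is also an assumption, and `energy_to_tier` is a hypothetical name.

```python
def energy_to_tier(energy: float) -> str:
    """Map a musical energy value to a tier key using the ranges in the table."""
    # Assumption: values outside 0.0-1.0 are clamped instead of rejected.
    energy = min(max(energy, 0.0), 1.0)
    if energy < 0.25:
        return "1_LOW"
    if energy < 0.50:
        return "2_MED"
    if energy < 0.75:
        return "3_HIGH"
    return "4_MAX"  # 0.75 up to and including 1.0

print([energy_to_tier(e) for e in (0.1, 0.25, 0.6, 1.0)])
```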
CLIP Descriptions — How Matching Works
Each description string is encoded into a 768-dim vector using the CLIP text encoder. The engine computes cosine similarity between the description vector and every clip’s embedding.
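The matching step is ordinary cosine similarity followed by a sort. In this Python sketch, `rank_clips` is a hypothetical helper. The 3-dimensional vectors are invented so the example runs without a CLIP model; the engine itself works with 768-dimensional embeddings.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_clips(text_vec: list[float], clip_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every clip against one description vector, best match first."""
    scored = [(clip_id, cosine_similarity(text_vec, vec)) for clip_id, vec in clip_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dim vectors standing in for 768-dim CLIP embeddings.
description_vec = [0.9, 0.1, 0.2]
clip_vecs = {
    "clip_a": [0.8, 0.2, 0.1],
    "clip_b": [0.5, 0.5, 0.5],
    "clip_c": [0.1, 0.9, 0.3],
}
for clip_id, sim in rank_clips(description_vec, clip_vecs):
    print(f"{clip_id}: cos_sim = {sim:.2f}")
```

The toy vectors give much larger similarities than real CLIP scores. Only the score-then-sort logic carries over.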
```text
Description: "massive energy explosion"
        ↓
CLIP text encoder
        ↓
768-dim vector
        ↓
cosine similarity vs. all clips
        ↓
ranked results

Clip #4821: cos_sim = 0.31  ← best match
Clip #1203: cos_sim = 0.28
Clip #0892: cos_sim = 0.24
```

Contrastive Scoring
v3 drivers use contrastive scoring. Rather than simply picking the clips with the highest absolute similarity to a tier’s descriptions, the engine measures tier specificity: how much more similar a clip is to the target tier than to all the other tiers.
```text
Standard scoring:  score = raw_similarity * 0.4
Contrastive:       score = raw_similarity * 0.4 + (target_sim - max_other_sim) * 0.6
```

This prevents a common failure mode in which the CLIP model’s highest-confidence clips dominate every tier. Contrastive scoring ensures that calm tiers get genuinely calm content, not just the model’s most confident matches.
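The two formulas can be compared on a pair of made-up clips. The page does not say how a clip's similarity to a tier is derived from several descriptions. It also does not say whether `raw_similarity` differs from `target_sim`. This Python sketch therefore assumes the per-tier similarities are already known and uses `target_sim` for both.

```python
def standard_score(raw_similarity: float) -> float:
    """Standard scoring: score = raw_similarity * 0.4."""
    return raw_similarity * 0.4

def contrastive_score(tier_sims: dict[str, float], target: str) -> float:
    """Contrastive: score = raw_similarity * 0.4 + (target_sim - max_other_sim) * 0.6.

    tier_sims maps each tier key to the clip's similarity to that tier.
    """
    target_sim = tier_sims[target]
    max_other_sim = max(sim for tier, sim in tier_sims.items() if tier != target)
    # Assumption: raw_similarity is the same value as target_sim.
    return target_sim * 0.4 + (target_sim - max_other_sim) * 0.6

# Made-up similarities. Clip A is similar to every tier; clip B is specifically calm.
clip_a = {"1_LOW": 0.30, "2_MED": 0.29, "3_HIGH": 0.28, "4_MAX": 0.30}
clip_b = {"1_LOW": 0.26, "2_MED": 0.18, "3_HIGH": 0.17, "4_MAX": 0.16}

for name, sims in (("A", clip_a), ("B", clip_b)):
    print(name, round(standard_score(sims["1_LOW"]), 3), round(contrastive_score(sims, "1_LOW"), 3))
```

For the `1_LOW` tier, standard scoring ranks clip A first (0.120 vs. 0.104). Contrastive scoring ranks clip B first (0.152 vs. 0.120). Clip A is exactly as similar to `4_MAX` as to `1_LOW`, so its specificity term is zero.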
Subject Tags (@ Syntax)
Reference tagged clips using the `@TagName` syntax in the `subjects` array:

```json
{
  "4_MAX": {
    "descriptions": ["massive energy explosion"],
    "subjects": ["@Goku", "@Vegeta"],
    "moods": ["epic"],
    "min_rating": 4
  }
}
```

Tags are created via the few-shot propagation system (tag 5 clips → engine finds 800 more).
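The sketch below shows how `@TagName` references might be resolved against a clip's tags. It rests on two assumptions:

- A clip matches when it carries at least one of the listed subjects. The page does not say whether the engine requires any or all of them.
- An empty `subjects` list means no subject filter.

`parse_subject` and `matches_subjects` are hypothetical names.

```python
def parse_subject(ref: str) -> str:
    """Turn '@Goku' into 'Goku'; reject references without the @ prefix."""
    if not ref.startswith("@") or len(ref) < 2:
        raise ValueError(f"subject reference must look like @TagName, got {ref!r}")
    return ref[1:]

def matches_subjects(clip_tags: set[str], subjects: list[str]) -> bool:
    """True if the tier lists no subjects, or the clip carries any listed subject."""
    if not subjects:
        # Assumption: an empty subjects list applies no subject filter.
        return True
    wanted = {parse_subject(ref) for ref in subjects}
    return bool(clip_tags & wanted)

print(matches_subjects({"Vegeta"}, ["@Goku", "@Vegeta"]))
```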
Mood and Scene Type Filters
Available Moods

| Mood | Description |
|---|---|
| `epic` | Grand, powerful, heroic content |
| `melancholic` | Sad, reflective, emotional |
| `tense` | Suspenseful, high-stakes |
| `comedic` | Light, funny, playful |
| `romantic` | Intimate, warm, affectionate |
| `serene` | Peaceful, calm, meditative |
| `aggressive` | Intense, violent, forceful |
Available Scene Types

| Scene Type | Description |
|---|---|
| `close-up` | Face or detail shot |
| `medium` | Waist-up or small group |
| `wide` | Full environment or establishing shot |
| `aerial` | Drone or overhead perspective |
| `pov` | First-person or subjective camera |
| `slow-motion` | Reduced playback speed content |
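Moods and scene types come from the fixed vocabularies in the two tables above. A typo such as `"closeup"` is therefore not a valid value. A small check like the hypothetical `unknown_filter_values` below can catch such typos before a render. Its value sets are copied from the tables.

```python
MOODS = {"epic", "melancholic", "tense", "comedic", "romantic", "serene", "aggressive"}
SCENE_TYPES = {"close-up", "medium", "wide", "aerial", "pov", "slow-motion"}

def unknown_filter_values(driver: dict) -> list[str]:
    """List every mood or scene type in the driver that is not in the tables."""
    problems = []
    for tier_key, tier in driver.get("tiers", {}).items():
        for mood in tier.get("moods", []):
            if mood not in MOODS:
                problems.append(f"{tier_key}: unknown mood {mood!r}")
        for scene in tier.get("scene_types", []):
            if scene not in SCENE_TYPES:
                problems.append(f"{tier_key}: unknown scene type {scene!r}")
    return problems

draft = {"tiers": {"3_HIGH": {"moods": ["epic", "hype"], "scene_types": ["closeup", "medium"]}}}
for problem in unknown_filter_values(draft):
    print(problem)
```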
Building Your First Driver
Step 1: Identify Your Content
What footage is in your library? Anime fights? Drone landscapes? Wedding ceremonies? The driver should reflect your actual content, not aspirational queries.
Step 2: Write Descriptions
Be specific and visual. CLIP understands natural language:
```jsonc
// ❌ Too vague
"descriptions": ["action"]

// ✅ Specific and visual
"descriptions": [
  "character performing a spinning kick in mid-air",
  "explosion with debris flying toward camera",
  "fast sword slash with motion blur"
]
```

Step 3: Use the Driver Wizard
In Studio Mode, click ✨ Create in the Clip Direction section to open the Driver Wizard, a visual tier builder with a live JSON preview. If you’ve already entered text descriptions, the wizard pre-populates from them.
Step 4: Test and Iterate
Use DJ Mode to preview in real time how the driver selects clips. Adjust descriptions and filters based on what you see. The console shows per-tier diagnostic logging with similarity breakdowns.
Example Drivers
Nature / Drone
```json
{
  "meta": { "name": "Nature Reel", "version": "3.0" },
  "tiers": {
    "1_LOW": {
      "descriptions": ["calm ocean waves", "forest canopy from above", "sunrise over mountains"],
      "scene_types": ["wide", "aerial"]
    },
    "2_MED": {
      "descriptions": ["flowing river through valley", "birds in flight"],
      "scene_types": ["wide", "medium"]
    },
    "3_HIGH": {
      "descriptions": ["fast drone dive through canyon", "waterfall close-up"],
      "scene_types": ["aerial", "pov"],
      "min_rating": 2
    },
    "4_MAX": {
      "descriptions": ["storm clouds time-lapse", "lightning strike over ocean"],
      "scene_types": ["wide"],
      "min_rating": 3
    }
  }
}
```

Wedding Highlight
```json
{
  "meta": { "name": "Wedding Highlights", "version": "3.0" },
  "global": { "min_rating": 2, "shot_diversity": true },
  "tiers": {
    "1_LOW": {
      "descriptions": ["wedding venue exterior", "floral decorations", "guests arriving"],
      "moods": ["serene"],
      "scene_types": ["wide"]
    },
    "2_MED": {
      "descriptions": ["bride walking down aisle", "exchanging rings", "emotional guests"],
      "moods": ["romantic", "melancholic"],
      "scene_types": ["medium"]
    },
    "3_HIGH": {
      "descriptions": ["first dance", "wedding party celebration", "champagne toast"],
      "moods": ["romantic", "epic"],
      "scene_types": ["medium", "close-up"]
    },
    "4_MAX": {
      "descriptions": ["crowd dancing at reception", "confetti throw", "sparkler exit"],
      "moods": ["epic"],
      "scene_types": ["wide", "close-up"],
      "min_rating": 3
    }
  }
}
```

- Write 3–6 descriptions per tier for best results; more variety gives better coverage.
- Use `@Tag` subjects only after running clip tagging in the GUI.
- The mood and scene-type penalty multipliers stack: a clip with the wrong mood AND the wrong scene type can drop to 0.50 × 0.60 = 0.30× its score.
- Per-tier `min_rating` overrides the global setting for that tier only.
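The stacking rule in the tips above can be sketched as multiplicative penalties applied to a clip's score. The two factors come from the tip's example (0.50 × 0.60 = 0.30). Several details here are assumptions for illustration:

- 0.50 belongs to mood and 0.60 to scene type.
- No penalty applies when a tier sets no filter.
- `penalized_score` itself is a hypothetical helper.

```python
# Assumed factors, read off the tip's example (0.50 x 0.60 = 0.30).
MOOD_MISMATCH = 0.50
SCENE_MISMATCH = 0.60

def penalized_score(score: float, clip_mood: str, clip_scene: str, tier: dict) -> float:
    """Multiply the score by each mismatch penalty that applies; penalties stack."""
    moods = tier.get("moods", [])
    scenes = tier.get("scene_types", [])
    # Assumption: a tier with no moods/scene_types applies no penalty for them.
    if moods and clip_mood not in moods:
        score *= MOOD_MISMATCH
    if scenes and clip_scene not in scenes:
        score *= SCENE_MISMATCH
    return score

# The 4_MAX tier from the Wedding Highlights driver.
tier_4_max = {"moods": ["epic"], "scene_types": ["wide", "close-up"]}
print(penalized_score(1.0, "serene", "medium", tier_4_max))
```

A serene medium shot misses both of this tier's filters, so its score is multiplied by both factors and ends up at roughly 0.30× the original.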