Schema-first, with a catch-all for the rest

You have two real options. Most teams try to take both at once, and that is what sinks them. Here is the trade-off nobody states plainly, and the call we make instead.

Scope my event types →

Option A - Schema-first

You define every event type upfront, build typed ingestion pipelines, and enforce structure at write time. Retrieval is fast, deterministic, and cheap. The cost is rigidity: every new data type needs a migration, and you will miss important types in month one.
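To make "enforce structure at write time" concrete, here is a minimal sketch in Python. The `OrderPlaced` event type, its fields, the `SCHEMAS` registry, and the `validate_event` helper are all hypothetical names chosen for illustration; a real pipeline would likely use a dedicated validation library.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderPlaced:  # hypothetical core event type
    order_id: str
    amount_cents: int


# Hypothetical registry: every event type must be declared here upfront.
SCHEMAS = {"order_placed": OrderPlaced}


def validate_event(event_type: str, payload: dict):
    """Reject, at write time, anything that does not match a declared schema."""
    schema = SCHEMAS.get(event_type)
    if schema is None:
        raise ValueError(f"unknown event type: {event_type}")
    expected = {f.name: f.type for f in fields(schema)}
    if set(payload) != set(expected):
        raise ValueError(f"fields {sorted(payload)} do not match {sorted(expected)}")
    for name, typ in expected.items():
        if not isinstance(payload[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")
    return schema(**payload)
```

The rigidity is visible in the sketch: an event type missing from `SCHEMAS` is rejected outright until someone adds a schema for it.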

Option B - Unstructured-first

You ingest everything as prose, chunk loosely, and lean on semantic retrieval. It is flexible and quick to bootstrap. The cost arrives later, when quality degrades at scale and you re-engineer the whole index anyway, now with ten times the data to migrate.

What we actually do

Schema-first on what you know

We lock typed schemas for the core event types you are certain about. That delivers roughly 80% of the performance benefit of full structure on day one. Reliable reasoning over your known domain starts immediately.

A structured catch-all for what you don't

Everything else lands in a structured catch-all bucket, never the void. The catch-all gives you signal on which new event types are actually appearing in the wild. You are never blind to the unknown.
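One way the routing could look, as a sketch under assumptions: the `KNOWN_TYPES` table, the `CatchAllRecord` shape, and the `ingest` function are hypothetical, and sending a known type with missing fields to the catch-all (instead of rejecting it) is a design choice made here for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical core types and their required fields.
KNOWN_TYPES = {"order_placed": {"order_id", "amount_cents"}}


@dataclass
class CatchAllRecord:
    claimed_type: str      # the type name the producer sent
    field_names: tuple     # sorted field names, used later to spot patterns
    payload: dict
    received_at: datetime


@dataclass
class Store:
    typed: list = field(default_factory=list)
    catch_all: list = field(default_factory=list)


def ingest(store: Store, event_type: str, payload: dict) -> str:
    """Route known, well-formed events to the typed store; label the rest."""
    required = KNOWN_TYPES.get(event_type)
    if required is not None and required <= set(payload):
        store.typed.append((event_type, payload))
        return "typed"
    record = CatchAllRecord(
        claimed_type=event_type,
        field_names=tuple(sorted(payload)),
        payload=payload,
        received_at=datetime.now(timezone.utc),
    )
    store.catch_all.append(record)
    return "catch_all"
```

Nothing is dropped: an unrecognized event keeps its claimed type, field names, and arrival time, which is what makes the catch-all reviewable later.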

Mine the catch-all every quarter

Promote patterns into first-class types

Each quarter you review the catch-all and promote recurring patterns into proper schema types. Your data model grows from evidence, not guesswork. It is not elegant, but it works.
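The quarterly review can start as a simple frequency count. The sketch below assumes each catch-all record is a dict with hypothetical `type` and `payload` keys, and treats "recurring" as a count at or above a `min_count` threshold that you would tune yourself.

```python
from collections import Counter


def promotion_candidates(records: list, min_count: int = 3) -> list:
    """Group catch-all records by (type name, field names) and keep frequent ones."""
    signatures = Counter(
        (r["type"], tuple(sorted(r["payload"]))) for r in records
    )
    # most_common() orders by count, highest first.
    return [(sig, n) for sig, n in signatures.most_common() if n >= min_count]


# Hypothetical quarter of catch-all data.
quarter = [
    {"type": "refund_requested", "payload": {"order_id": "A1", "reason": "late"}},
    {"type": "refund_requested", "payload": {"order_id": "A2", "reason": "damaged"}},
    {"type": "refund_requested", "payload": {"reason": "wrong item", "order_id": "A3"}},
    {"type": "login_failed", "payload": {"user_id": "u9"}},
]

for (type_name, field_names), count in promotion_candidates(quarter):
    print(type_name, field_names, count)
```

Each signature that clears the threshold is a candidate for a first-class schema; a person still makes the final call on what to promote.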

Why the hybrid wins

The hybrid is not a compromise; it is the only honest position. Pure schema-first pretends you know everything on day one, which you never do. Pure unstructured pretends structure can wait, which it cannot.

So you take the deterministic speed of structure where you have certainty. You keep a labelled inbox for everything that surprises you. Certainty and discovery stop fighting each other, and your roadmap follows real data instead of opinions.

Common questions

Why not stay fully unstructured and migrate later?

Because "later" means a forced rebuild with ten times the data under live users. Day-one structure is simply the cheap version of that same work.

Won't I miss important data in month one?

Yes, and the catch-all catches it. You promote it on a schedule instead of discovering the gap in production.

Decide your core types in 20 minutes

We separate what you know for certain from what belongs in the catch-all, and set the quarterly promotion loop.

Book my technical call →