Why not UUID?
If you've worked with databases before, your first instinct for public-facing IDs might be UUIDv4, a 128-bit value with 122 random bits, like 550e8400-e29b-41d4-a716-446655440000. It's the industry default. Gista.js uses a 12-character random hex string instead, like a7f3b9c1e2d4. Here's why.
B-tree locality
Most database indexes are B-trees. When you insert a row, the database must place the new key in sorted order within the tree. UUIDv4 is effectively uniformly random, so each insert lands in an unpredictable leaf node. At scale this means:
- Page splits everywhere. Inserts land on leaves scattered across the whole tree, so splits happen all over the index instead of at one growing edge. Each split leaves two partly empty pages, so the index needs more pages to hold the same number of keys.
- Cache misses. The working set for writes is the entire index, not a hot region. Buffer pool hit rates drop because recently-touched pages are unlikely to be touched again soon.
- Write amplification. Random inserts turn sequential disk I/O into random I/O. This matters less on SSDs than on spinning disks, but it still means more dirty pages to flush per insert, and more compaction pressure on LSM-based engines.
A shorter random ID doesn't fully solve locality (it's still random), but it keeps the index smaller — which means more of it fits in memory and fewer pages get touched per lookup.
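A rough way to see the locality gap is to count how many distinct leaf pages a batch of inserts touches. The sketch below is a toy model, not a real B-tree: it assumes an index of 1,000,000 existing keys packed 100 per leaf page, and treats each insert as landing on the page that holds its sorted position. All names and numbers are illustrative assumptions.

```typescript
// Toy model (assumption): an insert "touches" the leaf page that contains
// its sorted position (rank) in the index.
function pagesTouched(ranks: number[], keysPerPage: number): number {
  const pages = new Set<number>();
  ranks.forEach((rank) => pages.add(Math.floor(rank / keysPerPage)));
  return pages.size;
}

const existingKeys = 1000000; // keys already in the index
const keysPerPage = 100;      // so the index has 10,000 leaf pages
const batchSize = 1000;       // inserts to simulate

// Random keys (UUIDv4-like): each new key's rank is uniform over the index.
const randomRanks: number[] = [];
// Ordered keys (auto-increment or timestamp-prefixed): each new key goes at the end.
const orderedRanks: number[] = [];
for (let i = 0; i < batchSize; i++) {
  randomRanks.push(Math.floor(Math.random() * existingKeys));
  orderedRanks.push(existingKeys + i);
}

const randomPages = pagesTouched(randomRanks, keysPerPage);
const orderedPages = pagesTouched(orderedRanks, keysPerPage);
console.log({ randomPages, orderedPages });
```

In this model the ordered batch touches 10 pages, while the random batch touches around 950 of the 10,000 available, which is close to one page per insert.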
Space
UUIDv4 is 36 characters in its canonical form (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx). You can store it as 16 bytes of binary, but then every query needs encoding and decoding, and debugging with raw SQL becomes painful, so in practice many apps store the text at 36 bytes. A ULID is 26 bytes as text. A 12-character hex string is 12 bytes.
This matters even more when you consider that many frameworks use UUID as the primary key, which means every foreign key in every related table is also 36 bytes. A submissions table with a form_id and a user_id is storing two UUIDs per row just for joins. An integer foreign key is 4 to 8 bytes. The cost multiplies across every relationship in the schema.
That's why Gista.js keeps an integer id as the primary key and adds public_id as a separate column — joins stay cheap, and only the column that faces the outside world pays the cost of randomness.
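As a sketch of that layout, the types below model a forms table that carries both keys and a submissions table that joins on integers only. The column names other than id, public_id, form_id, and user_id are hypothetical, and the in-memory lookup stands in for what would be an indexed query in a real database.

```typescript
// Hypothetical row shapes: only the forms table carries a public-facing ID.
interface FormRow {
  id: number;        // internal integer primary key, used for joins
  public_id: string; // short random hex, the only ID exposed in URLs
  title: string;     // illustrative column, not from Gista.js
}

interface SubmissionRow {
  id: number;
  form_id: number; // integer foreign key to FormRow.id
  user_id: number; // integer foreign key to a users table
}

// Translate the public ID to the internal key once, at the edge,
// then do all relational work with integers.
function submissionsForPublicId(
  forms: FormRow[],
  submissions: SubmissionRow[],
  publicId: string
): SubmissionRow[] {
  const form = forms.find((f) => f.public_id === publicId);
  if (!form) return [];
  return submissions.filter((s) => s.form_id === form.id);
}

const forms: FormRow[] = [{ id: 1, public_id: "a7f3b9c1e2d4", title: "Contact" }];
const submissions: SubmissionRow[] = [
  { id: 10, form_id: 1, user_id: 7 },
  { id: 11, form_id: 2, user_id: 7 },
];
console.log(submissionsForPublicId(forms, submissions, "a7f3b9c1e2d4"));
```

The random string appears in exactly one column; every other table refers to the form by its integer id.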
Every byte in an indexed column costs page space. Over millions of rows, this can be the difference between an index that fits in the buffer pool and one that spills to disk.
| Format | Storage (as text) | Bits of entropy | URL-friendly |
|---|---|---|---|
| UUIDv4 | 36 bytes | 122 | Ugly, with hyphens |
| ULID | 26 bytes | 80 | Crockford base32, sortable |
| Hex 12 chars | 12 bytes | 48 | Clean |
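To put rough numbers on the table, the sketch below multiplies key width by row count. It counts only the raw key bytes in one indexed column; real indexes add per-entry and per-page overhead that varies by engine, so treat the result as a lower bound. The 10 million row count is an arbitrary example, and the function name is made up for illustration.

```typescript
// Text storage width in bytes for each ID format, taken from the table above.
const keyBytes = { uuidText: 36, ulidText: 26, hex12: 12 };

// Raw key bytes for one indexed column, in MiB, ignoring all index overhead.
function rawKeyMiB(rows: number, bytesPerKey: number): number {
  return (rows * bytesPerKey) / (1024 * 1024);
}

const rows = 10000000; // arbitrary example: 10 million rows
console.log("UUID text:", rawKeyMiB(rows, keyBytes.uuidText).toFixed(1), "MiB");
console.log("ULID text:", rawKeyMiB(rows, keyBytes.ulidText).toFixed(1), "MiB");
console.log("Hex 12:   ", rawKeyMiB(rows, keyBytes.hex12).toFixed(1), "MiB");
```

At 10 million rows that is about 343 MiB of key data for UUID text against about 114 MiB for 12-character hex, per indexed column, before any overhead.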
48 bits is enough
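The birthday paradox says you reach a 50% chance of at least one collision at roughly 2^(n/2) rows, where n is the number of entropy bits. For 48 bits, that's on the order of 16 million rows in the same table. That mark is where a collision becomes more likely than not, so treat it as an order of magnitude rather than a safe ceiling. For a typical app, that's plenty. If you outgrow it, you bump the length: defaultHex(16) gives you 64 bits (the same 50% mark sits at roughly 4 billion rows), and defaultHex(32) gives 128 bits, slightly more than UUIDv4's 122, in 32 characters instead of 36.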
UUIDv4's 122 bits of entropy are overkill for almost every application. You're paying the storage and locality cost for collision resistance you'll never need.
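The 2^(n/2) figure is a rule of thumb. The sketch below uses the standard birthday approximation, p ≈ 1 - exp(-k(k-1) / 2N) for k rows and N = 2^n possible IDs, to compute the probability directly and to invert it. The function names are illustrative and not part of Gista.js.

```typescript
// Birthday approximation: probability of at least one collision among `rows`
// IDs drawn uniformly at random from 2^bits possible values.
function collisionProbability(rows: number, bits: number): number {
  const space = Math.pow(2, bits);
  // p ≈ 1 - exp(-rows * (rows - 1) / (2 * space)); expm1 keeps tiny values accurate.
  return -Math.expm1(-(rows * (rows - 1)) / (2 * space));
}

// Inverse: approximate row count at which the collision probability reaches `p`.
function rowsForProbability(p: number, bits: number): number {
  return Math.sqrt(2 * Math.pow(2, bits) * Math.log(1 / (1 - p)));
}

for (const bits of [48, 64]) {
  console.log(
    bits + " bits:",
    "50% at ~" + Math.round(rowsForProbability(0.5, bits)).toLocaleString("en-US") + " rows,",
    "1% at ~" + Math.round(rowsForProbability(0.01, bits)).toLocaleString("en-US") + " rows"
  );
}
```

For 48 bits this gives about 19.8 million rows at the 50% mark and about 2.4 million at 1%, consistent with the order of magnitude the 2^(n/2) rule predicts.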
UUIDv7 and ULID don't fix it
UUIDv7 and ULID (implemented by libraries like ulidx) both embed a timestamp prefix, which restores insert locality. But both still have problems:
- Still 128 bits: 36 characters as text for UUIDv7, 26 for ULID. Both are larger than needed.
- The timestamp prefix leaks creation order, the exact thing public_id is designed to hide. Sort by ID and you've got a timeline.
- If you strip the timestamp before exposing the value publicly, you've rebuilt a random ID with extra steps and wasted bytes.
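To make the leak concrete: in the ULID format, the first 10 characters are a 48-bit Unix timestamp in milliseconds, encoded in Crockford base32. The sketch below decodes it from the example ULID used in the ULID specification. The function name is illustrative; anyone holding the ID can do the same.

```typescript
// Crockford base32 alphabet used by ULID (no I, L, O, or U).
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Recover the creation time from the first 10 characters of a ULID.
function ulidTimestamp(ulid: string): Date {
  if (ulid.length !== 26) throw new Error("a ULID is 26 characters long");
  const prefix = ulid.slice(0, 10).toUpperCase();
  let ms = 0;
  for (let i = 0; i < prefix.length; i++) {
    const digit = CROCKFORD.indexOf(prefix.charAt(i));
    if (digit < 0) throw new Error("invalid ULID character: " + prefix.charAt(i));
    ms = ms * 32 + digit; // 48 bits fits safely in a double
  }
  return new Date(ms);
}

console.log(ulidTimestamp("01ARZ3NDEKTSV4RRFFQ69G5FAV").toISOString());
```

For that example the decoded time is 2016-07-30T23:54:10.259Z, accurate to the millisecond, with no access to the database required.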
The design
Gista.js uses nanoid with a hex-only alphabet. The result is a short, URL-safe, opaque string that:
- Fits cleanly in URLs without encoding
- Keeps indexes small and cache-friendly
- Provides tunable entropy via a single length parameter
- Reveals nothing about row count or creation order
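A minimal sketch of such a generator follows. It is not Gista.js's actual code: to stay dependency-free it swaps nanoid for Node's built-in crypto module, and randomHex is a hypothetical name. With nanoid itself, the equivalent would be built with its customAlphabet function and the alphabet 0123456789abcdef.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper (not the Gista.js API): returns `length` random
// lowercase hex characters, each carrying 4 bits of entropy.
function randomHex(length: number): string {
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError("length must be a positive integer");
  }
  // Each random byte yields two hex characters; round up and trim for odd lengths.
  return randomBytes(Math.ceil(length / 2)).toString("hex").slice(0, length);
}

console.log(randomHex(12)); // 48 bits
console.log(randomHex(16)); // 64 bits
```

Because every character is drawn from a cryptographic random source, entropy scales linearly with length: 4 bits per character.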
It's not clever. It's just the right trade-off for an application database.