Goal: Define the simplest database-backed VCS that preserves core versioning concepts, supports filesystem checkouts, and can sync with other databases. Optimize for correctness and small surface area before advanced features.
Non-Goals (v1)
- No web UI, tickets, wiki, forums.
- No complex permissions or auth beyond basic user IDs.
- No advanced delta chains or merge drivers; keep minimal.
- No tag propagation rules; simple labels only.
Core Concepts
- Artifacts are immutable blobs addressed by content hash (stored in the uuid column; it is a hash digest, not an RFC 4122 UUID).
- A check-in (manifest) describes a snapshot of paths → file hashes and parent relationship(s).
- Checkout materializes a manifest’s files to a working directory.
- Sync exchanges missing artifacts and manifests between databases.
Minimal Schema
Tables use integer primary keys for internal joins and textual hashes for external addressing.
- blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL, size INTEGER, content BLOB)
  - Stores immutable file contents and manifests (both are artifacts).
  - uuid is the SHA-256 (or BLAKE3) of content.
- manifest(rid INTEGER PRIMARY KEY REFERENCES blob(rid), is_merge BOOLEAN DEFAULT 0)
  - Identifies which blob rows are manifests and whether they represent merges.
- mlink(manifest INTEGER REFERENCES manifest(rid), fn TEXT, fid INTEGER REFERENCES blob(rid), PRIMARY KEY(manifest, fn))
  - Per-file mapping inside a manifest: path name → file artifact rid.
  - Minimal: store path as text; skip filename interning initially.
- plink(parent INTEGER REFERENCES manifest(rid), child INTEGER REFERENCES manifest(rid), PRIMARY KEY(parent, child))
  - Parent/child DAG edges. Multiple parents allowed for merges.
- label(manifest INTEGER REFERENCES manifest(rid), name TEXT, PRIMARY KEY(manifest, name)) (optional)
  - Lightweight tags (e.g., branch names). No propagation semantics in v1.
Indexes:
- CREATE INDEX blob_uuid ON blob(uuid);
- CREATE INDEX mlink_manifest ON mlink(manifest);
- CREATE INDEX mlink_fn ON mlink(fn);
- CREATE INDEX plink_child ON plink(child);
- CREATE INDEX plink_parent ON plink(parent);
Hashing and IDs
- Content uuid: SHA-256 hex (preferred) or BLAKE3 for speed. Store size for quick checks.
- Manifests’ uuid equals the hash of their canonical text format.
- rid is auto-increment. All relationships use rid for speed.
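A minimal sketch of the content-addressing rule above, assuming SHA-256 and lowercase hex encoding. The function name ContentUUID is hypothetical and not part of the design.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// ContentUUID returns the lowercase hex SHA-256 digest of content.
// The name is an illustrative assumption; the design only says
// "SHA-256 hex" for the uuid column.
func ContentUUID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}
```

Calling ContentUUID on the same bytes always yields the same 64-character string, which is what makes blob inserts naturally deduplicating.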
Manifest Format (Canonical Text)
Simple line-oriented format, stable and easy to hash:
- F <path> <file-uuid>: file entry.
- P <parent-manifest-uuid> [<parent2-uuid> ...]: parents.
- C <comment>: optional message.
- T <label>: optional label/tag (stored in label).
Canonicalization:
- Paths sorted lexicographically.
- UTF-8, LF line endings.
- No trailing whitespace.
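The canonicalization rules can be sketched as a serializer. This is one possible reading, not a fixed format: the type names, the record order (F, then P, C, T), sorting labels, and treating the comment as a single line are all assumptions the text above does not settle.

```go
package main

import (
	"sort"
	"strings"
)

// FileEntry and Manifest are hypothetical in-memory shapes for a check-in.
type FileEntry struct {
	Path string
	UUID string
}

type Manifest struct {
	Files   []FileEntry
	Parents []string
	Comment string // assumed to be a single line
	Labels  []string
}

// Canonical renders m as LF-terminated lines with F entries sorted by path.
// Emitting F, P, C, T in that order is an assumption made for this sketch.
func Canonical(m Manifest) string {
	files := append([]FileEntry(nil), m.Files...)
	sort.Slice(files, func(i, j int) bool { return files[i].Path < files[j].Path })
	labels := append([]string(nil), m.Labels...)
	sort.Strings(labels)

	var b strings.Builder
	for _, f := range files {
		b.WriteString("F " + f.Path + " " + f.UUID + "\n")
	}
	if len(m.Parents) > 0 {
		b.WriteString("P " + strings.Join(m.Parents, " ") + "\n")
	}
	if c := strings.TrimRight(m.Comment, " \t\r\n"); c != "" {
		b.WriteString("C " + c + "\n") // trailing whitespace removed
	}
	for _, l := range labels {
		b.WriteString("T " + l + "\n")
	}
	return b.String()
}
```

Because the output does not depend on the order in which files or labels were collected, two check-ins of the same tree with the same parents and comment hash to the same manifest uuid.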
Minimal Operations
1. Check-in
Input: working directory, parent manifest uuid (optional), comment, labels.
- Hash all files → insert into blob if new.
- Build manifest text (F, P, C, T); compute its uuid.
- Insert manifest into blob and into manifest.
- For each F line → insert into mlink.
- Insert plink edge(s) from parents to this manifest.
- Insert labels into label.
Golang API sketch:
- func WriteBlob(content []byte) (rid int, uuid string, err error)
- func WriteManifest(m Manifest) (rid int, uuid string, err error)
- func AddMlink(manifestRid int, entries []FileEntry) error
- func LinkParents(childRid int, parentUuids []string) error
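The check-in steps can be illustrated with an in-memory stand-in for the tables. This is a sketch under stated assumptions, not the intended store: the Store type is hypothetical, it keys everything by uuid instead of rid, it omits labels, and it assumes a single-line comment.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// Store is a hypothetical in-memory stand-in for blob, manifest, mlink, plink.
type Store struct {
	blobs     map[string][]byte            // uuid -> content
	manifests map[string]bool              // uuid -> is a manifest
	mlink     map[string]map[string]string // manifest uuid -> path -> file uuid
	plink     map[string][]string          // child uuid -> parent uuids
}

func NewStore() *Store {
	return &Store{
		blobs:     map[string][]byte{},
		manifests: map[string]bool{},
		mlink:     map[string]map[string]string{},
		plink:     map[string][]string{},
	}
}

// WriteBlob stores content under its SHA-256 hex digest if not already present.
func (s *Store) WriteBlob(content []byte) string {
	sum := sha256.Sum256(content)
	uuid := hex.EncodeToString(sum[:])
	if _, ok := s.blobs[uuid]; !ok {
		s.blobs[uuid] = append([]byte(nil), content...)
	}
	return uuid
}

// Checkin hashes files, builds the manifest text, and records mlink/plink rows.
func (s *Store) Checkin(files map[string][]byte, parents []string, comment string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var text strings.Builder
	entries := make(map[string]string, len(files))
	for _, p := range paths {
		fid := s.WriteBlob(files[p])
		entries[p] = fid
		text.WriteString("F " + p + " " + fid + "\n")
	}
	if len(parents) > 0 {
		text.WriteString("P " + strings.Join(parents, " ") + "\n")
	}
	if comment != "" {
		text.WriteString("C " + comment + "\n")
	}

	mid := s.WriteBlob([]byte(text.String())) // the manifest is itself an artifact
	s.manifests[mid] = true
	s.mlink[mid] = entries
	s.plink[mid] = append([]string(nil), parents...)
	return mid
}
```

Checking in identical content twice returns the same manifest uuid and adds no new blobs, which is the idempotence that sync relies on later.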
2. Checkout
Input: manifest uuid, target directory.
- Resolve manifest rid by uuid (from blob).
- Query mlink for (fn, fid) pairs.
- For each file → read blob.content and write to filesystem path fn.
Golang API sketch:
func Checkout(manifestUuid string, dir string) error
3. Diff (Minimal)
- Compare two manifests by joining mlink on fn.
- Report added/removed/changed paths based on file uuid differences.
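As a rough illustration, the same comparison can be done in memory on two path → file uuid maps (the result of reading mlink for each manifest). The Change type and DiffManifests name are hypothetical.

```go
package main

import "sort"

// Change describes one differing path between two manifests.
type Change struct {
	Path string
	Kind string // "added", "removed", or "changed"
}

// DiffManifests compares path -> file uuid maps and returns changes sorted by path.
func DiffManifests(oldFiles, newFiles map[string]string) []Change {
	var changes []Change
	for path, oldID := range oldFiles {
		newID, ok := newFiles[path]
		switch {
		case !ok:
			changes = append(changes, Change{Path: path, Kind: "removed"})
		case newID != oldID:
			changes = append(changes, Change{Path: path, Kind: "changed"})
		}
	}
	for path := range newFiles {
		if _, ok := oldFiles[path]; !ok {
			changes = append(changes, Change{Path: path, Kind: "added"})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Path < changes[j].Path })
	return changes
}
```

Because only uuids are compared, no file content needs to be read to produce the path-level diff.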
4. Merge (Minimal, Optional v1.1)
- Choose base via simple LCA heuristic using plink ancestry (or first common ancestor).
- For each path, do 3-way line merge; if conflict, mark and require manual resolution.
- Produce a working directory; follow with Check-in.
Note: Can defer to an external merge tool via command invocation. Store merged results as new blobs.
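One way to read "first common ancestor" is a breadth-first search over plink edges: collect every ancestor of one manifest, then walk back from the other until a collected node is hit. The sketch below assumes the edges are available as a child → parents map; MergeBase is a hypothetical name, and this heuristic is not guaranteed to pick the best base in criss-cross histories.

```go
package main

// MergeBase returns the first manifest reachable from b (breadth-first over
// parent edges) that is also a or an ancestor of a. The parents map mirrors
// plink as child uuid -> parent uuids.
func MergeBase(parents map[string][]string, a, b string) (string, bool) {
	// Pass 1: collect a and all of its ancestors.
	seen := map[string]bool{}
	queue := []string{a}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if seen[cur] {
			continue
		}
		seen[cur] = true
		queue = append(queue, parents[cur]...)
	}
	// Pass 2: walk back from b until we reach something from pass 1.
	visited := map[string]bool{}
	queue = []string{b}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if visited[cur] {
			continue
		}
		visited[cur] = true
		if seen[cur] {
			return cur, true
		}
		queue = append(queue, parents[cur]...)
	}
	return "", false
}
```

When one manifest is an ancestor of the other, the function returns that ancestor, which corresponds to a fast-forward case with nothing to merge.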
5. Delta Storage (Optional v1.1)
- Start without deltas; store full content in blob for simplicity.
- Add delta(rid INTEGER PRIMARY KEY, srcid INTEGER REFERENCES blob(rid)) later.
- Implement on-read reconstruction or on-write rebase when enabling.
Rationale: Simplifies correctness; premature delta chains complicate sync and integrity.
6. Sync
Goal: Exchange missing artifacts and manifests with another database (peer).
Protocol (minimal):
- Client sends known tip manifest uuids.
- Server responds with frontier manifests not present client-side.
- Transfer manifests first, then referenced file blobs.
- Send each artifact as {uuid, size, content} in a simple frame (e.g., over HTTP or gRPC).
Database process:
- For each incoming artifact: upsert into blob by uuid.
- If manifest: insert manifest, parse and upsert mlink and plink edges.
- Idempotent operations; integrity checked via uuid.
Golang API sketch:
- func MissingUuids(local []string, remote []string) (need []string)
- func ApplyArtifact(a Artifact) error
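A minimal sketch of the two sync helpers, assuming SHA-256 uuids and an in-memory map in place of the blob table. The Artifact fields follow the {uuid, size, content} frame; the BlobStore type and the decision to reject size mismatches are assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

// Artifact mirrors the {uuid, size, content} frame.
type Artifact struct {
	UUID    string
	Size    int
	Content []byte
}

// MissingUuids returns the remote uuids that are absent locally,
// in remote order and without duplicates.
func MissingUuids(local, remote []string) []string {
	have := make(map[string]bool, len(local))
	for _, u := range local {
		have[u] = true
	}
	var need []string
	for _, u := range remote {
		if !have[u] {
			need = append(need, u)
			have[u] = true // avoid requesting the same uuid twice
		}
	}
	return need
}

// BlobStore is a hypothetical stand-in for the blob table (uuid -> content).
type BlobStore map[string][]byte

// ApplyArtifact verifies the frame and stores it; repeating the call is a no-op.
func (s BlobStore) ApplyArtifact(a Artifact) error {
	if a.Size != len(a.Content) {
		return errors.New("size does not match content length")
	}
	sum := sha256.Sum256(a.Content)
	if hex.EncodeToString(sum[:]) != a.UUID {
		return errors.New("uuid does not match content hash")
	}
	if _, ok := s[a.UUID]; !ok {
		s[a.UUID] = append([]byte(nil), a.Content...)
	}
	return nil
}
```

Since applying an artifact twice leaves the store unchanged, an interrupted sync can simply be retried from the start.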
7. Integrity and Validation
- On ingest, verify uuid == hash(content).
- For manifests, parse and validate that referenced file uuids exist or mark as phantom until fetched.
- Optional: maintain leaf(manifestRid) table to quickly find tips.
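The phantom check and the tip lookup can both be sketched over plain data. The helper names are hypothetical; Phantoms assumes the file uuid is the last whitespace-separated field of an F line, and Leaves computes tips on demand from plink edges instead of maintaining a leaf table.

```go
package main

import (
	"sort"
	"strings"
)

// Phantoms returns the file uuids referenced by F lines in manifestText
// that are not present in have (the set of known blob uuids).
func Phantoms(manifestText string, have map[string]bool) []string {
	var missing []string
	for _, line := range strings.Split(manifestText, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[0] == "F" {
			uuid := fields[len(fields)-1] // assumed: uuid is the final field
			if !have[uuid] {
				missing = append(missing, uuid)
			}
		}
	}
	return missing
}

// Leaves returns the manifests that never appear as a parent, sorted.
// The parents map mirrors plink as child uuid -> parent uuids.
func Leaves(manifests []string, parents map[string][]string) []string {
	isParent := map[string]bool{}
	for _, ps := range parents {
		for _, p := range ps {
			isParent[p] = true
		}
	}
	var tips []string
	for _, m := range manifests {
		if !isParent[m] {
			tips = append(tips, m)
		}
	}
	sort.Strings(tips)
	return tips
}
```

The uuids returned by Phantoms are the ones to request from the peer before the manifest can be checked out in full.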
Filesystem Model
- Working directory is outside the DB; DB is the source of truth.
- No implicit staging area in v1: read the filesystem, compute hashes, and compare to the target manifest.
- Optional: add a tiny staging table for future enhancements.
Transactions and Concurrency
- Wrap each check-in in a single transaction: blob and manifest writes plus mlink/plink inserts.
- SQLite recommended for simplicity (WAL mode); Postgres/MySQL adaptable.
Minimal Index Optimizations
- blob(uuid) for fast existence checks on ingest and content lookup.
- mlink(manifest) for checkout list retrieval.
- plink(child) to traverse ancestry quickly.
Suggested Golang Package Layout
- pkg/hash: hashing utilities.
- pkg/store: DB accessors (blob, manifest, mlink, plink, label).
- pkg/manifest: serialize/parse manifest text; canonicalization.
- pkg/vcs: operations: checkin, checkout, diff, merge (optional), sync.
- cmd/cambria: CLI wiring (status, checkin, checkout, log, sync).
CLI (Minimal)
- cambria init
- cambria checkin -m "msg" -p <parent>
- cambria checkout <manifest-uuid> <dir>
- cambria diff <uuid1> <uuid2>
- cambria sync <remote-url>
Migration Path (v1 → v2)
- Add delta table and reconstruction logic.
- Add filename interning for space and index efficiency.
- Enhance merge base calculation and conflict markers.
- Add branch/tag propagation, immutable labels, and sign-offs.
- Add auth, permissions, and signed manifests.
Summary
Start with only: blob, manifest, mlink, plink (+ simple label).
Implement check-in, checkout, diff, and sync. Defer delta and complex merge to a subsequent version. Focus on integrity (uuid hashing), transactional writes, and straightforward sync frames to replicate artifacts reliably.