Last Updated: 2025-12-19
This document provides a comprehensive guide to the implementation of the Cambria version control system in Go. It is intended for developers working on the Cambria project and for AI agents tasked with its maintenance and extension.
1. Design Philosophy & Key Principles
Cambria is a re-implementation of the core concepts of Fossil SCM in pure Go. It adheres to a set of guiding principles that emphasize simplicity, correctness, and robustness.
- Minimal and Correct: Prioritize the simplest correct implementation.
- SQLite-Backed: A single *.db file serves as the atomic source of truth for the entire repository. This includes all versioned files, historical metadata, and project configuration.
- Content-Addressable Storage: All artifacts (files, manifests) are identified by a unique SHA-256 hash of their content. This ensures data integrity and provides a canonical identifier (UUID) for every piece of data.
- Immutable Artifacts: Once an artifact (a "blob") is written to the repository, it is never changed. New versions are created as new blobs.
- Transactional Integrity: All database operations that modify state are executed within a transaction. The store.DBTX interface abstracts *sql.DB and *sql.Tx to enforce this at the data access layer.
- Repository Pattern: High-level VCS operations are exposed via the vcs.Repository struct, which encapsulates the database connection and provides a clean API for all version control logic.
- Go Idioms: The project favors the Go standard library and minimizes external dependencies.
- Standard Testing: All functionality is tested using the standard testing package.
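The content-addressable principle can be illustrated with a minimal sketch. The function name artifactID is hypothetical and is not taken from Cambria's pkg/hash; the sketch only assumes that an artifact's identifier is the lowercase hex SHA-256 digest of its bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// artifactID returns the lowercase hex SHA-256 digest of content.
// Hypothetical helper for illustration; not Cambria's actual API.
func artifactID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := artifactID([]byte("abc"))
	b := artifactID([]byte("abc"))
	c := artifactID([]byte("abd"))

	fmt.Println(a)
	fmt.Println(a == b) // identical content yields the identical ID
	fmt.Println(a == c) // different content yields a different ID
}
```

Because the identifier is derived purely from the bytes, storing the same file twice yields the same identifier, which fits the immutable-artifact rule above.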
2. Core Architecture
vcs.Repository
The vcs.Repository struct is the primary entry point for all version control
logic. It encapsulates the database connection and provides methods for all
high-level operations.
// pkg/vcs/repo.go
type Repository struct {
	db *store.DB
}
// Usage:
repo, err := vcs.InitRepository("path/to/my-repo.db")
// or
repo, err := vcs.OpenRepository("path/to/my-repo.db")
// Perform operations:
uuid, err := repo.Checkin(...)
err = repo.Checkout(...)
store.DBTX Interface
To guarantee transactional integrity, all functions in the pkg/store package
that modify the database accept a DBTX interface value. The same function can
then run either as a single standalone operation (with a *store.DB) or as
part of a larger atomic write operation (with a *sql.Tx).
// pkg/store/dbtx.go
type DBTX interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
	Query(query string, args ...interface{}) (*sql.Rows, error)
	QueryRow(query string, args ...interface{}) *sql.Row
}
// Transactional Write Pattern:
tx, err := repo.DB().Begin()
if err != nil { /* ... */ }
defer tx.Rollback() // Rolls back on any early return; has no effect after a successful Commit
// ... call store functions with the transaction object ...
if err := store.CreateManifest(tx, rid, false); err != nil {
	return err // The deferred Rollback discards any partial writes
}
// ...
return tx.Commit() // Commits all changes atomically
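The following sketch shows why an interface like DBTX is convenient. It re-declares the interface from above, verifies at compile time that both *sql.DB and *sql.Tx satisfy it, and passes an in-memory stand-in to a store-style function. The recorder type and the insertLabel function are hypothetical and exist only for this illustration; they are not part of pkg/store.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// DBTX mirrors the interface shown above.
type DBTX interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
	Query(query string, args ...interface{}) (*sql.Rows, error)
	QueryRow(query string, args ...interface{}) *sql.Row
}

// Compile-time checks: both standard-library types satisfy DBTX.
var (
	_ DBTX = (*sql.DB)(nil)
	_ DBTX = (*sql.Tx)(nil)
)

// recorder is a hypothetical stand-in that only records Exec statements.
type recorder struct{ queries []string }

func (r *recorder) Exec(query string, args ...interface{}) (sql.Result, error) {
	r.queries = append(r.queries, query)
	return driver.RowsAffected(1), nil
}

func (r *recorder) Query(query string, args ...interface{}) (*sql.Rows, error) {
	return nil, errors.New("recorder: Query not supported")
}

func (r *recorder) QueryRow(query string, args ...interface{}) *sql.Row { return nil }

// insertLabel is a hypothetical store-style function. It does not care
// whether db is a connection, a transaction, or a test double.
func insertLabel(db DBTX, manifestRID int64, name string) error {
	_, err := db.Exec("INSERT INTO label(manifest, name) VALUES(?, ?)", manifestRID, name)
	return err
}

func main() {
	rec := &recorder{}
	if err := insertLabel(rec, 1, "branch:main"); err != nil {
		fmt.Println("insert failed:", err)
		return
	}
	fmt.Println(len(rec.queries), "statement recorded")
}
```

Since the caller decides what to pass in, the same function participates in a larger transaction simply by receiving the *sql.Tx instead of the connection.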
3. Package Structure
cambria/
├── pkg/
│ ├── hash/ # Content hashing (SHA-256)
│ ├── store/ # SQLite data access layer (uses DBTX)
│ │ ├── db.go
│ │ ├── dbtx.go # The transactional interface
│ │ ├── schema.go # Includes vfile and vmerge tables
│ │ ├── blob.go
│ │ ├── manifest.go
│ │ ├── mlink.go
│ │ ├── plink.go
│ │ └── label.go
│ ├── artifact/ # Manifest parsing/generation
│ │ └── manifest.go
│ └── vcs/ # High-level version control operations
│ ├── repo.go # Repository struct and lifecycle
│ ├── checkin.go # Commit operation (uses vfile)
│ ├── checkout.go # Checkout operation (populates vfile)
│ ├── add.go # File addition (updates vfile)
│ ├── diff.go # Diff computation between versions
│ ├── log.go # Timeline/history operations
│ ├── label.go # Branch and tag management
│ ├── workdir.go # Working directory operations (Remove, Rename)
│ ├── merge.go # Three-way merge implementation
│ ├── vfile.go # VFILE system implementation
│ └── ... (test files)
├── internal/
│ └── testutil/ # Test helpers
├── cmd/
│ └── cambria/ # CLI application
│ ├── main.go
│ ├── common.go # Shared utilities (ResolveVersion, etc.)
│ ├── init.go
│ ├── open.go
│ ├── close.go
│ ├── add.go
│ ├── rm.go
│ ├── mv.go
│ ├── commit.go
│ ├── checkout.go
│ ├── status.go
│ ├── diff.go
│ ├── log.go
│ ├── branch.go
│ ├── tag.go
│ └── merge.go
└── doc_cambria/
└── CAMBRIA_VCS_IMPL.md # This file
4. Core Data Model (Schema)
The database schema is designed to be idempotent, using
CREATE TABLE IF NOT EXISTS to allow for safe re-initialization.
-- Immutable artifact storage
CREATE TABLE IF NOT EXISTS blob(
rid INTEGER PRIMARY KEY AUTOINCREMENT,
uuid TEXT UNIQUE NOT NULL,
size INTEGER NOT NULL,
content BLOB NOT NULL
);
-- Manifest identification (a manifest is a special type of blob)
CREATE TABLE IF NOT EXISTS manifest(
rid INTEGER PRIMARY KEY REFERENCES blob(rid),
is_merge BOOLEAN DEFAULT 0
);
-- Manifest-file linkage (which files are in which manifest)
CREATE TABLE IF NOT EXISTS mlink(
manifest INTEGER NOT NULL REFERENCES manifest(rid),
fn TEXT NOT NULL,
fid INTEGER NOT NULL REFERENCES blob(rid),
PRIMARY KEY(manifest, fn)
);
-- Parent-child DAG (the commit graph)
CREATE TABLE IF NOT EXISTS plink(
parent INTEGER NOT NULL REFERENCES manifest(rid),
child INTEGER NOT NULL REFERENCES manifest(rid),
PRIMARY KEY(parent, child)
);
-- Labels (branches and tags)
CREATE TABLE IF NOT EXISTS label(
manifest INTEGER NOT NULL REFERENCES manifest(rid),
name TEXT NOT NULL,
PRIMARY KEY(manifest, name)
);
-- Working directory file state tracking (VFILE)
CREATE TABLE IF NOT EXISTS vfile(
id INTEGER PRIMARY KEY AUTOINCREMENT,
vid INTEGER NOT NULL REFERENCES manifest(rid), -- Baseline manifest
rid INTEGER REFERENCES blob(rid), -- Baseline file content
mrid INTEGER REFERENCES blob(rid), -- Merged file content
pathname TEXT NOT NULL COLLATE NOCASE,
origname TEXT COLLATE NOCASE, -- Original name if renamed
is_exe BOOLEAN NOT NULL DEFAULT 0,
is_link BOOLEAN NOT NULL DEFAULT 0,
chnged INTEGER NOT NULL DEFAULT 0, -- Change status
deleted BOOLEAN NOT NULL DEFAULT 0,
mhash TEXT,
mtime INTEGER,
size INTEGER,
UNIQUE(vid, pathname)
);
-- Merge state tracking (VMERGE)
CREATE TABLE IF NOT EXISTS vmerge(
id INTEGER PRIMARY KEY AUTOINCREMENT,
merge INTEGER NOT NULL REFERENCES manifest(rid), -- Merge-in manifest
mhash TEXT NOT NULL,
merge_type INTEGER NOT NULL DEFAULT 0,
is_baseline BOOLEAN NOT NULL DEFAULT 0,
UNIQUE(merge)
);
-- Temporary snapshot for merge abort
CREATE TABLE IF NOT EXISTS vfile_snapshot(...);
5. Implemented Features
Core Version Control
- InitRepository/OpenRepository: Create and open repositories.
- Checkin/Commit: Create new commits (manifests) from the working directory. This is a transactional operation that writes file blobs, generates a manifest, and updates all linkage tables (mlink, plink, label).
- Checkout: Populate a directory with the files from a specific manifest. Includes path traversal protection.
- Add: Add files to version control, updating the vfile table.
- Diff: Compute differences between files or manifests.
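The path traversal protection mentioned for Checkout could look roughly like the sketch below. The helper name safeJoin and its exact rules are assumptions made for illustration; Cambria's real check may differ. The idea is to join a manifest filename onto the checkout root and reject any result that would land outside that root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins name onto root and rejects names that are empty, absolute,
// or that resolve to a location outside root. Hypothetical helper.
func safeJoin(root, name string) (string, error) {
	if name == "" || filepath.IsAbs(name) {
		return "", fmt.Errorf("invalid path %q", name)
	}
	full := filepath.Join(root, name) // Join also cleans "a/../b" sequences
	rel, err := filepath.Rel(root, full)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path %q escapes the checkout root", name)
	}
	return full, nil
}

func main() {
	for _, name := range []string{"src/main.go", "../outside.txt", "a/../../b"} {
		p, err := safeJoin("/tmp/work", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", p)
	}
}
```

Checking the cleaned, joined path rather than the raw string catches names such as a/../../b that hide the escape behind a harmless-looking prefix.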
VFILE System for Working Directory Management
Inspired by Fossil's vfile.c, Cambria uses a set of SQLite tables (vfile,
vmerge) to efficiently track the state of the working directory. This avoids
expensive full-directory scans for operations like status or commit.
- vfile Table: Tracks every file in the working directory, its baseline version, its current status (clean, edited, added, deleted, renamed), and metadata.
- vmerge Table: Tracks the state of an in-progress merge.
- Core Functions: LoadVFileFromManifest, CheckVFileSignatures, WriteVFileToDisk.
- VCS Integration: checkout populates vfile, add inserts into vfile, commit reads changes from vfile, and status queries vfile.
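One plausible way a signature check can use the recorded size and mtime columns to avoid re-reading unchanged files is sketched below. This is an assumption about the general technique, not a description of CheckVFileSignatures itself; the entry and observed types and the isChanged function are hypothetical, and the baseline hash is assumed to come from the blob referenced by vfile.rid.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry holds the recorded state of one tracked file (hypothetical type).
type entry struct {
	Size  int64  // recorded size in bytes
	MTime int64  // recorded modification time (Unix seconds)
	Hash  string // hex SHA-256 of the baseline content
}

// observed describes the file as currently found on disk (hypothetical type).
type observed struct {
	Size  int64
	MTime int64
	Read  func() []byte // loads the content only when it is really needed
}

// isChanged reports whether the on-disk file differs from the recorded state.
func isChanged(rec entry, obs observed) bool {
	if obs.Size != rec.Size {
		return true // a different length always means different content
	}
	if obs.MTime == rec.MTime {
		return false // size and mtime both match: treat as clean without hashing
	}
	// Same size but touched since the baseline: hash to be sure.
	sum := sha256.Sum256(obs.Read())
	return hex.EncodeToString(sum[:]) != rec.Hash
}

func main() {
	base := []byte("abc")
	sum := sha256.Sum256(base)
	rec := entry{Size: 3, MTime: 100, Hash: hex.EncodeToString(sum[:])}

	touched := observed{Size: 3, MTime: 200, Read: func() []byte { return []byte("abc") }}
	edited := observed{Size: 3, MTime: 200, Read: func() []byte { return []byte("abd") }}

	fmt.Println(isChanged(rec, touched)) // false: content is unchanged
	fmt.Println(isChanged(rec, edited))  // true: same size, different content
}
```

The trade-off in this sketch is that a file rewritten with identical size and mtime would be treated as clean, which is the price of skipping the hash.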
Command-Line Interface (CLI)
A full-featured CLI is implemented using the urfave/cli/v3 framework.
- Core: init, open, close, add, commit, checkout, status, diff.
- Advanced: log, branch, tag, rm, mv, merge.
- Name Resolution: The ResolveVersion function understands symbolic names (tip, current), prefixed names (branch:NAME, tag:NAME), UUID prefixes, and bare names, where a branch takes precedence over a tag (branch > tag).
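The resolution rules listed for ResolveVersion can be sketched against in-memory data. Everything below is hypothetical: the refs type, the resolve method, the four-character minimum for UUID prefixes, and the order in which the rules are tried (which simply follows the order of the list above) are assumptions, and the UUIDs are shortened for readability.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errNotFound  = errors.New("version not found")
	errAmbiguous = errors.New("ambiguous UUID prefix")
)

// refs is an in-memory stand-in for the repository's labels and manifests.
type refs struct {
	tip, current string            // UUIDs behind the symbolic names
	labels       map[string]string // prefixed label -> manifest UUID
	uuids        []string          // all known manifest UUIDs
}

// isHex reports whether s is non-empty and made only of lowercase hex digits.
func isHex(s string) bool {
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return false
		}
	}
	return s != ""
}

func (r refs) resolve(name string) (string, error) {
	switch {
	case name == "tip":
		return r.tip, nil
	case name == "current":
		return r.current, nil
	case strings.HasPrefix(name, "branch:"), strings.HasPrefix(name, "tag:"):
		if id, ok := r.labels[name]; ok {
			return id, nil
		}
		return "", errNotFound
	}
	// UUID prefix: must match exactly one manifest (assumed minimum length 4).
	if len(name) >= 4 && isHex(name) {
		var match string
		for _, id := range r.uuids {
			if strings.HasPrefix(id, name) {
				if match != "" {
					return "", errAmbiguous
				}
				match = id
			}
		}
		if match != "" {
			return match, nil
		}
	}
	// Bare name: a branch wins over a tag of the same name.
	for _, prefix := range []string{"branch:", "tag:"} {
		if id, ok := r.labels[prefix+name]; ok {
			return id, nil
		}
	}
	return "", errNotFound
}

func main() {
	r := refs{
		tip:     "a1b2c3d4",
		current: "e5f6a7b8",
		labels: map[string]string{
			"branch:main":    "a1b2c3d4",
			"branch:release": "e5f6a7b8",
			"tag:release":    "a1b2c3d4",
		},
		uuids: []string{"a1b2c3d4", "e5f6a7b8"},
	}
	for _, name := range []string{"tip", "tag:release", "e5f6", "release", "missing"} {
		id, err := r.resolve(name)
		fmt.Println(name, "->", id, err)
	}
}
```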
Advanced Features
- Timeline and Log (log): View commit history with options for filtering by branch, limiting output, and reversing order.
- Branch and Tag Management (branch, tag): Create, list, and delete branches and tags. Internally, these are stored as prefixed labels (branch:main, tag:v1.0) in the label table.
- File Operations (rm, mv): Remove and rename files. Renames are tracked using the origname field in the vfile table, enabling robust history and future partial commit support.
- Three-Way Merge (merge): A complete three-way merge implementation based on Fossil's algorithm.
  - Merge Base Finding: Uses a breadth-first search to find the common ancestor.
  - File-Level Merge: Determines which files to add, delete, update, or merge based on a unified view of three versions (baseline, current, merge-in).
  - Content-Level Merge: Performs a line-by-line three-way merge for text files, inserting conflict markers where necessary.
  - Conflict Detection: Handles content conflicts, rename conflicts, and delete/modify conflicts.
  - Binary Files: For binary conflicts, it preserves both versions of the file for manual resolution.
  - Merge Abort: A merge --abort command allows safely reverting a merge in progress by restoring a snapshot of the vfile table.
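The breadth-first merge base search can be sketched on an in-memory commit graph. The parents type and both function names are hypothetical, and the graph stands in for the plink table with each child mapped to its parents. This is a simplified illustration of the technique and does not attempt to reproduce how merge.go handles cases such as criss-cross merges.

```go
package main

import "fmt"

// parents maps a manifest RID to the RIDs of its parents.
type parents map[int][]int

// ancestors returns every RID reachable from start, including start itself.
func ancestors(g parents, start int) map[int]bool {
	seen := map[int]bool{start: true}
	queue := []int{start}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, p := range g[n] {
			if !seen[p] {
				seen[p] = true
				queue = append(queue, p)
			}
		}
	}
	return seen
}

// mergeBase walks breadth-first from b and returns the first commit that is
// also an ancestor of a, i.e. the common ancestor nearest to b by hop count.
func mergeBase(g parents, a, b int) (int, bool) {
	fromA := ancestors(g, a)
	seen := map[int]bool{b: true}
	queue := []int{b}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if fromA[n] {
			return n, true
		}
		for _, p := range g[n] {
			if !seen[p] {
				seen[p] = true
				queue = append(queue, p)
			}
		}
	}
	return 0, false
}

func main() {
	// 1 <- 2 <- 3 (trunk), and 2 <- 4 <- 5 (branch).
	g := parents{2: {1}, 3: {2}, 4: {2}, 5: {4}}
	base, ok := mergeBase(g, 3, 5)
	fmt.Println(base, ok) // 2 true
}
```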
7. Fossil Module to Cambria Package Mapping
| Fossil Module | Cambria Package | Responsibility |
|---|---|---|
| src/content.c | pkg/store/blob.go | Content storage |
| src/manifest.c | pkg/artifact/manifest.go | Manifest parsing and generation |
| src/db.c | pkg/store/ | Database operations (via DBTX) |
| src/checkin.c | pkg/vcs/checkin.go | High-level commit creation |
| src/checkout.c | pkg/vcs/checkout.go | High-level checkout to filesystem |
| src/add.c | pkg/vcs/add.go | File addition to version control |
| src/diff.c | pkg/vcs/diff.go | Diff computation |
| src/vfile.c | pkg/vcs/vfile.go | VFILE system for working directory tracking |
| src/rm.c, mv.c | pkg/vcs/workdir.go | File removal and rename operations |
| src/timeline.c | pkg/vcs/log.go | History and log generation |
| src/branch.c | pkg/vcs/label.go | Branch and tag management |
| src/merge.c | pkg/vcs/merge.go | Three-way merge logic |
8. Critical Reminders for AI Agents
- Use the Repository API: Do not call pkg/store functions directly for VCS operations. Use the methods on the vcs.Repository struct.
- Embrace Transactions: When adding new multi-step database logic, use the tx, err := repo.DB().Begin() pattern.
- CGo is Required: The SQLite driver requires CGo. Ensure CGO_ENABLED=1.
- Test Everything: All new functionality must be accompanied by tests in the same package. Use the setupTestRepo helper in pkg/vcs for integration-style tests.
- Path Security: Always validate user-provided file paths to prevent directory traversal attacks.
- VFILE origname Field: When implementing file operations, use the origname field in the vfile table to track original filenames for renamed files.
- Label Prefixes: Branch and tag names are stored internally with prefixes ("branch:" and "tag:"). Always use the VCS API functions (CreateBranch, CreateTag) to manage labels, not raw SQL.
- Map Key Selection: When building maps from database queries, be cautious. Using non-unique values (like a content RID, which can be shared by multiple files) as map keys will lead to silently overwritten entries. Prefer map[filename]rid over map[rid]filename.
- Test with VCS APIs: When writing tests, always use the proper VCS API functions (e.g., repo.CreateBranch()) rather than raw SQL inserts. The VCS layer applies important transformations (like label prefixes) that raw SQL bypasses.
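The map key reminder is easiest to see with a small example. The row type and the two helper functions below are hypothetical; the rows mimic an mlink query result in which two files share the same content blob.

```go
package main

import "fmt"

// row mimics one mlink result: a filename and the RID of its content blob.
type row struct {
	fn  string
	fid int
}

// byRID keys on the content RID. Files that share content collide.
func byRID(rows []row) map[int]string {
	m := make(map[int]string)
	for _, r := range rows {
		m[r.fid] = r.fn // a later file silently replaces an earlier one
	}
	return m
}

// byName keys on the filename, which is unique within a manifest.
func byName(rows []row) map[string]int {
	m := make(map[string]int)
	for _, r := range rows {
		m[r.fn] = r.fid
	}
	return m
}

func main() {
	rows := []row{{"LICENSE", 7}, {"docs/LICENSE", 7}, {"main.go", 9}}
	fmt.Println(len(byRID(rows)))  // 2: one LICENSE entry was overwritten
	fmt.Println(len(byName(rows))) // 3: every file is preserved
}
```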
9. Development Commands
# Build all packages
go build ./...
# Run all tests
go test ./...
# Run tests with coverage
go test -cover ./...
# Run tests with race detector (CRITICAL before submitting)
go test -race ./...
# Format code
go fmt ./...
# Static analysis
go vet ./...