cambria: Cambria VCS Implementation Guide

Last Updated: 2025-12-19

This document provides a comprehensive guide to the implementation of the Cambria version control system in Go. It is intended for developers working on the Cambria project and for AI agents tasked with its maintenance and extension.

1. Design Philosophy & Key Principles

Cambria is a re-implementation of the core concepts of Fossil SCM in Go. It follows a set of guiding principles that emphasize simplicity, correctness, and robustness.

Minimal and Correct: Prioritize the simplest correct implementation.
SQLite-Backed: A single *.db file serves as the atomic source of truth for the entire repository. This includes all versioned files, historical metadata, and project configuration.
Content-Addressable Storage: All artifacts (files, manifests) are identified by a unique SHA-256 hash of their content. This ensures data integrity and provides a canonical identifier (UUID) for every piece of data.
Immutable Artifacts: Once an artifact (a "blob") is written to the repository, it is never changed. New versions are created as new blobs.
Transactional Integrity: All database operations that modify state are executed within a transaction. The store.DBTX interface abstracts *sql.DB and *sql.Tx to enforce this at the data access layer.
Repository Pattern: High-level VCS operations are exposed via the vcs.Repository struct, which encapsulates the database connection and provides a clean API for all version control logic.
Go Idioms: The project favors the Go standard library and minimizes external dependencies.
Standard Testing: All functionality is tested using the standard testing package.
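
As a rough illustration of the content-addressable principle, the sketch below derives an identifier from the bytes of an artifact. The helper name artifactID is an assumption made for this example; the actual API in pkg/hash may look different.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// artifactID returns the lowercase hex SHA-256 digest of content.
// Hypothetical helper for illustration; not taken from pkg/hash.
func artifactID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := artifactID([]byte("hello"))
	b := artifactID([]byte("hello"))
	c := artifactID([]byte("hello\n"))

	fmt.Println(a == b) // identical content yields the same identifier
	fmt.Println(a == c) // any change in content yields a different identifier
	fmt.Println(len(a)) // a SHA-256 digest is 64 hex characters
}
```

Because the identifier depends only on the content, storing the same file twice can reuse a single blob row, and a mismatch between a stored identifier and a recomputed one signals corruption.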

2. Core Architecture

`vcs.Repository`

The vcs.Repository struct is the primary entry point for all version control logic. It encapsulates the database connection and provides methods for all high-level operations.

// pkg/vcs/repo.go
type Repository struct {
    db *store.DB
}

// Usage:
repo, err := vcs.InitRepository("path/to/my-repo.db")
// or
repo, err := vcs.OpenRepository("path/to/my-repo.db")

// Perform operations:
uuid, err := repo.Checkin(...)
err = repo.Checkout(...)

`store.DBTX` Interface

To guarantee transactional integrity, all functions in the pkg/store package that modify the database accept a DBTX interface. This allows the same function to run either as a standalone operation (with a *store.DB) or as one step of a larger atomic write (with a *sql.Tx).

// pkg/store/dbtx.go
type DBTX interface {
    Exec(query string, args ...interface{}) (sql.Result, error)
    Query(query string, args ...interface{}) (*sql.Rows, error)
    QueryRow(query string, args ...interface{}) *sql.Row
}

// Transactional Write Pattern:
tx, err := repo.DB().Begin()
if err != nil { /* ... */ }
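defer tx.Rollback() // Rolls back on early return; has no effect after a successful Commit

// ... call store functions with the transaction object ...
if err := store.CreateManifest(tx, rid, false); err != nil {
    return err // the deferred Rollback discards any partial writes
}
// ...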

return tx.Commit() // Commits all changes atomically
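
One practical benefit of accepting an interface is that store-style functions can be exercised without a real database. The following is a minimal sketch under that assumption: insertLabel and recorder are hypothetical names invented for this example, and the DBTX definition is copied from above so the block compiles on its own.

```go
package main

import (
	"database/sql"
	"fmt"
)

// DBTX mirrors the interface from pkg/store/dbtx.go shown above.
type DBTX interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
	Query(query string, args ...interface{}) (*sql.Rows, error)
	QueryRow(query string, args ...interface{}) *sql.Row
}

// Compile-time checks: both standard-library types satisfy DBTX.
var (
	_ DBTX = (*sql.DB)(nil)
	_ DBTX = (*sql.Tx)(nil)
)

// insertLabel is a hypothetical store-style function. It neither knows nor
// cares whether q is a plain connection or an open transaction.
func insertLabel(q DBTX, manifest int64, name string) error {
	_, err := q.Exec("INSERT INTO label(manifest, name) VALUES(?, ?)", manifest, name)
	return err
}

// recorder is a test double that records statements instead of running them.
type recorder struct{ stmts []string }

func (r *recorder) Exec(query string, args ...interface{}) (sql.Result, error) {
	r.stmts = append(r.stmts, query)
	return nil, nil
}

func (r *recorder) Query(query string, args ...interface{}) (*sql.Rows, error) {
	return nil, nil
}

func (r *recorder) QueryRow(query string, args ...interface{}) *sql.Row {
	return nil
}

func main() {
	rec := &recorder{}
	if err := insertLabel(rec, 1, "branch:main"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(rec.stmts), "statement recorded")
}
```

In real code the same insertLabel call would receive either the repository's database handle or the tx value from the pattern above; only the caller decides whether the write is part of a larger transaction.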

3. Package Structure

cambria/
├── pkg/
│   ├── hash/           # Content hashing (SHA-256)
│   ├── store/          # SQLite data access layer (uses DBTX)
│   │   ├── db.go
│   │   ├── dbtx.go     # The transactional interface
│   │   ├── schema.go   # Includes vfile and vmerge tables
│   │   ├── blob.go
│   │   ├── manifest.go
│   │   ├── mlink.go
│   │   ├── plink.go
│   │   └── label.go
│   ├── artifact/       # Manifest parsing/generation
│   │   └── manifest.go
│   └── vcs/            # High-level version control operations
│       ├── repo.go                     # Repository struct and lifecycle
│       ├── checkin.go                  # Commit operation (uses vfile)
│       ├── checkout.go                 # Checkout operation (populates vfile)
│       ├── add.go                      # File addition (updates vfile)
│       ├── diff.go                     # Diff computation between versions
│       ├── log.go                      # Timeline/history operations
│       ├── label.go                    # Branch and tag management
│       ├── workdir.go                  # Working directory operations (Remove, Rename)
│       ├── merge.go                    # Three-way merge implementation
│       ├── vfile.go                    # VFILE system implementation
│       └── ... (test files)
├── internal/
│   └── testutil/       # Test helpers
├── cmd/
│   └── cambria/        # CLI application
│       ├── main.go
│       ├── common.go   # Shared utilities (ResolveVersion, etc.)
│       ├── init.go
│       ├── open.go
│       ├── close.go
│       ├── add.go
│       ├── rm.go
│       ├── mv.go
│       ├── commit.go
│       ├── checkout.go
│       ├── status.go
│       ├── diff.go
│       ├── log.go
│       ├── branch.go
│       ├── tag.go
│       └── merge.go
└── doc_cambria/
    └── CAMBRIA_VCS_IMPL.md # This file

4. Core Data Model (Schema)

The database schema is designed to be idempotent, using CREATE TABLE IF NOT EXISTS to allow for safe re-initialization.

-- Immutable artifact storage
CREATE TABLE IF NOT EXISTS blob(
    rid INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid TEXT UNIQUE NOT NULL,
    size INTEGER NOT NULL,
    content BLOB NOT NULL
);

-- Manifest identification (a manifest is a special type of blob)
CREATE TABLE IF NOT EXISTS manifest(
    rid INTEGER PRIMARY KEY REFERENCES blob(rid),
    is_merge BOOLEAN DEFAULT 0
);

-- Manifest-file linkage (which files are in which manifest)
CREATE TABLE IF NOT EXISTS mlink(
    manifest INTEGER NOT NULL REFERENCES manifest(rid),
    fn TEXT NOT NULL,
    fid INTEGER NOT NULL REFERENCES blob(rid),
    PRIMARY KEY(manifest, fn)
);

-- Parent-child DAG (the commit graph)
CREATE TABLE IF NOT EXISTS plink(
    parent INTEGER NOT NULL REFERENCES manifest(rid),
    child INTEGER NOT NULL REFERENCES manifest(rid),
    PRIMARY KEY(parent, child)
);

-- Labels (branches and tags)
CREATE TABLE IF NOT EXISTS label(
    manifest INTEGER NOT NULL REFERENCES manifest(rid),
    name TEXT NOT NULL,
    PRIMARY KEY(manifest, name)
);

-- Working directory file state tracking (VFILE)
CREATE TABLE IF NOT EXISTS vfile(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    vid INTEGER NOT NULL REFERENCES manifest(rid), -- Baseline manifest
    rid INTEGER REFERENCES blob(rid),             -- Baseline file content
    mrid INTEGER REFERENCES blob(rid),            -- Merged file content
    pathname TEXT NOT NULL COLLATE NOCASE,
    origname TEXT COLLATE NOCASE,                 -- Original name if renamed
    is_exe BOOLEAN NOT NULL DEFAULT 0,
    is_link BOOLEAN NOT NULL DEFAULT 0,
    chnged INTEGER NOT NULL DEFAULT 0,            -- Change status
    deleted BOOLEAN NOT NULL DEFAULT 0,
    mhash TEXT,
    mtime INTEGER,
    size INTEGER,
    UNIQUE(vid, pathname)
);

-- Merge state tracking (VMERGE)
CREATE TABLE IF NOT EXISTS vmerge(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    merge INTEGER NOT NULL REFERENCES manifest(rid), -- Merge-in manifest
    mhash TEXT NOT NULL,
    merge_type INTEGER NOT NULL DEFAULT 0,
    is_baseline BOOLEAN NOT NULL DEFAULT 0,
    UNIQUE(merge)
);

-- Temporary snapshot for merge abort
CREATE TABLE IF NOT EXISTS vfile_snapshot(...);

5. Implemented Features

Core Version Control

InitRepository / OpenRepository: Create and open repositories.
Checkin / Commit: Create new commits (manifests) from the working directory. This is a transactional operation that writes file blobs, generates a manifest, and updates all linkage tables (mlink, plink, label).
Checkout: Populate a directory with the files from a specific manifest. Includes path traversal protection.
Add: Add files to version control, updating the vfile table.
Diff: Compute differences between files or manifests.
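
The path traversal protection mentioned for Checkout can be approximated as follows. This is a sketch of one common approach, not Cambria's actual check: safeJoin is a hypothetical helper, and it does not resolve symbolic links, which a production check may also need to consider.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins name onto root and rejects any result that escapes root.
// Hypothetical helper for illustration; symlinks are not resolved.
func safeJoin(root, name string) (string, error) {
	if filepath.IsAbs(name) {
		return "", fmt.Errorf("absolute path not allowed: %q", name)
	}
	joined := filepath.Join(root, name) // Join also cleans "." and ".." segments
	rel, err := filepath.Rel(root, joined)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes checkout root: %q", name)
	}
	return joined, nil
}

func main() {
	for _, name := range []string{"src/main.go", "../outside.txt", "a/../../b"} {
		p, err := safeJoin("work", name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", p)
	}
}
```

Checking the relative path after cleaning, rather than searching the raw string for "..", avoids rejecting legitimate names such as "..hidden" while still catching sequences like "a/../../b".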

VFILE System for Working Directory Management

Inspired by Fossil's vfile.c, Cambria uses a set of SQLite tables (vfile, vmerge) to efficiently track the state of the working directory. This avoids expensive full-directory scans for operations like status or commit.

vfile Table: Tracks every file in the working directory, its baseline version, its current status (clean, edited, added, deleted, renamed), and metadata.
vmerge Table: Tracks the state of an in-progress merge.
Core Functions: LoadVFileFromManifest, CheckVFileSignatures, WriteVFileToDisk.
VCS Integration: checkout populates vfile, add inserts into vfile, commit reads changes from vfile, and status queries vfile.
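
To make the idea of avoiding full scans concrete, the sketch below shows one plausible way a signature check could decide whether a file changed, using the size, mtime, and hash values that the vfile table records. The struct, the function names, and the exact decision order are assumptions for illustration; CheckVFileSignatures may work differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// vfileRecord holds a subset of vfile columns (hypothetical struct).
type vfileRecord struct {
	Size  int64
	MTime int64
	Hash  string // hex SHA-256 of the recorded content
}

// hashOf returns the hex SHA-256 digest of content.
func hashOf(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// isModified reports whether the on-disk file differs from the record.
// readContent is only called when size and mtime cannot settle the question.
func isModified(rec vfileRecord, size, mtime int64, readContent func() []byte) bool {
	if size != rec.Size {
		return true // a different length means different content
	}
	if mtime == rec.MTime {
		return false // same size and timestamp: assume unchanged
	}
	// Same size but a newer timestamp: fall back to hashing the content.
	return hashOf(readContent()) != rec.Hash
}

func main() {
	rec := vfileRecord{Size: 3, MTime: 100, Hash: hashOf([]byte("abc"))}
	read := func(s string) func() []byte {
		return func() []byte { return []byte(s) }
	}

	fmt.Println(isModified(rec, 3, 100, read("abc"))) // false: nothing changed
	fmt.Println(isModified(rec, 3, 200, read("abc"))) // false: touched, same content
	fmt.Println(isModified(rec, 3, 200, read("abd"))) // true: edited in place
	fmt.Println(isModified(rec, 4, 200, read("abcd"))) // true: size differs
}
```

The trade-off in this sketch is that trusting an unchanged size and mtime is fast but can miss an edit that preserves both; a stricter mode would always hash.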

Command-Line Interface (CLI)

A full-featured CLI is implemented using the urfave/cli/v3 framework.

Core: init, open, close, add, commit, checkout, status, diff.
Advanced: log, branch, tag, rm, mv, merge.
Name Resolution: The ResolveVersion function understands symbolic names (tip, current), prefixed names (branch:NAME, tag:NAME), UUID prefixes, and bare names (a bare name is tried as a branch first, then as a tag).
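
The precedence between prefixed and bare names can be sketched with an in-memory lookup. This is a simplified illustration, not the real ResolveVersion: resolveLabel is a hypothetical function, the map stands in for the label table, and symbolic names and UUID prefixes are left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errNotFound = errors.New("version not found")

// resolveLabel maps a user-supplied name to a manifest UUID.
// labels is keyed by the stored, prefixed label name (hypothetical stand-in
// for a query against the label table).
func resolveLabel(spec string, labels map[string]string) (string, error) {
	// Explicitly prefixed names are looked up verbatim.
	if strings.HasPrefix(spec, "branch:") || strings.HasPrefix(spec, "tag:") {
		if id, ok := labels[spec]; ok {
			return id, nil
		}
		return "", errNotFound
	}
	// Bare names: a branch takes precedence over a tag of the same name.
	for _, prefix := range []string{"branch:", "tag:"} {
		if id, ok := labels[prefix+spec]; ok {
			return id, nil
		}
	}
	return "", errNotFound
}

func main() {
	labels := map[string]string{
		"branch:release": "aaa111",
		"tag:release":    "bbb222",
		"tag:v1.0":       "ccc333",
	}
	for _, spec := range []string{"release", "tag:release", "v1.0", "missing"} {
		id, err := resolveLabel(spec, labels)
		fmt.Println(spec, id, err)
	}
}
```

When a branch and a tag share a name, the bare form selects the branch, and the tag stays reachable through its explicit tag: prefix.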

Advanced Features

Timeline and Log (log): View commit history with options for filtering by branch, limiting output, and reversing order.
Branch and Tag Management (branch, tag): Create, list, and delete branches and tags. Internally, these are stored as prefixed labels (branch:main, tag:v1.0) in the label table.
File Operations (rm, mv): Remove and rename files. Renames are tracked using the origname field in the vfile table, enabling robust history and future partial commit support.
Three-Way Merge (merge): A complete three-way merge implementation based on Fossil's algorithm.
- Merge Base Finding: Uses a breadth-first search to find the common ancestor.
- File-Level Merge: Determines which files to add, delete, update, or merge based on a unified view of three versions (baseline, current, merge-in).
- Content-Level Merge: Performs a line-by-line three-way merge for text files, inserting conflict markers where necessary.
- Conflict Detection: Handles content conflicts, rename conflicts, and delete/modify conflicts.
- Binary Files: For binary conflicts, it preserves both versions of the file for manual resolution.
- Merge Abort: A merge --abort command allows safely reverting a merge in progress by restoring a snapshot of the vfile table.
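
The merge-base step can be illustrated with a small breadth-first search over parent links. This is a generic sketch, not the code in pkg/vcs/merge.go: mergeBase and the in-memory parents map (child rid to parent rids, as the plink table would provide) are assumptions for the example.

```go
package main

import "fmt"

// mergeBase returns a common ancestor of a and b, or false if none exists.
// parents maps a manifest rid to the rids of its parents.
func mergeBase(parents map[int64][]int64, a, b int64) (int64, bool) {
	// Pass 1: collect every ancestor of a, including a itself.
	seenA := map[int64]bool{a: true}
	queue := []int64{a}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, p := range parents[cur] {
			if !seenA[p] {
				seenA[p] = true
				queue = append(queue, p)
			}
		}
	}

	// Pass 2: walk back from b breadth-first. The first node that is also an
	// ancestor of a is the common ancestor closest to b.
	seenB := map[int64]bool{b: true}
	queue = []int64{b}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if seenA[cur] {
			return cur, true
		}
		for _, p := range parents[cur] {
			if !seenB[p] {
				seenB[p] = true
				queue = append(queue, p)
			}
		}
	}
	return 0, false
}

func main() {
	// History: 1 <- 2 <- 3 on one line, and 2 <- 4 <- 5 on a branch.
	parents := map[int64][]int64{2: {1}, 3: {2}, 4: {2}, 5: {4}}
	base, ok := mergeBase(parents, 3, 5)
	fmt.Println(base, ok) // prints "2 true"
}
```

The visited sets keep the search linear in the number of reachable commits even when earlier merges give a commit several paths to the same ancestor.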

7. Fossil Module to Cambria Package Mapping

| Fossil Module | Cambria Package | Responsibility |
| --- | --- | --- |
| `src/content.c` | `pkg/store/blob.go` | Content storage |
| `src/manifest.c` | `pkg/artifact/manifest.go` | Manifest parsing and generation |
| `src/db.c` | `pkg/store/` | Database operations (via `DBTX`) |
| `src/checkin.c` | `pkg/vcs/checkin.go` | High-level commit creation |
| `src/checkout.c` | `pkg/vcs/checkout.go` | High-level checkout to filesystem |
| `src/add.c` | `pkg/vcs/add.go` | File addition to version control |
| `src/diff.c` | `pkg/vcs/diff.go` | Diff computation |
| `src/vfile.c` | `pkg/vcs/vfile.go` | VFILE system for working directory tracking |
| `src/rm.c`, `mv.c` | `pkg/vcs/workdir.go` | File removal and rename operations |
| `src/timeline.c` | `pkg/vcs/log.go` | History and log generation |
| `src/branch.c` | `pkg/vcs/label.go` | Branch and tag management |
| `src/merge.c` | `pkg/vcs/merge.go` | Three-way merge logic |

8. Critical Reminders for AI Agents

Use the Repository API: Do not call pkg/store functions directly for VCS operations. Use the methods on the vcs.Repository struct.
Embrace Transactions: When adding new multi-step database logic, use the tx, err := repo.DB().Begin() pattern.
CGo is Required: The SQLite driver requires CGo. Ensure CGO_ENABLED=1.
Test Everything: All new functionality must be accompanied by tests in the same package. Use the setupTestRepo helper in pkg/vcs for integration-style tests.
Path Security: Always validate user-provided file paths to prevent directory traversal attacks.
VFILE origname Field: When implementing file operations, use the origname field in the vfile table to track original filenames for renamed files.
Label Prefixes: Branch and tag names are stored internally with prefixes ("branch:" and "tag:"). Always use the VCS API functions (CreateBranch, CreateTag) to manage labels, not raw SQL.
Map Key Selection: When building maps from database queries, be cautious. Using non-unique values (like a content RID, which can be shared by multiple files) as map keys will lead to silently overwritten entries. Prefer map[filename]rid over map[rid]filename.
Test with VCS APIs: When writing tests, always use the proper VCS API functions (e.g., repo.CreateBranch()) rather than raw SQL inserts. The VCS layer applies important transformations (like label prefixes) that raw SQL bypasses.
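
The map key pitfall is easy to reproduce in isolation. In the sketch below, fileEntry is a hypothetical stand-in for a row read from mlink, and two files deliberately share the same content rid.

```go
package main

import "fmt"

// fileEntry is a hypothetical row: a filename and the rid of its content blob.
type fileEntry struct {
	Name string
	RID  int64
}

// byRID keys the map on the content rid. Files with identical content share
// a rid, so later entries silently overwrite earlier ones.
func byRID(entries []fileEntry) map[int64]string {
	m := make(map[int64]string, len(entries))
	for _, e := range entries {
		m[e.RID] = e.Name
	}
	return m
}

// byName keys the map on the filename, which is unique within a manifest.
func byName(entries []fileEntry) map[string]int64 {
	m := make(map[string]int64, len(entries))
	for _, e := range entries {
		m[e.Name] = e.RID
	}
	return m
}

func main() {
	entries := []fileEntry{
		{Name: "LICENSE", RID: 10},
		{Name: "COPYING", RID: 10}, // same content as LICENSE
		{Name: "main.go", RID: 11},
	}
	fmt.Println(len(byRID(entries)))  // 2: one of the identical files was lost
	fmt.Println(len(byName(entries))) // 3: every file is preserved
}
```

No error is raised when the entry is overwritten, which is why this kind of bug tends to surface only as a missing file much later.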

9. Development Commands

# Build all packages
go build ./...

# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run tests with race detector (CRITICAL before submitting)
go test -race ./...

# Format code
go fmt ./...

# Static analysis
go vet ./...