Last Updated: 2025-12-14
This document provides structured guidance for AI agents implementing and extending the Cambria version control system in Go. It reflects the project's current state, including the completed VFILE system.
Project Status
- Phase 0 (Foundation): ✅ Completed.
- Post-Phase 0 Refactoring: ✅ Completed.
- Phase 1 (Version Control Operations): ✅ Completed.
- Phase 1.5 (VFILE Working Directory Tracking): ✅ Completed.
- Phase 2 (CLI & User Interface): ✅ Completed.
- Current Phase: Phase 3 (Advanced Features) - Ready to begin.
- Next Task: Implement advanced version control features (branches, merge, timeline, etc.)
The foundational library packages (hash, store, artifact) are complete and tested. A major refactoring introduced a transactional database layer and a high-level Repository API. All core VCS operations including Checkin, Checkout, Add, Commit, Diff, and working directory management are now implemented and tested. The VFILE system provides efficient working directory state tracking using SQLite tables (vfile and vmerge). A fully functional CLI with commands for init, add, commit, status, checkout, and diff has been implemented using the urfave/cli v3 framework.
Key Design Principles & Patterns
- Minimal and Correct: Simplest correct implementation first.
- SQLite-Backed: A single file (*.db) is the source of truth.
- Content-Addressable: All artifacts are identified by their SHA-256 hash (UUID).
- Immutable Artifacts: Once written, a blob never changes.
- Transactional Store Layer: All database operations that modify state must occur within a transaction. The store.DBTX interface abstracts *sql.DB and *sql.Tx to enforce this.
- Repository Pattern: All high-level VCS operations are exposed via the vcs.Repository struct, which encapsulates the database connection.
- Go Idioms: Standard library first, minimal external dependencies.
- Standard Testing: Use only the testing package.
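The content-addressable and immutable-artifact principles can be illustrated with a minimal sketch. The blobStore type and its methods are hypothetical in-memory stand-ins for the blob table, not Cambria's actual pkg/store API; the only thing taken from this document is that a UUID is the SHA-256 hash of the content.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobStore is a hypothetical in-memory stand-in for the blob table.
// It is not part of Cambria's API.
type blobStore struct {
	blobs map[string][]byte // UUID (hex SHA-256) -> content
}

func newBlobStore() *blobStore {
	return &blobStore{blobs: make(map[string][]byte)}
}

// contentUUID returns the lowercase hex SHA-256 digest of content.
func contentUUID(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Put stores content under its hash and returns the UUID.
// Existing entries are never overwritten, so blobs stay immutable
// and storing the same bytes twice is a no-op.
func (s *blobStore) Put(content []byte) string {
	id := contentUUID(content)
	if _, exists := s.blobs[id]; !exists {
		// Copy so that callers cannot mutate the stored bytes later.
		s.blobs[id] = append([]byte(nil), content...)
	}
	return id
}

func main() {
	s := newBlobStore()
	first := s.Put([]byte("hello"))
	second := s.Put([]byte("hello"))
	fmt.Println(first == second, len(s.blobs)) // true 1
}
```

Because the key is derived from the content, deduplication and immutability fall out of the same check: a second write of identical bytes finds the key already present and leaves the stored blob untouched.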
Core Architecture
vcs.Repository
This is the primary entry point for all version control logic. Always use this struct to interact with a repository.
// pkg/vcs/repo.go
type Repository struct {
db *store.DB
}
// How to use:
repo, err := vcs.InitRepository("path/to/my-repo.db")
// or
repo, err := vcs.OpenRepository("path/to/my-repo.db")
// Perform operations:
uuid, err := repo.Checkin(...)
err = repo.Checkout(...)
store.DBTX Interface
To ensure transactional integrity, all functions in the pkg/store package accept a DBTX interface. This can be either a *store.DB (for single read operations) or a *sql.Tx (for multi-step write operations).
// pkg/store/dbtx.go
type DBTX interface {
Exec(query string, args ...interface{}) (sql.Result, error)
Query(query string, args ...interface{}) (*sql.Rows, error)
QueryRow(query string, args ...interface{}) *sql.Row
}
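// Transactional write pattern:
tx, err := repo.DB().Begin()
if err != nil { /* ... */ }
defer tx.Rollback() // Safety net: harmless after a successful Commit
// ... call store functions with tx ...
if err := store.CreateManifest(tx, rid, false); err != nil {
	return err // the deferred Rollback discards any partial writes
}
// ...
return tx.Commit()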
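As a rough, standard-library-only sketch of why one interface covers both cases: the slice literal in implementers compiles only if *sql.DB and *sql.Tx both satisfy DBTX. The implementers and withTx names are hypothetical and not part of pkg/store, and withTx takes a plain *sql.DB here, whereas the project wraps the handle in store.DB.

```go
package main

import (
	"database/sql"
	"fmt"
)

// DBTX mirrors the interface shown above.
type DBTX interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
	Query(query string, args ...interface{}) (*sql.Rows, error)
	QueryRow(query string, args ...interface{}) *sql.Row
}

// implementers returns typed nil pointers. The slice literal only compiles
// if both *sql.DB and *sql.Tx satisfy DBTX, so it doubles as a static check.
func implementers() []DBTX {
	return []DBTX{(*sql.DB)(nil), (*sql.Tx)(nil)}
}

// withTx is a hypothetical helper that wraps the write pattern: begin,
// run fn, roll back on error, commit on success.
func withTx(db *sql.DB, fn func(tx DBTX) error) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// After a successful Commit this returns sql.ErrTxDone and changes nothing.
	defer tx.Rollback()
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	for _, impl := range implementers() {
		fmt.Printf("%T satisfies DBTX\n", impl)
	}
}
```

A helper in the style of withTx keeps the rollback and commit logic in one place, so callers only supply the function that performs the writes.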
Package Structure (Current)
cambria/
├── pkg/
│ ├── hash/ # Content hashing (SHA-256)
│ ├── store/ # SQLite data access layer (uses DBTX)
│ │ ├── db.go
│ │ ├── dbtx.go # The transactional interface
│ │ ├── schema.go # Includes vfile and vmerge tables
│ │ ├── blob.go
│ │ ├── manifest.go
│ │ ├── mlink.go
│ │ ├── plink.go
│ │ └── label.go
│ ├── artifact/ # Manifest parsing/generation
│ │ └── manifest.go
│ └── vcs/ # High-level version control operations
│ ├── repo.go # Repository struct and lifecycle
│ ├── checkin.go # Commit operation (uses vfile)
│ ├── checkout.go # Checkout operation (populates vfile)
│ ├── add.go # File addition (updates vfile)
│ ├── diff.go # Diff computation between versions
│ ├── workdir.go # Working directory state tracking (uses vfile)
│ ├── vfile.go # VFILE system implementation
│ ├── vfile_test.go # VFILE unit tests
│ ├── vfile_integration_test.go # VFILE integration tests
│ ├── checkout_test.go
│ └── main_test.go # VCS integration tests
├── internal/
│ └── testutil/ # Test helpers
├── cmd/
│ └── cambria/ # CLI application (urfave/cli v3)
│ ├── main.go
│ ├── init.go
│ ├── add.go
│ ├── commit.go
│ └── checkout.go
└── doc_cambria/ # Design documentation
├── CAMBRIA_CLI_PLAN.md # Phase 2 implementation plan (completed)
├── CAMBRIA_VFILE_IMPL.md # VFILE implementation details
└── ...
Core Data Model (Idempotent Schema)
The schema uses CREATE TABLE IF NOT EXISTS and CREATE INDEX IF NOT EXISTS to be safely re-initializable.
-- Immutable artifact storage
CREATE TABLE IF NOT EXISTS blob(
rid INTEGER PRIMARY KEY AUTOINCREMENT,
uuid TEXT UNIQUE NOT NULL,
size INTEGER NOT NULL,
content BLOB NOT NULL
);
-- Manifest identification
CREATE TABLE IF NOT EXISTS manifest(
rid INTEGER PRIMARY KEY REFERENCES blob(rid),
is_merge BOOLEAN DEFAULT 0
);
-- Manifest-file linkage
CREATE TABLE IF NOT EXISTS mlink(
manifest INTEGER NOT NULL REFERENCES manifest(rid),
fn TEXT NOT NULL,
fid INTEGER NOT NULL REFERENCES blob(rid),
PRIMARY KEY(manifest, fn)
);
-- Parent-child DAG
CREATE TABLE IF NOT EXISTS plink(
parent INTEGER NOT NULL REFERENCES manifest(rid),
child INTEGER NOT NULL REFERENCES manifest(rid),
PRIMARY KEY(parent, child)
);
-- Labels (branches/tags)
CREATE TABLE IF NOT EXISTS label(
manifest INTEGER NOT NULL REFERENCES manifest(rid),
name TEXT NOT NULL,
PRIMARY KEY(manifest, name)
);
-- Configuration
CREATE TABLE IF NOT EXISTS config(
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
-- Working directory file state tracking (VFILE)
CREATE TABLE IF NOT EXISTS vfile(
id INTEGER PRIMARY KEY AUTOINCREMENT,
vid INTEGER NOT NULL REFERENCES manifest(rid),
rid INTEGER REFERENCES blob(rid),
mrid INTEGER REFERENCES blob(rid),
pathname TEXT NOT NULL COLLATE NOCASE,
origname TEXT COLLATE NOCASE,
is_exe BOOLEAN NOT NULL DEFAULT 0,
is_link BOOLEAN NOT NULL DEFAULT 0,
chnged INTEGER NOT NULL DEFAULT 0,
deleted BOOLEAN NOT NULL DEFAULT 0,
mhash TEXT,
mtime INTEGER,
size INTEGER,
UNIQUE(vid, pathname)
);
-- Merge state tracking (VMERGE)
CREATE TABLE IF NOT EXISTS vmerge(
id INTEGER PRIMARY KEY AUTOINCREMENT,
merge INTEGER NOT NULL REFERENCES manifest(rid),
mhash TEXT NOT NULL,
merge_type INTEGER NOT NULL DEFAULT 0,
is_baseline BOOLEAN NOT NULL DEFAULT 0,
UNIQUE(merge)
);
Phase 1: Version Control Operations ✅ Completed
Implemented Operations
- vcs.InitRepository(path): Creates a new repository file with an initialized schema.
- vcs.OpenRepository(path): Opens an existing repository file.
- repo.Checkin(files, parents, opts): Creates a new manifest (commit). This is a transactional operation that:
  - Generates and writes the manifest blob.
  - Creates the manifest record.
  - Creates mlink records linking the manifest to file blobs.
  - Creates plink records linking the new manifest to its parents.
  - Creates label records for any tags/branches.
- repo.Checkout(root, manifestUUID, opts): Populates a directory with the files from a specific manifest. Includes path traversal protection.
- repo.Commit(workDir, opts): Creates a new manifest from the current state of the working directory.
- repo.Add(workDir, paths...) and repo.AddWithOptions(workDir, opts, paths...): Adds files to version control. In Phase 1 this was a placeholder that wrote blobs without complete state tracking; Phase 1.5 added vfile-based tracking.
- vcs.Diff(original, modified, opts): Computes differences between two file versions using the github.com/sergi/go-diff library.
- repo.DiffFiles(uuid1, uuid2, opts): Computes the diff between two files in the repository by their UUIDs.
- repo.DiffManifests(manifestUUID1, manifestUUID2, opts): Computes the diff between two manifests, showing added/modified/deleted files.
- repo.DiffWorkDir(workDir, manifestUUID, opts): Computes the diff between the working directory and a manifest.
- Working Directory Management (pkg/vcs/workdir.go):
  - NewWorkDir(root, repo): Creates a working directory tracker.
  - workDir.Scan(): Scans the working directory and returns file states (untracked, modified, deleted, clean, added).
  - workDir.HashFile(path): Computes the hash of a file in the working directory.
  - workDir.SetBaseline(manifestUUID): Sets the current checked-out manifest.
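The added/modified/deleted classification that a manifest diff reports can be sketched as a comparison of two filename-to-UUID maps. This is an illustration only: manifestDiff and diffManifests are hypothetical names, and the real repo.DiffManifests works on manifests stored in the repository rather than on in-memory maps.

```go
package main

import (
	"fmt"
	"sort"
)

// manifestDiff groups filenames by how they changed between two manifests.
type manifestDiff struct {
	Added, Modified, Deleted []string
}

// diffManifests compares two filename -> blob UUID maps.
// Because UUIDs are content hashes, differing UUIDs mean differing content.
func diffManifests(oldFiles, newFiles map[string]string) manifestDiff {
	var d manifestDiff
	for fn, newID := range newFiles {
		oldID, existed := oldFiles[fn]
		switch {
		case !existed:
			d.Added = append(d.Added, fn)
		case oldID != newID:
			d.Modified = append(d.Modified, fn)
		}
	}
	for fn := range oldFiles {
		if _, kept := newFiles[fn]; !kept {
			d.Deleted = append(d.Deleted, fn)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Modified)
	sort.Strings(d.Deleted)
	return d
}

func main() {
	oldFiles := map[string]string{"a.go": "h1", "b.go": "h2", "c.go": "h3"}
	newFiles := map[string]string{"a.go": "h1", "b.go": "h9", "d.go": "h4"}
	d := diffManifests(oldFiles, newFiles)
	fmt.Println(d.Added, d.Modified, d.Deleted) // [d.go] [b.go] [c.go]
}
```

Since blobs are content-addressed, no file content has to be read to decide whether a file changed; comparing UUIDs is sufficient, and only modified files need a line-level diff afterwards.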
Security Enhancements
- Path Traversal Protection: Both Checkout and Add operations validate file paths to prevent directory traversal attacks (they reject paths containing "..", absolute paths, etc.).
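A minimal sketch of this kind of check is shown here. The names validateRelPath and safeJoin are hypothetical and the actual rules in checkout.go and add.go may differ; this version takes the strict reading of the rule and rejects any path that contains a ".." segment, even one that would stay inside the root after cleaning.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// validateRelPath rejects empty paths, absolute paths, and any path
// containing a ".." segment. It is an illustrative check, not Cambria's.
func validateRelPath(p string) error {
	if p == "" {
		return errors.New("empty path")
	}
	slashed := filepath.ToSlash(p)
	if filepath.IsAbs(p) || strings.HasPrefix(slashed, "/") {
		return fmt.Errorf("absolute path not allowed: %q", p)
	}
	for _, seg := range strings.Split(slashed, "/") {
		if seg == ".." {
			return fmt.Errorf("parent directory reference not allowed: %q", p)
		}
	}
	return nil
}

// safeJoin validates rel before joining it onto root.
func safeJoin(root, rel string) (string, error) {
	if err := validateRelPath(rel); err != nil {
		return "", err
	}
	return filepath.Join(root, filepath.FromSlash(rel)), nil
}

func main() {
	for _, p := range []string{"src/main.go", "../etc/passwd", "/etc/passwd", "a/../b"} {
		fmt.Printf("%-16q ok=%v\n", p, validateRelPath(p) == nil)
	}
}
```

Validating before joining matters because filepath.Join cleans its result, which would silently resolve a ".." segment and hide the escape attempt.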
Code Quality Improvements
- Replaced all deprecated ioutil.* functions with modern os.* equivalents.
- Updated test assertions to properly test implemented functionality.
- All tests passing (13/13 in pkg/vcs, all packages passing).
Phase 1.5: VFILE Working Directory Tracking ✅ Completed
The VFILE system provides efficient working directory state tracking using SQLite tables. This mirrors Fossil's vfile.c implementation.
Implemented Features
- vfile Table: Tracks the state of every file in the working directory
  - Baseline manifest (vid) reference
  - File blob references (rid for baseline, mrid for merges)
  - Change status tracking (0-9 states: clean, edited, merged, etc.)
  - File metadata (permissions, symlinks, mtime, size)
- vmerge Table: Tracks merge operations (for future use)
  - Merge manifest references
  - Merge type (normal, integrate, cherrypick)
- Core Functions (pkg/vcs/vfile.go):
  - LoadVFileFromManifest() - Populate vfile from manifest during checkout
  - CheckVFileSignatures() - Detect file changes by comparing hashes
  - WriteVFileToDisk() - Write files from vfile to disk
  - GetVFileEntry(), GetCurrentCheckout(), SetCurrentCheckout() - Helper functions
- VCS Integration:
  - Checkout() - Populates vfile table and sets current checkout config
  - Add() - Inserts/updates vfile entries for added files
  - Scan() - Queries vfile table instead of full directory scan
  - Commit() - Reads changed files from vfile and updates to new baseline
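One way the change detection could work is sketched below. The vfileEntry struct and the isEdited and hashHex functions are hypothetical, and the size/mtime shortcut before hashing is an assumption borrowed from how such checks are commonly optimized; this document only states that CheckVFileSignatures compares hashes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// vfileEntry holds the baseline facts recorded for one tracked file.
// Field names are illustrative and only loosely follow the vfile table.
type vfileEntry struct {
	Pathname string
	Hash     string // hex SHA-256 of the baseline content
	Size     int64
	MTime    int64
}

// hashHex returns the lowercase hex SHA-256 digest of data.
func hashHex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// isEdited reports whether the on-disk file differs from the baseline.
// Assumed shortcut: a size change means edited, and an unchanged size and
// mtime means clean; only otherwise is the content read and hashed.
func isEdited(e vfileEntry, size, mtime int64, read func() ([]byte, error)) (bool, error) {
	if size != e.Size {
		return true, nil
	}
	if mtime == e.MTime {
		return false, nil
	}
	data, err := read()
	if err != nil {
		return false, err
	}
	return hashHex(data) != e.Hash, nil
}

func main() {
	base := []byte("v1\n")
	e := vfileEntry{Pathname: "a.txt", Hash: hashHex(base), Size: int64(len(base)), MTime: 100}
	edited, err := isEdited(e, 3, 200, func() ([]byte, error) { return []byte("v2\n"), nil })
	fmt.Println(edited, err) // true <nil>
}
```

Passing the file reader as a function keeps the expensive read lazy: it runs only when size and mtime cannot settle the question on their own.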
Test Coverage
- 8 unit tests for vfile core functions
- 3 integration tests for complete workflows
- All 31 VCS tests passing
See doc_cambria/CAMBRIA_VFILE_IMPL.md for detailed implementation documentation.
Phase 2: CLI & User Interface ✅ Completed
The plan in doc_cambria/CAMBRIA_CLI_PLAN.md has been implemented.
- cmd/cambria/: A CLI that exposes the VCS operations, built on the urfave/cli v3 framework.
- Command implementations: init, add, commit, status, checkout, and diff.
- Command-line argument parsing and user-friendly output.
Key Fossil → Go Mappings
| Fossil Module | Cambria Package | Responsibility |
|---|---|---|
| src/content.c | pkg/store/blob.go | Content storage |
| src/manifest.c | pkg/artifact/manifest.go | Manifest parsing and generation |
| src/db.c | pkg/store/ | Database operations (via DBTX) |
| src/checkin.c | pkg/vcs/checkin.go | High-level commit creation |
| src/checkout.c | pkg/vcs/checkout.go | High-level checkout to filesystem |
| src/add.c | pkg/vcs/add.go | File addition to version control |
| src/diff.c | pkg/vcs/diff.go | Diff computation |
| src/vfile.c | pkg/vcs/vfile.go | VFILE system for working directory tracking |
| src/vfile.c | pkg/vcs/workdir.go | Working directory scanning (uses vfile) |
Critical Reminders for Agents
- Use the Repository API: Do not call pkg/store functions directly for VCS operations. Use the methods on the vcs.Repository struct.
- Embrace Transactions: When adding new multi-step database logic, use the tx, err := repo.DB().Begin() pattern.
- CGo is Required: The SQLite driver requires CGo. Ensure CGO_ENABLED=1.
- Test Everything: All new functionality must be accompanied by tests in the same package. Use the setupTestRepo helper in pkg/vcs for integration-style tests.
- Keep It Simple: Follow the YAGNI principle. Don't add abstractions unless necessary.
- Content-Addressed: Remember that a blob's UUID is always its SHA-256 hash.
- Immutable Blobs: Never modify existing blob records.
- Path Security: Always validate user-provided file paths to prevent directory traversal attacks. Use the validation functions in checkout.go and add.go as examples.
- Modern Go APIs: Use os.* (or io.*) functions instead of deprecated ioutil.* functions.
- Dependencies: The project uses minimal external dependencies. Currently only github.com/mattn/go-sqlite3 (SQLite driver), github.com/sergi/go-diff (diff library), and the urfave/cli v3 framework (CLI) are used.
Development Commands
# Build all packages
go build ./...
# Run all tests
go test ./...
# Run tests with coverage
go test -cover ./...
# Run tests with race detector (CRITICAL before submitting)
go test -race ./...
# Format code
go fmt ./...
# Static analysis
go vet ./...