How the protocol works
Synchronisation is a recursive comparison of two Merkle trees built from file metadata. The client only ever sends the differences.
1 · A tree built from metadata, not contents
Each file's hash is sha256(size : mtime-seconds), derived from stat()
alone, so building the tree reads no file data. A directory's hash is the hash of its sorted
child name:hash lines. This makes the whole structure a Merkle tree: two directories
share a hash iff every file beneath them has the same relative path, size, and whole-second mtime. Empty
directories and dotfiles are omitted on both sides, so the two trees always describe the same set of files.
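As a rough illustration of the hashing rules above, here is a minimal Python sketch. The function names, the exact "size:mtime" string, and the newline-joined child lines are assumptions made for the example; the real encoding may differ in detail.

```python
import hashlib
import os
from typing import Dict, Optional


def file_hash(size: int, mtime_seconds: int) -> str:
    # Hash of "size:mtime-seconds". The exact separator is an assumption.
    return hashlib.sha256(f"{size}:{mtime_seconds}".encode()).hexdigest()


def dir_hash(children: Dict[str, str]) -> str:
    # Hash of the sorted "name:hash" lines, one line per child (assumed layout).
    lines = [f"{name}:{h}" for name, h in sorted(children.items())]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()


def tree_hash(path: str) -> Optional[str]:
    """Hash a file or directory using stat() only; None means 'omitted'."""
    if os.path.isdir(path):
        children: Dict[str, str] = {}
        for entry in os.scandir(path):
            if entry.name.startswith("."):
                continue  # dotfiles are omitted
            h = tree_hash(entry.path)
            if h is not None:
                children[entry.name] = h
        # A directory with nothing hashable beneath it is treated as empty.
        return dir_hash(children) if children else None
    st = os.stat(path)
    return file_hash(st.st_size, int(st.st_mtime))  # whole seconds only
```

Because only stat() is consulted, two files with different contents but equal size and mtime hash the same, which is exactly the quick-check trade-off described later.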
2 · Compare the root, then descend only where it differs
Ask for the root hash
GET /api/v1/tree?path=&depth=0: if the result equals the local root hash, nothing
changed and the sync is done in a single request.
Descend the differences
GET /api/v1/tree?path=<dir>&depth=1: for each directory whose hash
differs, fetch one level and compare child by child. Identical subtrees are skipped.
Push new & changed files
PUT /api/v1/file?path=<rel>&mtime=<ms>: the body is streamed with
backpressure, so memory use stays near-constant even for a multi-GB file. The server applies the sent
mtime to the stored file, so the hashes match on the next run.
Remove what's gone
DELETE /api/v1/file?path=<rel>: anything on the server but absent locally
is deleted. A directory delete removes its whole subtree and prunes empty parents.
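The four steps above can be sketched as one recursive function that turns two trees into a list of PUT and DELETE operations. This is a simplified model, not the actual client: both trees are held in memory as nested dicts (a hypothetical shape), whereas the real client would fetch each remote level with a depth=1 request only when a hash differs. The handling of a file replaced by a directory (or the reverse) is also an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical node shape: a file is {"hash": str}; a directory adds
# "children": {name: node}.
Op = Tuple[str, str]  # ("PUT" | "DELETE", relative path)


def plan(local: Optional[dict], remote: Optional[dict], path: str = "") -> List[Op]:
    """Return the operations needed to make the remote tree match the local one."""
    if local is None and remote is None:
        return []
    if local is None:
        # On the server but absent locally; one delete covers a whole subtree.
        return [("DELETE", path)]
    if remote is not None and local["hash"] == remote["hash"]:
        return []  # identical subtree: skip without descending

    ops: List[Op] = []
    local_is_dir = "children" in local
    if remote is not None and ("children" in remote) != local_is_dir:
        # File replaced by directory or vice versa: remove the old entry first.
        ops.append(("DELETE", path))
        remote = None
    if not local_is_dir:
        ops.append(("PUT", path))  # new or changed file
        return ops

    # Descending here corresponds to one GET ...&depth=1 in the real protocol.
    remote_children = remote["children"] if remote is not None else {}
    for name in sorted(set(local["children"]) | set(remote_children)):
        child_path = f"{path}/{name}" if path else name
        ops += plan(local["children"].get(name), remote_children.get(name), child_path)
    return ops
```

When the root hashes are equal the function returns immediately, which mirrors the single-request case for an unchanged tree.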
3 · Why it stays correct and cheap
- No caches anywhere. Hashes are recomputed from stat() each run, on both sides. The client is stateless and the server keeps nothing; even after a restart the recomputed hashes still match, because the upload preserved the timestamps.
- Quick-check semantics. Like rsync's default, a change is detected by a difference in size or whole-second mtime. Editing a file updates its mtime, so ordinary edits are caught; as with rsync, a change that preserves both size and whole-second mtime goes unnoticed.
- One small surface. A bearer token guards every call; there is no dashboard, login, session, or cookie to attack.
- Status codes say nothing. Every authenticated call returns 200 with an object, even "that path is absent". Anything else returns an identical 404, so a probe can't tell an API is there at all.
- Versioned, so skew is loud. The routes are /api/v1/…; a version mismatch hits the 404, then the client checks GET /api/version to say "upgrade the client" instead of failing silently.
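To make the skew check concrete, here is a small sketch of how a client might turn an opaque 404 into a useful message. Everything about the payload is an assumption: the example supposes GET /api/version yields an integer and that None stands for "that call also returned 404". The function name and message wording are hypothetical.

```python
from typing import Optional

CLIENT_API_VERSION = 1  # the client in this sketch speaks /api/v1/


def explain_404(server_version: Optional[int]) -> str:
    """Turn an opaque 404 into a message for the user.

    server_version is what GET /api/version reported, or None if that
    call also came back as 404 (the integer payload is an assumption).
    """
    if server_version is None:
        return "no API reachable here, or the token was rejected"
    if server_version > CLIENT_API_VERSION:
        return f"upgrade the client: server speaks v{server_version}"
    if server_version < CLIENT_API_VERSION:
        return f"server is older (v{server_version}) than this client"
    return "same API version: the token was probably rejected"
```

The point of the design is that the 404 itself carries no information; only the separate version call lets the client distinguish skew from other failures.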
4 · Access logs, mirrored incrementally
Caddy writes one rotating access log. The client mirrors it by timestamp:
GET /api/v1/logs lists sizes and rotated-file timestamps, and
GET /api/v1/logs/file?which=active fetches only the newly appended tail via an HTTP
Range request. A new rotated file signals that the active log has restarted. Filenames never cross
the wire, so the server can't influence where the client writes.
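The incremental fetch comes down to choosing the right Range header. The sketch below shows one way to do that; the function name and parameters are hypothetical, and it assumes the client tracks the size of its local copy and learns the remote size and rotation state from GET /api/v1/logs.

```python
from typing import Optional


def next_range(local_size: int, remote_size: int, new_rotated_file: bool) -> Optional[str]:
    """Return the Range header value for the active log, or None if up to date."""
    # A new rotated file means the active log restarted, so the local copy of
    # the active log starts over from byte 0.
    start = 0 if new_rotated_file else local_size
    if remote_size <= start:
        return None  # nothing new to fetch
    return f"bytes={start}-"  # open-ended range: from start to end of file
```

For example, a local copy of 100 bytes against a 250-byte active log yields `bytes=100-`, so only the 150 appended bytes are transferred.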
The endpoints
| Call | Purpose |
|---|---|
| GET /api/v1/tree | Merkle node at a path/depth |
| PUT /api/v1/file | Upload a file + apply its mtime |
| DELETE /api/v1/file | Delete a file or a whole site |
| GET /api/v1/logs | List active size + rotated timestamps |
| GET /api/v1/logs/file | Stream a log (Range supported) |
| GET /api/version | Server's API version (skew check) |