Version 1, release candidate
The wire format is stable: field names, semantics, and the
/.well-known/pagedigest.json location will not break before 1.0.
A site publishes a JSON manifest at /.well-known/pagedigest.json.
Each URL gets a monotonic integer that moves only when the content moves. A crawler fetches
the manifest, compares integers, and fetches only the pages that changed — one request
instead of thousands.
“If you would stop hammering my server for pages that haven’t changed, I wouldn’t have to rate-limit or ban you.”
“If you would stop hiding behind bot detection, I wouldn’t have to hammer your server — I just need to know what’s new.”
They are describing the same waste. pagedigest is the coordination.
For a 10,000-page site that changes 20 pages a week, a consumer makes one manifest
request plus twenty page fetches per cycle — instead of ten thousand per-URL checks.
site_rev is the one-request
fast path: if it hasn’t moved, nothing has.
{
"version": 1,
"generated": "2026-07-02T10:00:00Z",
"site_rev": 18294,
"entries": {
"/": { "rev": 47 },
"/about": { "rev": 12 },
"/blog/hello-world": {
"rev": 4, // was 3 — this is the only page to re-fetch
"digest": "sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e…"
}
}
}
The optional digest lets consumers spot-audit publisher claims: publishers who lie get caught, publishers who cooperate accumulate trust.
One GET to /.well-known/pagedigest.json. If it’s missing or
malformed, fall back to whatever you did before — the protocol never makes things worse.
site_rev
Same integer as last visit? Nothing on the site changed. The crawl cycle is over after a single request.
For each URL whose rev incremented, re-fetch. Everything else is
provably unchanged, on the publisher’s own word.
Hash a sampled page and compare it to the manifest’s digest.
A mismatch means the manifest can’t be trusted — downgrade and fall back.
dotrepo.org — a public index of repository metadata for AI agents — publishes a pagedigest manifest covering its full corpus: thousands of JSON records that refresh continuously. Mirrors and agent caches sync the whole index with one manifest request plus only the files whose revision moved.
curl -s https://dotrepo.org/.well-known/pagedigest.json | jq .site_rev
pagedigest.org publishes its own manifest at
/.well-known/pagedigest.json,
generated by the reference generator at build time.
pagedigest is a coordination protocol, not a defense mechanism. Each side takes on an obligation that is already in its own interest.
The wire format is stable: field names, semantics, and the
/.well-known/pagedigest.json location will not break before 1.0.
Publishers advertise the manifest with
Link: </.well-known/pagedigest.json>; rel="https://pagedigest.org/rel"
on ordinary responses.
A Rust generator and a Python consumer library exist today, with a conformance vector suite. Public source release accompanies 1.0.
Crawler operators and publishers who want early integration: hello@pagedigest.org.