Content-defined chunking added to Bazel


TLDR

  • BuildBuddy’s CDC splits large build outputs into content-defined chunks, cutting uploads by 40% and disk cache size by 40% on the BuildBuddy repo benchmark.

Key Takeaways

  • Enable with Bazel 8.7 or 9.1+ via --experimental_remote_cache_chunking; chunks blobs larger than 2 MiB.
  • Rolling-hash chunk boundaries (the FastCDC algorithm) are content-defined, so only changed chunks need re-upload; unchanged chunks are reused across builds.
  • In production, CDC skipped ~300 TiB of duplicate chunk uploads over a two-week window, with peaks over 4 TiB/hour.
  • Works best for linker outputs (GoLink, CppLink) and uncompressed packages; compressed formats like tar.gz or Docker layers chunk poorly because a small input change perturbs the entire downstream byte stream.
  • Implemented across three layers: Remote APIs (SplitBlob/SpliceBlob protocol), BuildBuddy server, and Bazel client-side combined cache path.
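The rolling-hash idea above can be sketched in a few lines. This is a minimal, illustrative content-defined chunker using a Gear-style rolling hash in the spirit of FastCDC; every constant here (the table seed, the boundary mask, the min/max chunk sizes) is made up for the demo and is not BuildBuddy's or Bazel's actual implementation or parameters.

```python
import hashlib
import random

# Gear-style rolling hash: a per-byte random table, with old bytes
# shifting out of a 64-bit word, so the hash depends only on a short
# sliding window of recent input. A boundary is declared wherever the
# hash's low bits are all zero.
random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]
MASK = (1 << 13) - 1                 # low 13 bits zero => ~8 KiB average chunk
MIN_SIZE, MAX_SIZE = 2 * 1024, 64 * 1024  # illustrative bounds only

def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & ((1 << 64) - 1)
        size = i - start + 1
        if size >= MIN_SIZE and ((h & MASK) == 0 or size >= MAX_SIZE):
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Edit 8 bytes in the middle of a 1 MiB blob: because boundaries are
# derived from content, chunking resynchronizes shortly after the edit
# and almost every chunk digest is unchanged.
blob = random.randbytes(1 << 20)
edited = blob[:500_000] + b"patched!" + blob[500_008:]

digests = lambda cs: {hashlib.sha256(c).hexdigest() for c in cs}
before, after = digests(chunk(blob)), digests(chunk(edited))
print(f"{len(before & after)}/{len(before)} chunks reusable after the edit")
```

With fixed-size chunking the same 8-byte edit would shift every byte after it into different chunks, invalidating them all; content-defined boundaries are what make the "skip duplicate chunk uploads" savings possible.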

Hacker News Comment Review

  • No substantive HN discussion yet.
