Mounting tar archives as a filesystem in WebAssembly

TLDR

  • Generate a JSON index of per-file byte offsets for a .tar.gz archive, then mount the decompressed archive directly with Emscripten’s WORKERFS, skipping extraction and the up-front copy of every file into the Wasm heap.
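The indexing step can be sketched in a few lines of TypeScript. This is a minimal illustration, not tar-vfs-index's code. buildTarIndex is a hypothetical helper that walks an uncompressed tar already held in a Uint8Array, reads each 512-byte header, and records where every regular file's data starts and ends. The { files: [{ filename, start, end }] } shape is assumed to match file_packager metadata. The real tool may emit extra fields, and this sketch ignores pax and GNU long-name extensions.

```typescript
// Hypothetical helper illustrating the indexing idea; not tar-vfs-index's API.
interface IndexEntry {
  filename: string; // absolute path inside the mount, e.g. "/pkg/DESCRIPTION"
  start: number; // offset of the file's first data byte in the uncompressed tar
  end: number; // offset one past its last data byte
}

const BLOCK = 512; // tar headers and data are laid out in 512-byte blocks

// Read a NUL-terminated ASCII field from a header.
function readField(header: Uint8Array, offset: number, length: number): string {
  const field = header.subarray(offset, offset + length);
  const nul = field.indexOf(0);
  return new TextDecoder().decode(nul === -1 ? field : field.subarray(0, nul));
}

function buildTarIndex(tar: Uint8Array): { files: IndexEntry[] } {
  const files: IndexEntry[] = [];
  let offset = 0;
  while (offset + BLOCK <= tar.length) {
    const header = tar.subarray(offset, offset + BLOCK);
    // An all-zero header block marks the end of the archive.
    if (header.every((b) => b === 0)) break;

    let name = readField(header, 0, 100);
    // POSIX ustar archives may split long paths into prefix + name.
    if (readField(header, 257, 6) === "ustar") {
      const prefix = readField(header, 345, 155);
      if (prefix) name = `${prefix}/${name}`;
    }
    // The size field is ASCII octal.
    const sizeText = readField(header, 124, 12).trim();
    const size = sizeText === "" ? 0 : parseInt(sizeText, 8);

    const start = offset + BLOCK; // data follows its header
    const type = header[156];
    // Index regular files only ('0', or NUL in old archives); directories
    // and other entry types are skipped.
    if (type === 0x30 || type === 0) {
      files.push({ filename: "/" + name.replace(/^\.?\//, ""), start, end: start + size });
    }
    // Data is padded up to a whole number of blocks.
    offset = start + Math.ceil(size / BLOCK) * BLOCK;
  }
  return { files };
}
```

For a tar holding a pkg/ directory entry followed by a 5-byte pkg/a.txt, this yields a single entry, /pkg/a.txt, with start 1024 and end 1029: the directory header occupies block 0, the file header block 1, and the data begins at block 2.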

Key Takeaways

  • The tar-vfs-index npm package reads a .tar or .tar.gz stream and outputs a JSON file in Emscripten’s file_packager metadata format, giving the start and end byte offsets of each file within the uncompressed tar.
  • WORKERFS serves each file from a slice of the backing Blob, read on demand, so the decompressed tar stays in a Blob as the random-access store and is never copied wholesale into the Wasm heap: only the bytes each read requests are copied in.
  • The browser-native DecompressionStream('gzip') gunzips the archive before mounting. Because gzip data cannot be seeked, the archive is decompressed once up front, and the resulting Blob is passed directly to the WORKERFS mount call.
  • The index can be appended to the original archive as an extra tar entry, producing a self-contained .tar.gz that needs no separate .json sidecar file; appending leaves the offsets of the existing entries unchanged.
  • webR uses this in production for all binary R packages, which now load faster while still being hosted as plain .tar.gz files on static servers.
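A sketch of the mount side, assuming Emscripten's WORKERFS packages mount option, which takes { metadata, blob } pairs and creates one read-only file per metadata entry backed by blob.slice(start, end). gunzipToBlob and mountTar are hypothetical helpers, and FS and WORKERFS are passed in as parameters because how a given build exposes them varies. WORKERFS reads through FileReaderSync, so this has to run inside a Web Worker.

```typescript
// Hypothetical helpers sketching the mount flow; not webR's or tar-vfs-index's code.

// Only the two Emscripten FS methods this sketch needs.
interface EmscriptenFS {
  mkdir(path: string): unknown;
  mount(type: unknown, opts: unknown, mountpoint: string): unknown;
}

// Index shape assumed to follow file_packager metadata.
interface PackageMetadata {
  files: { filename: string; start: number; end: number }[];
}

// Decompress a .tar.gz Blob with the native DecompressionStream.
// gzip is not seekable, so the whole archive is decompressed once;
// afterwards the tar offsets in the index can be used directly.
async function gunzipToBlob(gz: Blob): Promise<Blob> {
  const tarStream = gz.stream().pipeThrough(new DecompressionStream("gzip"));
  return new Response(tarStream).blob();
}

// Mount the decompressed tar via WORKERFS's packages option. WORKERFS
// creates a read-only node per entry backed by blob.slice(start, end);
// the bytes are only read (via FileReaderSync) when a file is read.
function mountTar(
  FS: EmscriptenFS,
  WORKERFS: unknown,
  metadata: PackageMetadata,
  tarBlob: Blob,
  mountpoint: string,
): void {
  FS.mkdir(mountpoint);
  FS.mount(WORKERFS, { packages: [{ metadata, blob: tarBlob }] }, mountpoint);
}
```

In a worker the two helpers chain directly: fetch the archive, pass its Blob through gunzipToBlob, load the JSON index, then call mountTar with the resulting tar Blob and a mount point such as /pkg (an arbitrary example path).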
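Embedding the index works because appending a tar entry does not move the entries before it. The following sketch uses hypothetical helpers (endOfArchive, ustarHeader, appendTarEntry): it finds the end-of-archive marker, writes a minimal POSIX ustar header plus the JSON data there, and terminates the archive again with two zero blocks. It produces an uncompressed tar with entry names limited to 100 bytes. Recompressing the result to .tar.gz is left out, and the entry name used for the index is an assumption, since the source does not say how tar-vfs-index or webR name and locate the embedded index.

```typescript
// Hypothetical helpers sketching "append the index as an extra tar entry".
const BLOCK = 512;

// Offset of the end-of-archive marker (first all-zero header block): the
// point where a new entry can go without moving any existing entry.
function endOfArchive(tar: Uint8Array): number {
  let offset = 0;
  while (offset + BLOCK <= tar.length) {
    const header = tar.subarray(offset, offset + BLOCK);
    if (header.every((b) => b === 0)) return offset;
    const sizeText = new TextDecoder()
      .decode(header.subarray(124, 136))
      .replace(/\0/g, "")
      .trim();
    const size = sizeText === "" ? 0 : parseInt(sizeText, 8);
    offset += BLOCK + Math.ceil(size / BLOCK) * BLOCK; // header + padded data
  }
  return offset;
}

// Minimal POSIX ustar header for a regular file.
function ustarHeader(name: string, size: number): Uint8Array {
  const enc = new TextEncoder();
  if (enc.encode(name).length > 100) throw new RangeError("name too long for this sketch");
  const h = new Uint8Array(BLOCK);
  const put = (text: string, at: number) => h.set(enc.encode(text), at);
  put(name, 0);
  put("0000644\0", 100); // mode
  put("0000000\0", 108); // uid
  put("0000000\0", 116); // gid
  put(size.toString(8).padStart(11, "0") + "\0", 124); // size, octal
  put("00000000000\0", 136); // mtime
  put("0", 156); // typeflag: regular file
  put("ustar\0" + "00", 257); // magic and version
  // Checksum: sum of all header bytes with the checksum field read as spaces.
  put("        ", 148);
  const sum = h.reduce((acc, b) => acc + b, 0);
  put(sum.toString(8).padStart(6, "0") + "\0 ", 148);
  return h;
}

// Write data as a new entry at the end-of-archive marker, then terminate
// the archive again with two zero blocks. Earlier entries keep their offsets.
function appendTarEntry(tar: Uint8Array, name: string, data: Uint8Array): Uint8Array {
  const end = endOfArchive(tar);
  const padded = Math.ceil(data.length / BLOCK) * BLOCK;
  const out = new Uint8Array(end + BLOCK + padded + 2 * BLOCK); // zero-filled
  out.set(tar.subarray(0, end), 0);
  out.set(ustarHeader(name, data.length), end);
  out.set(data, end + BLOCK);
  return out;
}
```

Because the index is generated before it is appended, the offsets it records for earlier entries still point at the right bytes in the combined archive.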

Hacker News Comment Review

  • One commenter pointed to Ratarmount as a native-filesystem parallel: it builds an offset index over tar archives and mounts them via FUSE for random-access reads without extracting them, the same core idea applied outside the browser.

Notable Comments

  • @sillysaurusx: flags Ratarmount as prior art for index-based random-access tar mounts on the desktop, validating the general approach beyond Wasm.