fc, a lossless compressor for floating-point streams


TLDR

  • fc is a single-file C library that compresses IEEE-754 doubles by running ~50 specialized codecs per block and keeping the smallest result.
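The "try every codec, keep the smallest" idea can be sketched as follows. This is a minimal illustration, not fc's code: the codec signature, the two toy codecs, and the one-byte mode prefix are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical codec signature (not fc's API): write at most cap bytes to
   dst and return the encoded size, or 0 if the codec does not apply. */
typedef size_t (*codec_fn)(const double *src, size_t n, uint8_t *dst, size_t cap);

/* Toy codec 0: store the block verbatim. */
static size_t codec_raw(const double *src, size_t n, uint8_t *dst, size_t cap) {
    size_t need = n * sizeof(double);
    if (need == 0 || need > cap) return 0;
    memcpy(dst, src, need);
    return need;
}

/* Toy codec 1: a block of bit-identical values is stored as one double. */
static size_t codec_constant(const double *src, size_t n, uint8_t *dst, size_t cap) {
    if (n == 0 || cap < sizeof(double)) return 0;
    for (size_t i = 1; i < n; i++)
        if (memcmp(&src[i], &src[0], sizeof(double)) != 0) return 0;
    memcpy(dst, &src[0], sizeof(double));
    return sizeof(double);
}

static const codec_fn CODECS[] = { codec_raw, codec_constant };
enum { NUM_CODECS = sizeof(CODECS) / sizeof(CODECS[0]) };

/* Run every codec on one block and keep the smallest result.
   out[0] receives the winning mode ID, followed by its payload.
   out and scratch must each hold cap bytes.
   Returns the total bytes written, or 0 if no codec succeeded. */
size_t encode_block_best(const double *src, size_t n,
                         uint8_t *out, uint8_t *scratch, size_t cap) {
    size_t best = 0;
    assert(src != NULL && out != NULL && scratch != NULL);
    if (cap < 1) return 0;
    for (int mode = 0; mode < NUM_CODECS; mode++) {
        size_t sz = CODECS[mode](src, n, scratch, cap - 1);
        if (sz == 0) continue;               /* codec not applicable */
        if (best == 0 || sz < best) {        /* new smallest candidate */
            out[0] = (uint8_t)mode;
            memcpy(out + 1, scratch, sz);
            best = sz;
        }
    }
    return best ? best + 1 : 0;
}
```

Because every block pays for every codec at encode time, a design like this naturally trades slow encoding for compact output, while decoding only has to run the one codec named by the mode ID.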

Key Takeaways

  • Achieves a 3.07x compression ratio across 17 synthetic double datasets, vs. 2.07x for zstd-3, and wins on ratio outright on 10 of the 17.
  • Encode throughput is ~120 MB/s (multi-threaded, AVX2); decode reaches ~1.28 GB/s, roughly 10x faster, which suits write-once/read-many time-series stores.
  • Loses to zstd on quantized or dictionary-friendly data (e.g., decimal-cents: zstd-9 3,465x vs. fc 268x) and, by small margins, to fpzip on noisy scientific arrays.
  • Hard requirements: x86-64 with AVX2 + SSE4.2 + BMI + LZCNT; input buffers must be 8-byte aligned with a length that is a multiple of 8 bytes; there is no ARM/NEON path.
  • On-disk format is versioned by magic number but not stable across major versions; unknown mode IDs silently decode to zeros – treat untrusted streams carefully.
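A caller can guard against the last two points with a few checks of its own. The sketch below is an assumption-laden illustration: the magic value, the header layout (4-byte magic followed by a 1-byte mode ID), and the mode count are hypothetical placeholders, not fc's actual format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholders: the real magic number and the range of valid
   mode IDs are defined by fc's source and are not given in this summary. */
#define EXAMPLE_MAGIC      0x31304346u
#define EXAMPLE_MODE_COUNT 50u

/* Input precondition: buffer is 8-byte aligned and its length is a
   multiple of 8 bytes (a whole number of doubles). */
bool input_is_encodable(const void *buf, size_t len_bytes) {
    return buf != NULL
        && ((uintptr_t)buf % 8u) == 0
        && (len_bytes % 8u) == 0;
}

/* Pre-decode check on an untrusted stream, using the hypothetical layout
   described above. Rejects a wrong magic number and unknown mode IDs
   instead of letting them decode to zeros. */
bool header_looks_valid(const uint8_t *stream, size_t len_bytes) {
    uint32_t magic;
    assert(stream != NULL);
    if (len_bytes < 5) return false;
    memcpy(&magic, stream, sizeof magic);      /* safe for unaligned input */
    if (magic != EXAMPLE_MAGIC) return false;  /* wrong format or version */
    return stream[4] < EXAMPLE_MODE_COUNT;     /* reject unknown mode IDs */
}
```

Checking the mode ID before decoding matters because a silent all-zero result is indistinguishable from valid data downstream.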

Hacker News Comment Review

  • The author clarifies fc is narrowly scoped: float-specific predictors and transforms for time-series, scientific, simulation, and analytics data, not a general zstd/lz4 replacement.
  • A commenter asked whether fc has been compared to Chimp128 or Arrow’s byte stream split encoding, two common alternatives in columnar float compression pipelines – no response yet.
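For readers unfamiliar with byte stream split, the transform itself is small. The sketch below shows the general technique for doubles (byte j of every value is gathered into stream j, so that similar bytes sit together for a downstream compressor); it is an independent illustration, not Arrow's implementation and not part of fc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split n doubles into 8 byte streams laid out back to back in out:
   out[j * n + i] holds byte j of value i. out must hold n * 8 bytes. */
void byte_stream_split(const double *in, size_t n, uint8_t *out) {
    const uint8_t *bytes = (const uint8_t *)in;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < sizeof(double); j++)
            out[j * n + i] = bytes[i * sizeof(double) + j];
}

/* Inverse transform: interleave the 8 streams back into n doubles. */
void byte_stream_join(const uint8_t *in, size_t n, double *out) {
    uint8_t *bytes = (uint8_t *)out;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < sizeof(double); j++)
            bytes[i * sizeof(double) + j] = in[j * n + i];
}
```

The split output is the same size as the input; any gain comes from feeding it to a general-purpose compressor afterwards, which is why it makes a natural baseline for a float-specific codec.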

Notable Comments

  • @Scaevolus: asks about Chimp128 and Arrow byte stream split as missing baselines in the benchmark.
