The Third Hard Problem

· databases · Source ↗

TLDR

  • Mapping a general graph onto a hierarchical tree structure is a third fundamental hard problem in CS, alongside naming and cache invalidation.

Key Takeaways

  • Tree mapping appears across file systems (Linux package shredding vs. macOS bundles), code repos (component vs. language layout), writing, city planning, and biological taxonomy.
  • No canonical tree mapping exists for webs of ideas or relationships; every choice distorts the underlying graph structure.
  • Google’s monorepo (Blaze/Bazel, Pants, Buck) is a concrete attempt to escape forced taxonomies in large codebases.
  • Christopher Alexander’s 1965 “A City is not a Tree” is cited as an early formal diagnosis: designed cities fail because they impose tree structure on semilattice human relationships.
  • Cladistics over morphological taxonomy illustrates that preserving real connections beats imposing artificial hierarchies.

Hacker News Comment Review

  • Commenters converged on a unifying framing: tree mapping is fundamentally a dimensional reduction from high-dimensional graphs, and aggregability is NP-Hard per a cited ACM proof, giving the problem formal teeth.
  • There is practical consensus that flat structures often beat forced hierarchies; commenters linked to posts on flat Rust workspaces and flat module layouts as concrete alternatives to deep trees.
  • One commenter argued all three “hard problems” collapse into one: deciding whether two things are the same thing or separate, framing naming, caching, and tree mapping as identity problems.

Notable Comments

  • @efitz: Names this the “one true taxonomy” problem – stakeholder rooms never converge because every hierarchy privileges one classification dimension over others.
  • @chaboud: Grounds the difficulty formally: aggregability is NP-Hard, so any tree reduction of a graph is computationally intractable to optimize.

Original | Discuss on HN