ICLR 2026 – Institutional Affiliations Dataset and Analysis

TLDR

  • PDF-derived affiliation dataset for all 5,356 ICLR 2026 accepted papers, with full scrape-to-treemap pipeline and CSV/XLSX downloads.

Key Takeaways

  • Affiliations are parsed from the title block of each paper's PDF (94% success rate) rather than from OpenReview profiles, which avoids profile drift, where a current employer overwrites historical affiliations.
  • Dataset columns include canonical institution names (250+ normalization rules), country, region, primary area, keywords, abstract, and OpenReview URL.
  • Three counting methods provided: unique-per-paper, first-author-only, fractional 1/N; sensitivity CSV shows top-50 institutions are stable across all three.
  • Full pipeline is reproducible for other conferences: scrape OpenReview, bulk-download PDFs (~5 GB), parse, canonicalize, render treemap.
  • Hong Kong institutions counted separately from mainland China, matching QS/THE ranking conventions.
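
The three counting methods named above can be sketched as follows. This is an illustration under stated assumptions, not the dataset's actual code. The toy `papers` list and the function names are hypothetical. The sketch also assumes one institution per author, and that N in "fractional 1/N" is the number of authors on a paper; the source does not spell out either point.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy data: each paper is an ordered list of canonical
# institution names, one entry per author (first author first).
papers = [
    ["MIT", "MIT", "Tsinghua University"],
    ["Tsinghua University", "HKUST"],
    ["MIT"],
]

def count_unique_per_paper(papers):
    """Each institution gets 1 per paper it appears on, however many authors."""
    counts = Counter()
    for institutions in papers:
        for inst in set(institutions):
            counts[inst] += 1
    return counts

def count_first_author(papers):
    """Only the first author's institution gets credit for the paper."""
    return Counter(institutions[0] for institutions in papers if institutions)

def count_fractional(papers):
    """Each author contributes 1/N of a paper (assumption: N = author count)."""
    counts = Counter()
    for institutions in papers:
        n = len(institutions)
        for inst in institutions:
            counts[inst] += Fraction(1, n)  # exact arithmetic, no float drift
    return counts

unique = count_unique_per_paper(papers)
first = count_first_author(papers)
fractional = count_fractional(papers)

for name in sorted(unique):
    print(name, unique[name], first[name], fractional[name])
```

Under the fractional scheme every paper contributes exactly one unit in total, so the counts sum to the number of papers. That makes it a useful cross-check against the other two methods when comparing rankings.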

Hacker News Comment Review

  • No substantive HN discussion yet.
