ICLR 2026 – Institutional Affiliations Dataset and Analysis

May 15, 2026 · books

TLDR

PDF-derived affiliation dataset for all 5,356 ICLR 2026 accepted papers, with full scrape-to-treemap pipeline and CSV/XLSX downloads.

Affiliations pulled from the title blocks of paper PDFs (94% success), not from OpenReview profiles, avoiding profile drift, where an author's current employer overwrites historical affiliations.
Dataset columns include canonical institution names (250+ normalization rules), country, region, primary area, keywords, abstract, and OpenReview URL.
Three counting methods provided: unique-per-paper, first-author-only, and fractional (1/N); a sensitivity CSV shows the top-50 institutions are stable across all three.
Full pipeline is reproducible for other conferences: scrape OpenReview, bulk-download PDFs (~5 GB), parse, canonicalize, render treemap.
Hong Kong institutions counted separately from mainland China, matching QS/THE ranking conventions.
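
The three counting methods and the canonicalization step can be illustrated with a minimal sketch. This is not the dataset's actual pipeline code: the `CANONICAL` map, the function names, and the input shape (one list of per-author affiliation strings per paper, in author order) are all hypothetical, and the sketch assumes that N in "1/N" means the number of authors on a paper.

```python
from collections import Counter, defaultdict

# Hypothetical normalization rules for illustration only; the dataset's
# real rule set (250+ rules) is not reproduced here.
CANONICAL = {
    "MIT": "Massachusetts Institute of Technology",
    "Massachusetts Inst. of Technology": "Massachusetts Institute of Technology",
    "HKUST": "Hong Kong University of Science and Technology",
}


def canonicalize(name):
    """Collapse whitespace, then map known variants to one canonical name."""
    cleaned = " ".join(name.split())
    return CANONICAL.get(cleaned, cleaned)


def count_affiliations(papers):
    """Tally institutions under three counting methods.

    `papers` is a list of papers; each paper is a list with one raw
    affiliation string per author, in author order (assumed input shape).
    """
    unique_per_paper = Counter()      # institution counted once per paper
    first_author_only = Counter()     # only the first author's institution
    fractional = defaultdict(float)   # each author contributes 1/N of a paper

    for authors in papers:
        insts = [canonicalize(a) for a in authors]
        if not insts:
            continue  # skip papers whose affiliations could not be parsed
        for inst in set(insts):
            unique_per_paper[inst] += 1
        first_author_only[insts[0]] += 1
        share = 1.0 / len(insts)  # assumption: N = number of authors
        for inst in insts:
            fractional[inst] += share

    return unique_per_paper, first_author_only, dict(fractional)


# Toy input: two papers, three authors on the first and one on the second.
papers = [
    ["MIT", "Massachusetts Inst. of Technology", "HKUST"],
    ["HKUST"],
]
unique, first, fractional = count_affiliations(papers)
```

In this toy input the first paper adds one unique-per-paper count to each of its two institutions, while its fractional credit is split two thirds to one third. Under each method the totals can be ranked and compared, which is the kind of check a sensitivity table summarizes.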