UK Biobank health data keeps ending up on GitHub


TLDR

  • A live tracker documents 110 DMCA takedown notices that UK Biobank filed against GitHub repos where researchers accidentally published data from its cohort of 500,000 volunteers.

Key Takeaways

  • UK Biobank has targeted 197 repositories belonging to 170 developers in at least 14 countries; the US and China account for the largest shares.
  • Nearly half of targeted files are Jupyter or R notebooks; a quarter are genetic/genomic formats (PLINK, BOLT-LMM, BGEN) that directly encode participant genotypes.
  • Re-identification is real: The Guardian matched a volunteer’s record using only approximate birth date and the date of a single major surgery.
  • Some targeted developers received data secondhand – UK Biobank is filing against researchers it never directly gave access to.
  • The UK has no privacy-breach equivalent of the DMCA; UK Biobank is repurposing US copyright law as the fastest available removal mechanism.
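
The file-type breakdown above suggests a simple guard researchers could run before pushing a repo. The following is a minimal sketch, not anything the tracker or UK Biobank provides: the function name `flag_risky_files` and the extension map are hypothetical, and the map covers only a few well-known extensions (Jupyter and R Markdown notebooks, PLINK `.bed`/`.bim`/`.fam`, BGEN `.bgen`). A match means "review before publishing", not "contains participant data".

```python
from pathlib import PurePosixPath

# Illustrative extension map (an assumption, not taken from the tracker).
# Notebooks are listed because saved cell outputs can embed data rows.
RISKY_EXTENSIONS = {
    ".ipynb": "notebook",  # Jupyter notebook
    ".rmd": "notebook",    # R Markdown notebook
    ".bed": "genetic",     # PLINK binary genotype file
    ".bim": "genetic",     # PLINK variant information file
    ".fam": "genetic",     # PLINK sample information file
    ".bgen": "genetic",    # BGEN genotype file
}


def flag_risky_files(paths):
    """Return (path, category) pairs for paths whose extension is in the map."""
    flagged = []
    for path in paths:
        # Compare extensions case-insensitively, e.g. ".BED" matches ".bed".
        extension = PurePosixPath(path).suffix.lower()
        category = RISKY_EXTENSIONS.get(extension)
        if category is not None:
            flagged.append((path, category))
    return flagged


if __name__ == "__main__":
    for path, category in flag_risky_files(
        ["analysis/gwas.ipynb", "data/chr1.bgen", "README.md"]
    ):
        print(f"review before pushing: {path} ({category})")
```

An extension check is deliberately crude: it cannot see data pasted into a script or a CSV, so it complements rather than replaces a manual review of what a repository contains.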

Hacker News Comment Review

  • No substantive HN discussion yet.
