A live tracker documents 110 DMCA takedown notices UK Biobank filed against GitHub repos where researchers accidentally published health data on 500,000 volunteers.
Key Takeaways
UK Biobank has targeted 197 repositories across 170 developers in 14+ countries; the US and China account for the largest shares.
Nearly half of targeted files are Jupyter or R notebooks; a quarter are genetic/genomic formats (PLINK, BOLT-LMM, BGEN) that directly encode participant genotypes.
Re-identification is real: The Guardian matched a volunteer’s record using only approximate birth date and the date of a single major surgery.
Some targeted developers received the data secondhand, so UK Biobank is filing against researchers to whom it never directly granted access.
The UK has no privacy-breach equivalent of the DMCA; UK Biobank is repurposing copyright law as the fastest available removal mechanism.
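The file-type breakdown in the takeaways above suggests a simple check a researcher could run before pushing a repository. The following is a minimal sketch, not the tracker's or UK Biobank's method: the `classify` and `summarize` helpers and both extension lists are assumptions chosen for illustration. The list covers common PLINK and BGEN extensions only; BOLT-LMM output has no single standard extension and is left out.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed extension lists, for illustration only (not the tracker's actual rules).
# Notebooks can store cell outputs alongside code, so printed tables may be committed.
NOTEBOOK_EXTS = {".ipynb", ".rmd"}
# PLINK text/binary filesets and BGEN; note ".bed" is also used by unrelated formats.
GENOMIC_EXTS = {".bed", ".bim", ".fam", ".ped", ".map", ".bgen"}


def classify(path: str) -> str:
    """Return a coarse risk category for a repository file path."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in NOTEBOOK_EXTS:
        return "notebook"
    if ext in GENOMIC_EXTS:
        return "genomic"
    return "other"


def summarize(paths):
    """Count how many paths fall into each category."""
    return Counter(classify(p) for p in paths)


if __name__ == "__main__":
    # Hypothetical file listing, e.g. the output of `git ls-files`.
    tracked = ["analysis/gwas.ipynb", "data/chr1.bgen", "data/cohort.fam", "README.md"]
    for category, count in sorted(summarize(tracked).items()):
        print(f"{category}: {count}")
```

A check like this only flags file types worth a manual look; it cannot tell whether a given file actually contains participant data.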