Microsoft has quietly pulled from the internet its database of 10m faces, which has been used to train facial recognition systems around the world. The database, known as MS Celeb, was published in 2016 and described by the company as the largest publicly available facial recognition data set in the world, containing more than 10m images of nearly 100,000 individuals.
The people whose photos were used were not asked for their consent; their images were scraped off the web from search engines and videos under the terms of the Creative Commons license that allows academic reuse of photos.
Microsoft’s MS Celeb data set has been used by several commercial organizations, according to citations in AI papers, including IBM, Panasonic, Alibaba, Nvidia, Hitachi, Sensetime and Megvii. Microsoft itself has used the data set to train facial recognition algorithms.
“Microsoft has exploited the term ‘celebrity’ to include people who merely work online and have a digital identity,” said Adam Harvey, a Berlin-based researcher, who uncovered the data for his project Megapixels. Tech experts said Microsoft may have been in violation of the EU’s General Data Protection Regulation by continuing to distribute the MS Celeb data set after the regulation came into effect last year.