Bluesky’s privacy problems: Openness vs. user data safety in the age of AI scraping

The news: Social and microblogging app Bluesky is facing increasing scrutiny for data privacy risks associated with its Firehose API (application programming interface) after a Hugging Face employee scraped 1 million public posts for AI research.

Bluesky positions itself as an open and decentralized alternative to X and Meta’s Threads. It says it isn’t owned by a billionaire, so its users aren’t subject to any one company’s whims.

However, Bluesky’s open data policy makes users’ data vulnerable because public posts can easily be scraped for AI training. 

Weak APIs strike again: According to Salt Security, 95% of businesses faced API security problems in the past year. PayPal’s recent global outage was attributed to an API failure that disrupted services for thousands of users.

  • Bluesky responded to the scraping incident by saying it’s actively exploring consent mechanisms and tools but can’t enforce them externally.
  • This means public posts on Bluesky remain accessible to third parties, leaving them open to scraping and use as AI training data.
  • Bluesky’s decentralized nature may be a boon for open and free communication on its platform, but users might balk at its porous data security.

Ownership of social posts, accounts in question: The security and ownership of user-generated content and information on social platforms could come under closer regulatory scrutiny in the near term.

  • X claimed in an unrelated court filing that X user accounts “are inherently a part of X Corp.’s services and their use. … Users cannot sell, assign, or otherwise transfer such license absent X Corp.’s consent.”
  • These limitations may not be evident to users, which is likely why EU regulators have said that Bluesky is violating the Digital Services Act (DSA) by failing to disclose important, required information.

Our take: As regulators sharpen their focus, users are left questioning where their rights end and platform control begins. For social platforms, the debate over data privacy and ownership isn’t just a legal challenge—it’s a fight for user trust.

This article is part of EMARKETER’s client-only subscription Briefings, daily newsletters authored by industry analysts who are experts in marketing, advertising, media, and tech trends.