Meta is scraping your public Facebook and Instagram posts

Key Takeaways

Meta is using Facebook and Instagram content to train AI models
Meta admits scraping public posts, which could include images of children
Currently, only EU users are able to opt out

Have you ever created an AI image and thought that the person in the image looked familiar? Maybe it looked a bit like you or someone you know. If so, that may not have been completely down to chance.

Meta has publicly confirmed that it is using your photos, videos, and messages from both Facebook and Instagram to train its AI models. The company is harvesting public posts from as far back as 2007 to train its AI products, and there’s nothing the vast majority of us can do about it. Currently, only users in the EU have the ability to opt out of this indiscriminate hoovering up of personal content; for the rest of us, the only way to stop it is to make posts private.

Only the EU is able to opt out of this assault on privacy because, currently, Europe is the only place with laws strong enough to force Meta to grant that option. It’s becoming abundantly clear that without legal guidelines, big AI companies simply can’t be trusted to police themselves.

Meta is scraping public Facebook and Instagram posts from as far back as 2007

Only the EU and UK were given the option to opt out

During a public inquiry into AI usage in Australia, Melinda Claybaugh, the global privacy director at Meta, admitted that Meta is scraping public posts from Facebook and Instagram users to train its AI products. Australian senator David Shoebridge put the following to Claybaugh: “The truth of the matter is that unless you have consciously set those posts to private since 2007, Meta has just decided that you will scrape all of the public photos and all of the texts from every public post on Instagram or Facebook since 2007, unless there was a conscious decision to set them on private. That’s the reality, isn’t it?” Claybaugh’s response was a single word: “Correct.”

While this is likely to be happening not just in Australia but in many countries around the world, there are some places where that’s not the case. In the EU, from June this year, users were given the ability to opt out of having their content scraped by Meta, thanks to the strong privacy rules in Europe. However, even now, public posts from EU users can be scraped unless they go deep into their privacy settings to deliberately opt out. Many people in the EU may still be unaware that it’s an option at all.

No content was scraped from the accounts of under-18s, however

Claybaugh confirmed that Meta is only scraping content from the accounts of adults; content is not scraped from the Facebook or Instagram accounts of anyone who is under 18. However, Tony Sheldon, another Australian senator, asked whether photographs from his own adult account that featured his children would be scraped. Claybaugh confirmed that they would.

It was also not possible to rule out that, when scraping the accounts of people who are now over 18, Meta harvested posts that were made when they were still under that age. Since Meta is scraping as far back as 2007, even people who are currently in their 30s could potentially have images of themselves from when they were under 18 scraped from their accounts.

Meta scraping content that includes images of children under the age of 18 in order to train its AI models is questionable at best. What’s worse is that Meta doesn’t seem to have any issue with this at all, or indeed any real way of stopping it from happening other than to cease scraping completely. There’s no way for users outside the EU to stop it happening to their own accounts, other than making all of their posts private.

Meta isn’t the only company that will be scraping personal content

Anything you post publicly appears to be fair game

Meta may have publicly admitted that it is scraping user content, but you can bet your bottom dollar that it’s far from the only company that is doing so. AI models require vast amounts of data for training, and the more data they have access to, the better they can become. It’s already reached the point where there are concerns that we’re going to run out of real-world data to train AI models with and will have to resort to generating synthetic data instead.

This means that AI companies will hoover up anything that they can if it gives them a competitive advantage. All the way back in July of last year, Elon Musk confirmed during a Twitter Spaces discussion that the company would use public tweets for training its AI models, meaning that unless you’ve opted out, your public posts on X will have been scraped to help train Grok AI.

It isn’t the only chatbot to do so, however. During the same discussion, Musk confirmed that he had imposed rate limits on accessing X’s data because “every organization doing AI, large and small, has used Twitter’s data for training.” Musk has beef with OpenAI, having been a co-founder of the company before cutting ties, and he clearly believes that ChatGPT has also been trained using public posts from Twitter/X. It is possible to opt out of allowing Grok to use your posts as training data, but by now that horse has long since bolted; your public post history has almost certainly already been scraped.

AI companies aren’t being completely transparent about what they’re doing

It took two tries just to get Meta to admit what it was doing

One of the most disturbing things to come out of the inquiry in Australia was just how hard it is to get AI companies to admit to what they’re doing. When Senator Sheldon first asked Melinda Claybaugh whether Meta was hoovering up the data of all Australians to build its generative AI tools, she rejected that claim. Technically, she was right; Meta isn’t hoovering up the data of all Australians, since there are plenty of people who aren’t on Facebook or Instagram.

It was only when Senator Shoebridge challenged her response, and asked a question that was specific to the data of Facebook and Instagram users, that Claybaugh admitted that it was happening. Meta CEO Mark Zuckerberg has alluded to the company using Facebook and Instagram data in the past, but without being explicit. He said that “the next key part of our playbook is learning from the unique data and feedback loops in our products” before referring to the hundreds of billions of publicly shared images on Facebook and Instagram.

This isn’t quite the same as a direct admission that Meta is scraping your content from as far back as 2007, however. If Elon Musk is right, and in this rare case there’s no reason to think that he’s not, large numbers of AI companies are routinely scraping personal posts and images from social media sites, without a care in the world.

Not every company is riding roughshod over your privacy

The exceptions are rare, however

AI models require data, and the internet is a rich supply. Scraping data from the internet isn’t a new thing; search engines such as Google wouldn’t work without being able to do so. There’s a big difference between scraping keywords from a website and using personal photos to train AI models, however.

Not every AI company is harvesting data without consent. There are companies that at least appear to be trying to do things differently. Apple, for example, uses a web crawler called Applebot to trawl the web for information that can be used by Siri or Safari. It has a separate agent called Applebot-Extended that gives websites control over how their content is used. It’s now possible for sites to add a snippet of code that will deny Applebot-Extended permission to use data from that website for the purpose of training Apple’s AI features. In other words, Apple leaves the decision of whether a site’s data is used for training Apple’s AI up to the websites themselves, which can say no without consequences.
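To make that opt-out concrete, here is a minimal sketch of what such a snippet might look like and how a crawler could interpret it. It assumes the rules live in a site's robots.txt file and uses Python's standard `urllib.robotparser` to evaluate them. The `example.com` URL, the `ROBOTS_TXT` contents, and the `is_allowed` helper are all hypothetical and chosen purely for illustration; this is not Apple's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: deny the Applebot-Extended
# agent everywhere, and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to use url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/photos/1"  # hypothetical URL
    for agent in ("Applebot", "Applebot-Extended"):
        print(agent, "allowed:", is_allowed(agent, page))
```

Under these assumed rules, the ordinary Applebot search crawler is still permitted while the Applebot-Extended agent is refused, which mirrors the split between search indexing and AI training described above.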

Several big websites have taken up the option to block Apple from scraping their sites for training purposes. These include Facebook and Instagram, meaning that none of your personal posts will be used to train Apple’s AI models, even if that’s how Meta is using them.

While this is admirable, it only really kicks the problem down the road. Siri will soon have ChatGPT baked in, and Apple has no control over the data that was used to train OpenAI’s models.

The EU has shown that companies will only stop if forced to

Rules need to be put in place to allow us to make our own privacy decisions

There is one ray of hope in all of this. The EU is notorious for having some of the strictest internet privacy regulations in the world. Some of them are well-intentioned but ultimately self-defeating, such as the GDPR rules that are responsible for those annoying pop-ups asking if you give consent for cookies. The idea is admirable, but the end result is a more frustrating internet in which many people click “Allow” just so they can actually start using the website.

It’s clear that major companies do take the EU seriously, however, since the bloc of 27 countries contains roughly 450 million people and represents a significant chunk of the market for tech companies. A perfect example is the EU compelling Apple to finally make the switch to USB-C. Meta was also forced to comply with the EU’s directives by giving users in Europe the option of opting out of having their data scraped for AI training.

Even X, the supposed haven of free speech, has fallen in line with the EU’s rules. The company has agreed to stop using the data from accounts in Europe to train its AI models, although it’s too late to do much about the data that has already been harvested.

It might not be time to pack up and move to Barcelona just yet, however. Tech companies will comply with these laws, but often their way of doing so is to just remove the AI features for EU users altogether. Meta has paused the launch of Meta AI in Europe and Apple Intelligence may not initially be available for EU iPhone users, either. It does seem likely that these features will land in the EU eventually, however, since the market is too big to ignore.

Ultimately, what is needed are rules that apply across the globe. When asked if the same option open to EU Facebook and Instagram users should be given to Australians, Claybaugh said that the opt-out was only offered in the EU due to the laws in place in that region. Until regulations apply everywhere, companies can keep doing what they want in any country that doesn’t tell them not to. The US, UK, and EU have signed an AI treaty, but we’re still a long way from global regulation of AI.

This is the real issue. AI has appeared seemingly out of nowhere and developed at an astounding rate, and governments are still playing catch up. The EU has shown that if the correct laws are in place, major companies can be forced to respect privacy. It’s also proven the flip side, however; unless it’s explicitly illegal, AI companies will try to get away with whatever they can, and privacy be damned.
