
Apple’s AI training faces backlash as major publishers opt out


Is Apple’s AI momentum about to crash? A growing number of news outlets and social platforms are saying “no” to the tech giant’s data-hungry web crawlers.

For decades, these digital bots have been quietly collecting information from the internet, feeding it to everything from search engines to AI models. But as AI has become more powerful, the stakes have risen. Now, publishers are drawing a line in the sand, demanding control over their content and challenging Apple’s AI ambitions.

Apple’s web crawler, Applebot, was initially designed to power features like Siri and Spotlight. However, it has recently taken on another big role: gathering data to train Apple’s foundational AI models, or what the company calls “Apple Intelligence”. This data includes text, images, and other content.

To appease publishers, Apple introduced Applebot-Extended, a tool that allows website owners to opt out of AI training. Now that the option exists, many publishers are taking advantage of it. By updating their robots.txt files, they can block Applebot-Extended (and other AI crawlers) from using their content.
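As a rough illustration of what such an opt-out might look like, the sketch below feeds a hypothetical robots.txt to Python's standard urllib.robotparser module. The rules, the is_allowed helper, and the example.com URL are assumptions made for illustration, not taken from any publisher's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training, leave everything else open.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/news/story"  # placeholder address
print(is_allowed(SAMPLE_ROBOTS_TXT, "Applebot-Extended", URL))  # AI-training agent
print(is_allowed(SAMPLE_ROBOTS_TXT, "Applebot", URL))           # regular crawler
```

Under these assumed rules, the Applebot-Extended agent is refused while the regular Applebot crawler is still allowed, so a site could stay visible in features like Siri and Spotlight while declining AI training.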

What is Robots.txt?

Robots.txt is a file used by website owners to control which bots can access their content. Publishers are increasingly using it to block AI bots from scraping their websites for training data, citing concerns about copyright and the potential misuse of their content.

While robots.txt is a relatively simple tool, managing it has become more complex in the age of AI. With new AI agents emerging rapidly, it can be challenging for publishers to keep their block lists up to date. As a result, many are turning to services that automatically update their robots.txt files.

The backlash

Because robots.txt files are publicly accessible, anyone can see which sites are opting out of Apple’s AI training, which is exactly what WIRED did.
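A check of this kind could be sketched roughly as follows. This is not WIRED's actual method. The domains and file contents are made-up placeholders, and the helper names (fetch_robots_txt, blocks_agent, opted_out) are hypothetical. The fetch helper needs network access and is not called in the sample run.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

AI_AGENT = "Applebot-Extended"


def fetch_robots_txt(domain: str) -> str:
    """Download a site's public robots.txt (requires network access)."""
    with urlopen(f"https://{domain}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def blocks_agent(robots_txt: str, agent: str = AI_AGENT) -> bool:
    """Return True if the rules bar `agent` from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def opted_out(files: dict[str, str]) -> list[str]:
    """List the domains whose robots.txt text blocks the AI agent."""
    return sorted(domain for domain, text in files.items() if blocks_agent(text))


# Made-up sample data standing in for downloaded files.
SAMPLE_FILES = {
    "news.example": "User-agent: Applebot-Extended\nDisallow: /\n",
    "blog.example": "User-agent: *\nDisallow: /private/\n",
}
print(opted_out(SAMPLE_FILES))
```

To survey real sites, one would build the dictionary with fetch_robots_txt for each domain of interest. Note that this sketch only tests the site root, so a file that blocks the agent from selected sections would not be counted.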

It turns out that some media outlets have been outspoken in their criticism of Apple’s opt-out approach. The New York Times, for example, which is suing OpenAI over copyright infringement, argues that publishers shouldn’t have to opt out to begin with; instead, web crawlers should need permission before accessing a publisher’s content.

Other popular websites that have opted out include Instagram, Facebook, Tumblr, Craigslist, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast.

So, what’s next? Will Apple be forced to rethink its AI strategy? Or will it find a way to appease publishers and continue its data-driven ambitions? The battle for control of the internet’s digital goldmine is far from over.

