After a report revealed that numerous companies relied in part on YouTube video transcription data to train their AIs, Apple is stepping forward to clarify its use of and plans for OpenELM, which was trained on the controversial Pile data.
Apple contacted TechRadar after reading the report detailing how the company that provided Pile, EleutherAI, apparently used the YouTube Subtitles data set, an act that would be counter to the social video platform’s data use policies.
While not speaking directly to the issue of YouTube data, Apple reiterated its commitment to the rights of creators and publishers and added that it does offer websites the ability to opt out of their data being used to train Apple Intelligence, which Apple unveiled during WWDC 2024 and is expected to arrive in iOS 18.
The company also confirmed that it trains its models, including those for its upcoming Apple Intelligence, using high-quality data that includes licensed data from publishers, stock images, and some publicly available data from the web. YouTube’s transcription data is not intended to be a public resource but it’s not clear if it’s fully hidden from view.
Just for research
Apple also builds research models, and that’s essentially what OpenELM is: a tool for learning more about language models. In a paper on OpenELM (PDF), researchers note that they did train it on Pile data.
Apple says, however, that OpenELM is for research purposes only and is not used to power AI features in any Apple devices, including iPhones, iPads, and Macs. What’s more, it appears OpenELM’s moment in the sun is almost done. Apple told us it has no plans to build future versions of the model.
While all this may offer some solace to the YouTube creators (including TechRadar) whose data was scraped for Pile and used in, among other models, Apple’s OpenELM, it does not address the fact that EleutherAI apparently did the scraping without YouTube’s or the creators’ permission and then handed the data to companies like Apple.
What remains to be seen is what YouTube does next. For now, though, Apple’s made it clear that it was one and done with OpenELM and that data will never be a part of Apple Intelligence.