If you’ve ever posted or commented on Reddit, there’s a chance your words will be used as material for training OpenAI’s AI models, now that the two companies have confirmed a deal that enables this exchange.
Reddit will be given access to OpenAI’s technology to build AI features. In return for that (as well as an undisclosed sum of money), it’s giving OpenAI access to Reddit posts in real time, which tools like ChatGPT can use to formulate more human-like responses.
OpenAI will be able to access real-time information through Reddit’s data API, the software that enables retrieval of and interaction with information on Reddit’s platform, giving OpenAI structured and unique content from Reddit. This is similar to the agreement, reported to be worth $60 million, that Reddit reached with Google at the beginning of the year, which allows Google to train its own AI models on Reddit’s data.
According to the official Reddit blog post publicizing the deal, it will help people discover and engage with Reddit’s communities thanks to the Reddit content brought to ChatGPT and other new OpenAI products. Through Reddit’s APIs, OpenAI’s tools will be able to understand and showcase Reddit’s content better, particularly when it comes to recent topics.
Reddit, the company, and Reddit, the community of users
Users and moderators on Reddit will apparently be offered new features thanks to applications powered by OpenAI’s large language models (LLMs). OpenAI will also start advertising on Reddit as an ad partner.
Reddit’s blog post also claims that the deal is in the spirit of keeping the internet open, as well as fostering the learning and research that keep it that way. The company says it wants to continue building up its community, recognizing its uniqueness and Reddit’s role as a place for conversation online. Reddit claims that the deal was signed to improve everyone’s Reddit experience using AI.
It remains to be seen whether users are convinced of these benefits, but previous changes of this type and scale haven’t gone down particularly well. In June 2023, over 7,000 subreddit communities went dark to protest changes to Reddit’s API pricing for developers.
Neither company has explicitly stated that Reddit data will be used to train OpenAI’s models, but I think many people assume this will be the case, or that it’s already happening. In contrast, it was disclosed that Reddit would give Google “more efficient ways to train models.” Then there’s the fact that OpenAI founder Sam Altman is himself a Reddit shareholder. That doesn’t confirm anything specific and, as reported by The Verge, “This partnership was led by OpenAI’s COO and approved by its independent Board of Directors.”
Official statements expressing the benefits of the partnership
Speaking about the partnership and as quoted in the blog post, representatives from both companies said:
“Reddit has become one of the internet’s largest open archives of authentic, relevant, and always up to date human conversations about anything and everything. Including it in ChatGPT upholds our belief in a connected internet, helps people find more of what they’re looking for, and helps new audiences find community on Reddit.”
– Steve Huffman, Reddit Co-Founder and CEO
“We are thrilled to partner with Reddit to enhance ChatGPT with uniquely timely and relevant information, and to explore the possibilities to enrich the Reddit experience with AI-powered features.”
– Brad Lightcap, OpenAI COO
They’re not wrong: many people append the word “Reddit” to their search queries, because Reddit threads often provide information directly relevant to what they’re searching for.
It’s an interesting development. How OpenAI sources its information, both the accuracy of that information and where its training data comes from, has been the main topic of discussion around the ethics of its practices for some time. I suppose at least this way, Reddit users are being made aware that their information can be used by OpenAI, even if they don’t really have a choice in the matter.
The announcement blog post reassures users that Reddit believes that “privacy is a right,” and that it has published a Public Content Policy that gives more detail about Reddit’s approach to accessing public content and user protections. We’ll have to see if this will be upheld as time goes on, and what the partnership looks like in practice, but I hope both companies will take users’ concerns seriously.