I have compiled a non-comprehensive list of content licensing deals that have been publicly reported, with a particular focus on interesting details disclosed in SEC filings and earnings transcripts. If I have missed any deals, particularly from public companies that mention such deals in earnings transcripts, analyst days, or SEC filings, please email me and I will add them.
Reddit
Shutterstock
Existing deals with Meta, plus a 6-year deal with OpenAI[3].
Deals worth $25-50M with Amazon and Apple, based on a statement by the CFO[4].
The relevant segment, Data Distribution and Services[5], grew from $15.9M in ’21 to $137M in ’23, suggesting very roughly ~$100M of already recognized revenue from generative AI licensing.
The Shutterstock CEO referenced existing deals with OpenAI and Meta and expressed a desire to expand licensing to a broader audience via data marketplaces like Snowflake, AWS, etc.[6]
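A quick back-of-the-envelope check on the Shutterstock segment numbers, using only the two figures quoted above. How much of the increase is generative AI licensing is an assumption; the code just computes the raw delta and growth multiple.

```python
# Shutterstock's Data Distribution and Services segment, in millions of USD.
segment_revenue = {2021: 15.9, 2023: 137.0}

# Raw change and growth multiple between '21 and '23.
delta = segment_revenue[2023] - segment_revenue[2021]
multiple = segment_revenue[2023] / segment_revenue[2021]

# Assumption: only part of the delta is generative AI licensing (other data
# distribution growth is also in there), hence "very roughly ~$100M".
print(f"Segment grew by ${delta:.1f}M ({multiple:.1f}x) from '21 to '23")
```

The raw delta is about $121M, so attributing roughly $100M of it to AI licensing leaves some room for non-AI growth.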
Yelp
Perplexity reportedly licensed data from Yelp[7], though it remains unclear whether this deal is materially different from the pre-generative AI deals Yelp enters into to distribute reviews, restaurant listings, etc.
Yelp reports data licensing revenue within its Other category[8], which totals ~$47M but also includes other types of revenue. This figure jumped from ~$21M in ’20 to ~$47M in ’23[9], which suggests a generative AI bump of roughly $25M.
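The Yelp estimate is simple subtraction, sketched below. Treating the entire increase as AI-driven is an assumption, since the Other category mixes in non-AI revenue, so the result is best read as an upper bound.

```python
# Yelp's "Other" revenue line, in millions of USD (approximate figures from above).
other_revenue = {2020: 21.0, 2023: 47.0}

# Assumption: all growth is attributed to generative AI licensing. The category also
# holds non-AI revenue, so treat this as an upper bound rather than a measurement.
ai_bump = other_revenue[2023] - other_revenue[2020]
share_of_2023 = ai_bump / other_revenue[2023]

print(f"Implied AI bump: ~${ai_bump:.0f}M ({share_of_2023:.0%} of 2023 Other revenue)")
```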
Reuters
Reuters added $22M of revenue in its News segment[10], the majority of which was apparently driven by “transactional” content licensing for artificial intelligence. This increased the News segment margin by 6.5%; since licensing an existing archive carries little added cost, we can surmise this incremental revenue was largely content licensing.
Wiley received a one-time fee of $23M for previously published academic articles and books[11]. The CEO expressed interest in finding more such deals. Wiley owns both academic journals and a large book publishing business.
Licensed ~200 million images to at least two AI firms at 2 to 4 cents per image, suggesting roughly $4-8M (~$6M at the midpoint) per license[12].
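The per-license range follows directly from the reported image count and per-image price. This sketch assumes each licensee paid for the full ~200 million images at a single flat rate.

```python
# Range of value per license implied by the reported per-image price.
IMAGES = 200_000_000   # ~200 million images
PRICE_CENTS = (2, 4)   # 2 to 4 cents per image

# Integer arithmetic in cents avoids float rounding; // 100 converts to dollars.
low, high = (IMAGES * c // 100 for c in PRICE_CENTS)
midpoint = (low + high) // 2

print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M per license, ~${midpoint / 1e6:.0f}M midpoint")
```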
Axel Springer, Associated Press, Groupe Le Monde, Prisa
The Associated Press was among the first organizations to announce a data licensing deal[13].
Le Monde signed a multi-year contract with OpenAI for ongoing and historical access to its corpus[14], announced simultaneously with the Prisa deal[15].
I have not been able to find specific financial details for any of these, but The Information reports[16] that OpenAI offers $1-5M per corpus, whereas Apple offers $50M over a multi-year period.
Previously, for display (not generative AI) purposes, the WSJ reported that Meta was offering publishers $3M[17].
X (formerly Twitter)
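It has been well covered that Firehose access will cost $42K/month, which annualizes to only about $0.5M per year; the widely cited figure of ~$2.5M per year would imply roughly $208K/month and presumably refers to a higher access tier. It is also not quite clear what rights or use cases each level of access comes with. While widely reported as priced too high, even a $2.5M price would seem reasonable or even cheap relative to the deals above.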
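Annualizing the monthly price is a quick sanity check on the two numbers reported for X's API access. The sketch only does the arithmetic; which access tier each figure refers to is not confirmed here.

```python
# Sanity check on the reported X (Twitter) API pricing, in USD.
MONTHLY_FEE = 42_000       # reported monthly price for Firehose-level access
QUOTED_ANNUAL = 2_500_000  # widely quoted annual figure

annualized = MONTHLY_FEE * 12         # what $42K/month actually comes to per year
implied_monthly = QUOTED_ANNUAL / 12  # monthly price the $2.5M figure would imply

print(f"$42K/month annualized: ${annualized:,}")
print(f"$2.5M/year implies:    ${implied_monthly:,.0f}/month")
```

The two figures differ by roughly 5x, so they cannot describe the same subscription.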
StackOverflow
Deal with Google for Gemini that also includes workflow integration with the Google Cloud console[18].
Photobucket
Negotiating contracts priced between 5 cents and $1 per photo and more than $1 per video. With 13 billion photos (an order of magnitude more than Shutterstock), this could be a significant revenue source, though we should assume the rights to, and usefulness of, the entire content library are not as robust as Shutterstock's.
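To see why the library size matters, here is a rough ceiling on Photobucket's library value at the quoted per-photo prices. The `usable_fraction` parameter is a hypothetical knob I added to reflect the rights and usefulness caveat; it is not a reported figure, and selling the full library at a flat rate is itself an unrealistic assumption.

```python
# Hypothetical ceiling on Photobucket's library value at the quoted per-photo prices.
# `usable_fraction` is an illustrative assumption, not a reported number.

def library_value_usd(photos: int, cents_per_photo: int, usable_fraction: float = 1.0) -> float:
    """Gross licensing value if `usable_fraction` of the library sells at a flat per-photo price."""
    return photos * cents_per_photo * usable_fraction / 100


PHOTOS = 13_000_000_000  # ~13 billion photos

low = library_value_usd(PHOTOS, 5)     # 5 cents per photo
high = library_value_usd(PHOTOS, 100)  # $1 per photo

print(f"Full library: ${low / 1e9:.2f}B to ${high / 1e9:.0f}B")
# Purely illustrative: suppose only 10% of the library is usable/licensable.
print(f"10% usable:   ${library_value_usd(PHOTOS, 5, 0.1) / 1e9:.3f}B to "
      f"${library_value_usd(PHOTOS, 100, 0.1) / 1e9:.1f}B")
```

Even heavily discounted, the theoretical ceiling is far above the per-corpus prices reported for news publishers, which is why the rights question matters so much here.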
Automattic (Tumblr & WordPress)
OpenAI and Midjourney apparently licensed, or at least evaluated, data from both Tumblr and WordPress[19], though I was unable to find any specific financials.
News Corp
The owner of The Wall Street Journal and the NY Post, among others, is reported to be near data deals[20] and, based on its last earnings call[21], expects to be a “core content provider”.
Based on earnings call comments, the deal was still in negotiation as of February ’24, but given the CEO’s compliments on Sam Altman’s approach, it is reasonable to guess OpenAI is involved.
Who is missing?
I was unable to find any specific data deal references from TripAdvisor, TikTok, or SoundCloud, which all seem like obvious candidates for data deals. Quora has also launched AI products[22], but I have found no obvious data deal disclosure.
The NY Times[23] and IAC[24] have taken a more combative stance, using the courts to protect their IP rather than negotiating data deals. Other companies, like Thomson Reuters, engage in data licensing but have also taken selective court action[25].
Other companies, such as RELX Group (owner of LexisNexis and Elsevier), have announced AI products[26] and workflow integrations with Microsoft[27], but I have not found an outright licensing deal for training content. I have anecdotally noticed similar situations with DaaS providers like FactSet, Bloomberg, and S&P Global, although it is difficult to discern whether any of their licensing deals are explicitly for AI training, given that they are in the business of content licensing in the first place.
Reddit S-1 disclosure
Shutterstock press release
Segment definition in the latest Shutterstock 10-K
Earnings call transcript
Definition of the Other segment, which includes the Other Partnerships sub-segment containing the relevant revenues
Thomson Reuters earnings call transcript
Disclosed by Wiley's interim CEO in prepared remarks on the earnings call
Associated Press's own coverage of the deal
Announcement on the Le Monde website
OpenAI blog post mentioning Le Monde and Prisa
NY Post's (owned by News Corp) own reporting on the deal
News Corp earnings call transcript
NY Times's own coverage of its lawsuit