Why AI Companies Are Licensing Publisher Data
Big Tech doesn’t just need more data. It needs the right kind.
Last week, Digiday reported that Amazon signed licensing deals with Condé Nast and Hearst to leverage publisher data in its AI chatbot, Rufus.
It’s a sign of the times for publishing: journalism’s former heavyweights have been knocked down and are struggling to get back on their feet after their business model took a sucker punch. Publishers like The New York Times and The Atlantic made a killing selling ads online as brands chased the massive traffic these sites pulled in. But a lot has changed: Business Insider saw a nearly 50% drop in monthly site visitors from early 2023 to mid-2025.
How did this happen? It started when OpenAI introduced ChatGPT, giving people a new way to search for and discover information. Seeing this as a threat, Google rolled out its “AI Overview” feature, which places an attempted answer to your query at the top of the search results page. When they launched in mid-2024, AI Overviews were sloppy and made for a poor user experience. But as Google fine-tuned the feature over time, AI Overviews ended up providing good-enough answers in a ChatGPT-like format.
Which means no scrolling through the blue links. No scrolling means fewer chances to discover publisher content (unless it shows up in the AI Overview). Fewer clicks on publisher links mean lower traffic, and that won’t make advertisers happy.
So yeah, news companies don’t have much choice but to strike licensing deals with Big Tech to offset declining ad revenue. Understandable, but still an existential threat to publishers all around.
However, what caught my eye in the Digiday article: “Condé Nast – home to Vogue, GQ and Vanity Fair titles – and Hearst have years of SEO-optimized, structured content — the kind of clean, consistent, high-quality text ideal for AI training.”
Prior to the AI boom, publishers were maniacally focused (and still are) on showing up at the top of Google search. SEO was their bread and butter, a skill they mastered.
See, ranking in LLMs is different from SEO, but clean data structures still matter. That keeps publisher content relevant to LLMs, because it has good bones.
I’m curious if we’ll see a wave of front-end (chatbot) licensing deals emerge to strengthen recommendation systems. Think of how helpful all of those “My Top 10 Skincare Products” or “What Type of Grill Should I Buy for Summer BBQs?” articles will be in matching shopper intent to products. Ask Rufus for a breathable golf polo, and it’ll likely pull from a “Top 10 Golf Shirts in 2025” article.
Not only are these articles relevant to the transactional intent of someone using Rufus, but they also have clean metadata for AI companies to train on. Publishers should focus less on how many people read their site and more on how many algorithms crawl it. Sure, this clashes with the ethos of journalism. But times are changing, and publishers need to realize that AI is a growing share of their customer base.
Target, Walmart, and other retailers will likely launch their own chatbots to guide purchases, and they’ll need publisher partnerships to train them. The business case for integrating media (publishers) with commerce (retailers) is clear, and it is one that publishers will need to hold on to as tightly as possible for the sake of survival. Commerce media is thriving, and monetization models and deal structures are evolving fast.
Hold on, this wouldn’t be a Relentlessly Curious post if I didn’t mention something related to consumer brands. If publishers are licensing their data to Big Tech, brands have a window to get in too. If a Good Housekeeping article mentions your brand’s product and Amazon’s Rufus is specifically training on this data, it’s not far-fetched to believe that your brand will show up in Rufus more often than competitors who don’t happen to work with Good Housekeeping. I don’t know how this changes the economics of brand and publisher partnerships, but stay close to your affiliate network, as it could help you rank in LLMs.
Amazon is a strong partnership play for the likes of Condé Nast or Hearst, but I think publishers will get more long-term value by licensing to the LLM companies (OpenAI, Perplexity, etc.) directly. With Perplexity launching their Comet web browser last week and an OpenAI web browser in development, the way humans interact with the internet is going to change.
These AI-native browsers will put their chatbot interfaces (e.g., ChatGPT) and AI agents at the center of their operations. The closer publishers are to the companies creating this new digital landscape, the better positioned they will be to refine their content and strike deals that help keep them in business. Don’t get me wrong, though: media powerhouses like Disney, Hearst, and The Financial Times have all cut deals with OpenAI, Perplexity, and Claude to license their data.
As I mentioned in We’re Getting a New Browser, AI-powered browsers are changing how we interact with the web. Every content-driven digital business needs to ask itself who its customer is today, and who its customer will be five years from now. If you aim to be ahead of the curve, you must consider how AI may become a customer of your business. The publishing industry has been hit especially hard by AI’s progress, but that doesn’t mean your company needs to be on its back foot. Find a way for AI to become a customer before it becomes a rival.