Microsoft and HarperCollins sign AI licensing deal, but author opt-in still required

The news: Microsoft signed a deal with HarperCollins to use the book publisher’s nonfiction works for AI model training.

  • The three-year agreement pays $5,000 per book title, split evenly between the author and HarperCollins.
  • The licensed content will be used to train a Microsoft model that the company hasn’t announced yet, per Bloomberg, and won’t be used to make new AI-generated books.

Authors need to opt in to the training program, and the AI model’s output will be limited to “no more than 200 consecutive words and/or 5% of a book’s text.” For context, 5% of Michelle McNamara’s “I’ll Be Gone in the Dark,” a nonfiction title published by HarperCollins, comes to about 18 pages of text.
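
To make the output cap concrete, here is a minimal Python sketch of how such a limit could be checked. It is an illustration only, not how Microsoft or HarperCollins enforce the terms: the function name `check_output` is hypothetical, the quoted "and/or" is read as "both limits must hold," and the 5% share is approximated by counting words in the output that match the book in order.

```python
from difflib import SequenceMatcher

# Thresholds taken from the quoted limit; everything else is an assumption.
MAX_CONSECUTIVE_WORDS = 200
MAX_BOOK_FRACTION = 0.05


def check_output(book_text: str, output_text: str) -> dict:
    """Hypothetical check of a model output against the quoted limits."""
    book_words = book_text.split()
    output_words = output_text.split()

    # Find runs of words that appear in both texts, in the same order.
    matcher = SequenceMatcher(None, book_words, output_words, autojunk=False)
    blocks = matcher.get_matching_blocks()

    # Longest verbatim run shared by the book and the output.
    longest_run = max((block.size for block in blocks), default=0)

    # Rough share of the book reproduced (in-order matches only).
    copied_words = sum(block.size for block in blocks)
    fraction = copied_words / len(book_words) if book_words else 0.0

    return {
        "longest_run": longest_run,
        "fraction_of_book": fraction,
        # Assumption: an output passes only if it stays under both limits.
        "within_limits": (
            longest_run <= MAX_CONSECUTIVE_WORDS
            and fraction <= MAX_BOOK_FRACTION
        ),
    }
```

Under this reading, an output that quotes 201 consecutive words fails on the first limit, while a short 10-word quote from a 1,000-word text passes both.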

Zooming out: The pool of public content for generative AI (genAI) training is running out, which could affect timelines for model improvement.

  • The remaining 300 trillion tokens of public data available for training will be used up between 2026 and 2032, per Epoch AI.
  • Adding books that aren’t in the public domain to the supply of data for model improvement could help push that timeline back.

The obstacle: Getting writers on board with the licensing deal may be difficult.

  • The payment could be too small for authors, especially considering the publisher is taking half of it.
  • Author Daniel Kibblesmith said on X (formerly Twitter) that he would consider the program if it offered him enough money to never work again, “since that’s the end goal of this technology.”

Fewer than half (47%) of US adults trust companies to responsibly prevent their AI models from creating work that’s derivative of other work, per The Verge.

Why this could succeed: Book content could be safer from unauthorized data scraping than news content, since books are less frequently published in full online.

For authors who aren’t opposed to genAI learning from their work, this partnership could offer additional revenue with a concrete limit on the AI’s outputs.

Our take: It isn’t clear what role will be left for human creators in an AI-driven future, but with a finite amount of data left for model training, AI companies are likely to keep pursuing various publishers with lucrative licensing deals.

This article is part of EMARKETER’s client-only subscription Briefings, daily newsletters authored by industry analysts who are experts in marketing, advertising, media, and tech trends.