DatFlash™ tracks real-world dataset transactions and supply signals, normalized for AI decision-making.

Book corpus datasets used to train large language models

Price
$1,667 per book title annually
Date
2024
Buyer
Microsoft
Seller
HarperCollins
Type
DATASET_LICENSING
Region
Global
Market Context
HarperCollins set a baseline price of about $1,667 per book per year for AI training data licensing.
Term
Per-title license (Annual)
Confidence: Medium-High
Citation: DatFlash (2026). "Book corpus datasets used to train large language models"
https://www.datflash.com/transaction/microsoft-harpercollins-book-corpus-datasets-used-to-train-large-language-2024
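The per-title annual rate above is easiest to apply when the record is held as structured data. What follows is a minimal sketch of one way to do that. The field names (`price_per_title_per_year_usd` and the others) and the helper `total_license_cost` are hypothetical and illustrative. They are not DatFlash's actual JSON schema. The sketch also assumes that cost scales linearly with titles and years, with no volume discounts or minimums. The record does not state terms like these, and the $1,667 figure is itself an approximate baseline, so any total computed here is only an estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensingRecord:
    """Hypothetical in-memory form of the transaction record above.

    Field names are illustrative assumptions, not DatFlash's JSON schema.
    """
    buyer: str
    seller: str
    deal_type: str
    region: str
    year: int
    price_per_title_per_year_usd: float  # approximate baseline rate


def total_license_cost(record: LicensingRecord, titles: int, years: int = 1) -> float:
    """Estimate the cost of licensing `titles` books for `years` years.

    Assumes simple linear pricing (rate x titles x years); real deals may differ.
    """
    if titles < 0 or years < 0:
        raise ValueError("titles and years must be non-negative")
    return record.price_per_title_per_year_usd * titles * years


record = LicensingRecord(
    buyer="Microsoft",
    seller="HarperCollins",
    deal_type="DATASET_LICENSING",
    region="Global",
    year=2024,
    price_per_title_per_year_usd=1667.0,
)

# Hypothetical scenario: 1,000 titles licensed for 3 years.
print(f"${total_license_cost(record, titles=1_000, years=3):,.0f}")  # prints "$5,001,000"
```

Keeping the rate on a per-title, per-year basis means any scenario reduces to a single multiplication. Changing the title count or the term length does not require changing the record.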