Book corpus datasets used to train large language models
Price
$1,667 per book title annually
Date
2024
Buyer
Microsoft
Seller
HarperCollins
Type
DATASET_LICENSING
Region
Global
Market Context
In its 2024 licensing deal with Microsoft, HarperCollins set a baseline price of about $1,667 per book title per year for AI training data.
Term
Per-title license (Annual)
Confidence:
Medium-High
Citation:
DatFlash (2026).
"Book corpus datasets used to train large language models"
https://www.datflash.com/transaction/microsoft-harpercollins-book-corpus-datasets-used-to-train-large-language-2024