AI Policies, Regulations & Strategies · 21 Feb, 2025

Thomson Reuters v. Ross Intelligence: A Landmark Ruling on Copyright Infringement in AI Training Data

AI development raises questions about whether using copyrighted works for training constitutes fair use, a key issue addressed in the Feb 2025 Thomson Reuters v. Ross Intelligence ruling.

  • This closely watched U.S. district court decision found Ross Intelligence's unauthorized use of Westlaw content for AI training to be copyright infringement.

  • The ruling narrows the scope of fair use, prioritizes harm to the rights holder's market, and strengthens the position of rights holders in the AI ecosystem.

  • It emphasizes the economic effects of AI products rather than technical transformations, influencing ongoing lawsuits against AI developers and potentially accelerating licensing agreements.

  • The decision clarifies how fair use applies to AI, focusing on market impact and redefining the balance between innovation and intellectual property protection.

AI, Copyright, and Fair Use: Legal and Industry Implications from the Thomson Reuters Case

Case Background and Legal Analysis

Factual Background: Thomson Reuters has long dominated the legal research industry with its Westlaw platform, renowned for its curated “headnotes” and the West Key Number System. Ross Intelligence, founded in 2014, sought to develop a more affordable AI-powered legal research solution. According to the court's opinion, after Thomson Reuters declined to license Westlaw content to Ross, Ross hired a third party (LegalEase Solutions) to produce training data, sets of legal questions and answers known as “Bulk Memos,” that were built from Westlaw’s proprietary headnotes. This gave Ross a shortcut to a product that competed directly with Thomson Reuters without incurring similar editorial costs.

By targeting Westlaw’s carefully constructed classification system, Ross was able to rapidly enhance the capabilities of its AI model and challenge Westlaw in the legal research market. The court determined that Ross’s copying was deliberate and extensive, undermining any argument that it was merely incidental or minimally necessary for research.

Fair Use Analysis: Judge Stephanos Bibas applied the four-factor fair use test and concluded that Ross’s actions did not qualify for protection. Under the first factor—purpose and character of the use—Ross’s commercialization and direct competition with Westlaw weighed strongly against a finding of fair use. While acknowledging the innovative aspects of AI, the court reasoned that Ross’s main goal was to shortcut Thomson Reuters’ editorial investment rather than transform the underlying content.

On the nature of the work (the second factor), the court noted that judicial opinions themselves are in the public domain and that the headnotes, although original enough to be protected by copyright, involve only modest creativity, so this factor favored Ross. The amount and substantiality used (the third factor) also favored Ross, because the headnotes themselves were not made available to the public through Ross’s product. Most critically, on market impact (the fourth factor), Judge Bibas found that Ross intended its tool as a direct market substitute for Westlaw and that the copying could also affect a potential market for AI training data. Weighing the factors together, he concluded that the first and fourth outweighed the second and third, which defeated Ross’s fair use defense.

Distinguishing Generative vs. Non-Generative AI: The court also distinguished Ross’s retrieval-oriented tool from generative AI systems that produce new content. Ross’s tool did not create new legal analysis but used Thomson Reuters’ editorial content to improve search capabilities, operating in the same market segment. Judge Bibas stressed that only non-generative AI was before the court, so the opinion does not decide how fair use applies to generative systems. Its reasoning nonetheless suggests that a generative system could face similar liability if its outputs served as a market substitute. Under the ruling, the deciding consideration is harm to the copyright holder’s market rather than the technical details of how the AI model is trained.

Implications For Generative AI Companies

Narrowing Fair Use for AI Training: The Thomson Reuters ruling challenges the widespread assumption that using copyrighted materials for AI training is automatically transformative fair use. By stressing market harm over technical innovation, the court has signaled that large-scale, unauthorized copying of copyrighted content may constitute infringement when the AI product competes directly with, or substitutes for, the original work. This approach places many AI developers, who rely heavily on expansive datasets, at higher legal risk.

Data Acquisition Strategies: AI developers must now carefully vet their training data for copyrighted materials. Unrestricted web scraping and bulk ingestion of proprietary datasets appear increasingly risky. Large companies are seeking formal licensing agreements with content owners, which can be expensive and complex. Smaller AI startups, with fewer resources, may find it difficult to secure such licenses, potentially limiting their competitive edge in a market leaning toward bigger players with robust compliance infrastructures.

Business Model Adjustments: The ruling forces AI companies to rethink how they position and monetize their products. Licensing fees and legal compliance costs will likely rise, benefitting well-capitalized tech giants. Some AI developers may pivot to more specialized niches where data is in the public domain or relatively easy to license. Others might collaborate closely with rights holders, offering revenue-sharing or co-branding arrangements, rather than risking infringement lawsuits.

Implications for Rights Holders

Strengthened Legal Position: The Thomson Reuters decision affirms that copyright holders retain control over how their works are used in AI training. Rights holders can demand licenses, royalties, or conditions restricting the AI’s potential to compete directly with the original content. This is especially impactful for entities that have invested heavily in curated databases or editorial systems, as the ruling clarifies that such structured, original content is protected under copyright.

Monetization Opportunities: By licensing their works to AI developers, copyright holders can diversify revenue streams. From major publishers to niche content creators, rights holders are exploring creative licensing contracts that may include tiered fees, attribution requirements, or restrictions on how the AI outputs are used in the marketplace. This opens new economic avenues, enabling content owners to benefit from the surge in AI-driven innovation.

Market Protection Strategies: Copyright owners are also refining strategies to prevent AI tools from undermining their core markets. Licensing agreements can impose terms that restrict AI systems from fully substituting the original content. Additionally, technology solutions like watermarking, digital fingerprinting, or blockchain-based provenance can detect unauthorized usage. These measures, bolstered by the court’s ruling, give rights holders greater power to protect their market share.

Evolving Legal and Industry Responses

Legislative and Regulatory Developments: Legislators worldwide are considering how best to regulate AI and copyright, and the Thomson Reuters decision adds urgency to that debate. The Generative AI Copyright Disclosure Act, proposed in the U.S. in 2024, would require AI developers to disclose the copyrighted works they use for training. The EU’s AI Act similarly focuses on transparency, aiming to empower rights holders to monitor potential infringements. Policymakers are also exploring collective licensing mechanisms and seeking international coordination through bodies like WIPO.

Industry Adaptation: AI companies are adapting both technically and commercially. Many are deploying advanced filtering tools and “clean room” approaches, training models only on licensed or public domain data. Others are investing in synthetic data generation to minimize reliance on copyrighted works. On the business side, companies are hiring teams of licensing experts, forming industry consortia to develop standardized practices, and actively pursuing negotiated agreements with rights holders to avoid protracted litigation.

Future Litigation Landscape: With the Thomson Reuters ruling as precedent, other high-profile cases—such as Getty Images v. Stability AI and lawsuits by the Authors Guild against major AI developers—may follow a similar market-substitution analysis. While defendants will argue that generative outputs are sufficiently transformative, courts are likely to scrutinize their commercial impact. Many cases may settle out of court, but those that proceed to trial will refine the standards established here, further delineating the boundaries of fair use in the AI context.

Our Take

The Thomson Reuters v. Ross Intelligence case is defining the boundaries of using copyrighted content in artificial intelligence (AI) training. The court has ruled that Ross Intelligence's unauthorized use of Westlaw content constitutes copyright infringement. This decision is prompting AI developers to reevaluate their strategies for using copyrighted content. The court's ruling indicates that large-scale AI training on copyrighted works is not automatically considered fair use, and that rights holders can leverage their content to create new revenue streams and protect their markets.

Key Takeaways

  • The ruling indicates that large-scale AI training on copyrighted works is not automatically fair use, especially when the AI system competes with the original market.

  • The court's focus on market substitution, rather than technical transformation, strengthens copyright protections.

  • Generative AI systems may face similar risks if their outputs serve as market substitutes for original works, although the court did not rule on generative AI.

  • AI developers must invest in licensing and data filtering to minimize infringement risks.

  • Rights holders can leverage their content to create new revenue streams and protect their markets.

  • Proposed legislation aims to require transparency in AI training, including disclosure of copyrighted data sources.

  • Collective licensing and industry collaboration models are gaining traction.

  • "Clean room" and synthetic data strategies are emerging to reduce reliance on proprietary works.

  • This ruling sets a significant precedent for ongoing and future lawsuits in the AI sector.

  • The balance between AI innovation and intellectual property rights is being redefined.
