Google Fined €250M for Scraping French News for Gemini

France’s competition watchdog has fined Google €250 million ($271 million) for breaking EU intellectual property laws by using online news content to train its AI chatbot, Gemini.

In a statement on Wednesday, the watchdog said that Google's AI-powered chatbot Bard, which has since been rebranded as Gemini, was trained on content from publishers and news agencies without the consent of the publishers, the agencies or the regulator.

"Google linked the use of the content concerned by its artificial intelligence service to the display of protected content", the watchdog said, adding that in doing so Google hindered the ability of publishers and press agencies to negotiate fair prices.

The fine arrives amid a copyright dispute in France over tech companies using online content to train large language models (LLMs), which has drawn complaints from some of the country's biggest news organisations, including Agence France-Presse (AFP).

The dispute appeared to have been resolved in 2022, when Google dropped its appeal against an initial €500 million fine issued at the end of a major investigation by the Autorité de la concurrence.

But in Wednesday's statement, France’s competition watchdog said the tech giant had breached four of the seven commitments agreed in the settlement, "failing to respect commitments made in June 2022" and failing to negotiate in "good faith" with news publishers over how much to compensate them for the use of their content.

Still, it added that Google had pledged not to contest the facts as part of settlement proceedings and had proposed a series of remedies for certain shortcomings.

“Neighbouring Rights”

Many publishers, writers and newsrooms are looking to stop tech companies from scraping their online content without consent, or at least to limit the practice.

Google and other online platforms have been accused for years of making billions from news without sharing the revenue with those who gather it.

This led the EU to create a form of copyright known as "neighbouring rights", which allows press publishers to demand compensation from tech companies for using their content.

France has since been a test case for the rules: after initial resistance, Google and Facebook both agreed to pay some French media outlets for articles shown on their platforms. But with AI and LLMs now in the mix, the debate has reignited.

*Google's AI chatbot, Gemini, is allegedly trained on copyrighted news content.*

In the US, the New York Times sued Microsoft and ChatGPT creator OpenAI in 2023, accusing them of using millions of the newspaper's articles without permission to help train chatbots.

OpenAI has also been sued by multiple authors, lawyers and even comedians, who allege that it used their copyrighted content without consent to train its AI models.

"Despite established protocols for the purchase and use of personal information, [OpenAI] took a different approach: theft," reads a class-action lawsuit filed against the company by a US law firm last year.

“They systematically scraped 300 billion words from the internet, 'books, articles, websites and posts – including personal information obtained without consent.' [They] did so in secret, and without registering as a data broker as required under applicable law.”

The Future of AI and Copyright

Experts have warned that the method by which AI firms obtain their data may lead to the work of millions of content creators being stolen, raising questions about the future of creative industries and the ability to tell fact from fiction.

Governments around the world are also taking note of the rapid advancement of AI. The European Parliament recently entered the final stages of passing the EU's landmark AI Act, which aims to guard against the “unacceptable level of risk” some AI systems could pose.

The Act includes a recital on the importance of transparency in ensuring accountability and facilitating the enforcement of copyright. Specifically, it requires providers of general-purpose AI (GPAI) models to draw up and make publicly available a sufficiently detailed summary of the content used to train their models:

“In order to increase transparency on the data that is used in the pre-training and training of general-purpose AI models, including text and data protected by copyright law, it is adequate that providers of such models draw up and make publicly available a sufficiently detailed summary of the content used for training the general-purpose model.

“While taking into due account the need to protect trade secrets and confidential business information, this summary should be generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used.”

This provision is poised to spark controversy, but it stands as the most potent copyright-related clause in the Act. Copyright holders are likely to welcome it, whereas tech companies may harbour concerns.

The provision also matters because greater transparency could help identify content that has been AI-generated, which could affect its copyrightability in jurisdictions that require a human author for copyright protection.

Ellis Stewart
