The Problem with Copyright and Generative AI Ethics
With so much uncertainty over copyright infringements, creators are on high alert, concerned that Large Language Models (LLMs) and Generative AI will displace them.
This week’s post comes from Amy McCloskey Tobin, a B2B Content Marketing Expert, GenAI-obsessed storyteller, and collaborative Executive Leader with over 10 years of experience driving content strategy and production in the tech sector. She is currently the VP of Content Strategy for Arria, a global leader in natural language technology.
Everyone watching the Generative AI space, which is much of the world since ChatGPT was released, knows that OpenAI is facing multiple lawsuits over copyright infringement. Because there is no clear legal precedent, and so many open questions about detecting and stopping generative AI copyright infringement, no one is certain how this will play out. The EU, as usual ahead of the U.S. on these matters, has created the AI Act, but that act focuses on safety rather than on protecting creators.
The Lay of the AI Lawsuit Land
OpenAI, the developer of ChatGPT, faces numerous legal challenges over copyright infringement because it trained its models on copyrighted articles, songs, and books. Here are the cases to watch:
Sarah Silverman got our attention early
The comedian was the first big name to come out swinging when she filed suit against OpenAI and Meta in July. Silverman claims both companies used pilfered books without permission to train text-generating models, and she is suing for financial damages and permanent injunctions.
Class Action Lawsuit Number 1
The Authors Guild, including many prominent writers, is part of a class action lawsuit alleging that "at the heart of these [OpenAI's] algorithms is systemic theft on a massive scale." The filing continues, "Defendants could have 'trained' their LLMs on works in the public domain."
“They could have paid a reasonable licensing fee to use copyrighted works. What Defendants could not do was evade the Copyright Act altogether to power their lucrative commercial endeavor, taking whatever datasets of relatively recent books they could get their hands on without authorization.”
Class Action Lawsuit Number 2
In early September, Reuters reported that OpenAI and Microsoft, its primary backer, are being sued by two software engineers who allege the companies used stolen personal information from internet users to train their Generative AI platform.
Paul Tremblay and Mona Awad suit
The two authors accuse OpenAI of using their books without permission to train ChatGPT, which can produce highly accurate summaries of those works.
The Mark Walters suit
Radio show host Mark Walters is suing OpenAI for defamation after ChatGPT falsely said he was guilty of fraud and embezzlement from the Second Amendment Foundation. He is seeking punitive damages.
These are just a few of the notable suits against OpenAI — there are many more.
How will these copyright lawsuits play out?
OpenAI and Meta do not deny that their platforms were trained on books, articles, and other text from the internet to learn to generate narratives.
Claimants such as Silverman and those in the Authors Guild suit have a stronger claim to compensation than the "ordinary people" the Clarkson Law Firm represents. The Clarkson suit argues that any human who wrote any text ingested by a Generative AI platform has a right to compensation, since the tech companies profit from their words.
The opposing argument is what some consider common sense: when you put something on the internet, particularly on social media, do you retain any rights to privacy or ownership? Our justice system will answer this question, but expect the answer to come only after numerous, lengthy court battles. OpenAI is building a powerful in-house legal team and is ready for the fight.
Is Generative AI Ethical?
These lawsuits must determine whether ingesting copyrighted content to train LLMs constitutes fair use under U.S. copyright law. OpenAI's stance is that, by using copyrighted material, it is providing a "public benefit." The lawsuits are founded on the belief that OpenAI is stealing copyrighted work to turn a profit, and that doing so will make it impossible for creatives to earn a living.
Initially, I thought the answer was obvious: OpenAI is wrong — using the original copyrighted works was theft.
Then I encountered Reid Blackman on LinkedIn. Reid is an AI ethics advisor who holds a Ph.D. His question about this issue stopped me in my tracks:
“If someone buys my book and uses it as a doorstop, that's ethically fine. Did I consent to them using it that way? No. But I don't see why that matters; it's their book.”
He continued with:
“You can often hear the influence of Jimi Hendrix in Prince's music. Should Jimi be upset? ‘That's my sound in there! I didn't consent to that! I deserve a cut of Prince's sales and a say in governing his music catalog!’”
On Reid's first question, we can agree that using a book as a doorstop does not equate to using that book's original content to make money. However, the influence of Jimi Hendrix on Prince's music demands more consideration.
Is OpenAI using authors' works to create derivative works that are, on their own, original? The answer is that what Prince did and what OpenAI is doing are different. Prince was influenced by Jimi but created his own style of music. He did not mimic Jimi.
Generative AI does not deliberate. Prince did. Prince created his sound from many influences. OpenAI took a creator's product and faked being that exact creator.
Silverman's lawsuit has already changed things a bit. When I asked ChatGPT for a paragraph from The Bedwetter, it told me it could not comply, but when I asked for a copy of William Butler Yeats's "When You Are Old," it produced the poem verbatim. That isn't illegal: the poem was published before 1928 and is in the public domain in the U.S.
I then asked ChatGPT to give me a paragraph from a Brené Brown book; it said:
"Sorry, I can't do that. Would you like a summary or some information about Brené Brown's work instead?"
My following prompt was:
"Can you write something in the style of Brené Brown?"
The answer was:
“Certainly! In the spirit of Brené Brown's work, here's a reflection on the power of vulnerability:
"Vulnerability is the birthplace of connection and belonging. It's where courage and authenticity intersect, inviting us to show up as our imperfect, beautiful selves. When we embrace vulnerability, we open ourselves to the messy, uncertain journey of life. It's in those moments of discomfort and uncertainty that we find our true strength and resilience..."
This isn't an example of the LLM creating something that Brené Brown influenced; it is mimicking her style. It is akin to a forger recreating a masterpiece and selling it as an original.
The more important question may be: whose fault is it that the LLM-generated text mimics Brené Brown? Is it OpenAI's, for ingesting so much of her work that the platform can reproduce her style? Or is it mine, for entering an unethical prompt?
Is it possible to stop AI from stealing from creatives?
If we agree that using copyrighted content is unethical, how do we police it? There are more than a few technology companies focused on this problem.
Originality.ai and Checker AI both offer technology that claims to detect AI-generated text. I tested both platforms, and both correctly flagged my attempts to fool them. Researchers say humans can also train themselves to spot AI content by recognizing stylistic tics and repetitive language.
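To make the "repetitive language" signal concrete, here is a toy illustration: a function that measures how often the same three-word phrase recurs in a passage. This is strictly a hypothetical sketch for intuition; commercial detectors like Originality.ai use proprietary models, not this heuristic, and repetition alone cannot reliably identify AI-generated text.

```python
from collections import Counter

def repeated_trigram_ratio(text: str) -> float:
    """Naive heuristic: the fraction of word trigrams that occur more than once.

    Highly repetitive phrasing is one weak signal sometimes associated
    with machine-generated text. This is an illustration, not a detector.
    """
    words = text.lower().split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)

# A varied sentence scores low; a repetitive one scores high.
varied = "Prince drew on many influences but forged a sound entirely his own."
repetitive = "we show up and we show up and we show up and we show up"
print(repeated_trigram_ratio(varied))      # 0.0
print(repeated_trigram_ratio(repetitive))  # 1.0
```

Real research detectors rely on far richer statistics (for example, how predictable each word is to a language model), which is exactly why they are also easier to evade than a simple rule like this.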
However, MIT Technology Review published an article on research showing that it is relatively easy for LLMs to produce content that gets past AI checkers. I don't know where the courts will come down on this issue, but based on the outcome of Viacom v. YouTube, which Google ultimately won, it seems more likely that OpenAI will prevail in some form.
The more important question is: can AI itself be used to police AI? If a watertight solution emerged that gave us 100% certainty of authorship, it would be much easier to wrestle with copyright infringement. Otherwise, it truly is David against a gargantuan Goliath.
What do you think? Is Generative AI ethical? Leave a comment.
If you like what you read here at AI Marketing Ethics Digest, please share the newsletter with a friend.
Next week, we discuss the search for an AI marketing ethics “Holy Grail.” See you then!