Authors Are Furious After Finding Their Works on List of Books Used To Train AI

stopthatgirl7@kbin.social · 1 year ago

RalphWolf@lemmy.ca · 1 year ago

Does this fall under fair-use part of copyright?

FaceDeer@kbin.social · 1 year ago

It hasn’t been tested in court yet but I don’t see why it shouldn’t.

just another dev@lemmy.my-box.dev · 1 year ago

Fair use is any copying of copyrighted material done for a limited and “transformative” purpose, such as to comment upon, criticize, or parody a copyrighted work.

I don’t see why it should.

FaceDeer@kbin.social · 1 year ago

The creation of the AI model is transformative. The AI’s model does not contain a literal copy of the copyrighted work.

just another dev@lemmy.my-box.dev · 1 year ago

No, but the training data does contain a copy. And making a model is not criticising, commenting upon, or creating a parody of it.

FaceDeer@kbin.social · 1 year ago

That list is not exhaustive; it’s just a list of examples of fair use.

The training data is not distributed with the AI model.

just another dev@lemmy.my-box.dev · edit-2 1 year ago

“it’s just a list of examples of fair use.”

Yes, it’s a list of quite similar ways of commenting upon a work. Please explain how training an LLM is like any of those things, and thus how fair use would apply.

FaceDeer@kbin.social · 1 year ago

I’m not saying that training an LLM is like any of those things. I’m saying it doesn’t have to be like those things in order for it to still be fair use.

FontMasterFlex@lemmy.world · 1 year ago

Pay for every bit of information you’ve read and regurgitated on exams.

BURN@lemmy.world · 1 year ago

AI is not human and should not be treated like a human.

FontMasterFlex@lemmy.world · 1 year ago

It’s not. The humans that trained it (presumably) purchased the material used to train it. What’s the problem?

BURN@lemmy.world · 1 year ago

The problem is the use of the material to create a commercial product, as well as the reality that the humans training it never buy the data on an individual level.

kromem@lemmy.world · 1 year ago

The training argument is probably going to come up dry by the time the court works its way through expert testimony, as the underlying argument for training as infringement is insane.

But where OpenAI is probably in hot water is that torrenting 100k books in the first place runs afoul of existing copyright legislation.

Everyone is debating the training in these suits, but the real meat and potatoes is going to be the initial infringement of obtaining the books, not how they were subsequently used.