So you are saying that content scraped before the law is fair game to train new models? If so, it’s fucking terrible. But again, I doubt this is the case, since it would be against the interests of the big copyright holders. And if it’s not the case, you are just creating a storm in a glass of water, since this affects the companies too.
As a side point, I’m really curious about LLM uses. As a programmer, the only useful product I have seen so far is Copilot and similar tools. And I ended up disabling the fucking thing because it produces too much garbage hahaha. But I’m the first to admit I haven’t been following this hype cycle hahahaha, so I’m really curious what the big things will be. You clearly know so much, so want to enlighten me?
This is a very weird assumption you are making, man. The quoted text you sent above pretty much says the opposite. It says everyone who wants to train their models with copyrighted data needs to get permission from the copyright holders. That is great for me, period. No one, not a big company nor the open source community, gets to steal the work of people producing art, code, etc. I honestly don’t get why you assume all the data scraped before would be exempt. Again, very weird assumption.
As for ML algorithms having uses, of course they have. Hell, pretty much every company I have worked with has used them for decades. But take a look at the examples you provided. None of them requires you or your company scraping a bunch of information from randoms on the internet. Especially not copyrighted art, literature, or code. And that’s the point here: you are acting like all of that stops with these laws, but that’s ridiculous.