• GamingChairModel@lemmy.world · 2 months ago

    Yeah, I’m not a fan of AI, but I’m generally of the view that anything posted on the internet, visible without a login, is fair game for indexing by a search engine, snapshotting for a backup (like the Internet Archive’s Wayback Machine), or running user extensions on (including ad blockers). Is training an AI model all that different?

    • sugar_in_your_tea@sh.itjust.works · 2 months ago

      Yes, it kind of is. A search engine just looks for keywords and links, and that’s all it retains after crawling a site. It’s not producing any derivative works; it’s merely looking up an index of keywords to find matches.

      An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it’s based on and how much of those works it uses. So it’s complicated, but there’s very much a copyright argument there.

      • TheRealKuni@lemmy.world · 2 months ago

        An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.

        Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.