[personal profile] seawasp

... or as close to it as makes no difference. 


A number of people try to argue that the training done to make the current major LLM-based AIs didn't involve theft of intellectual property. This argument is generally based on the fact that the AI doesn't retain an addressable, identifiable copy of the various training materials anywhere in its memory. The training is, to simplify it extremely, based on a mechanism in which one side of a network is presented with data that may include target features; the network's response is then compared to an expected response, and the weights (and perhaps constants) associated with the network's elements are adjusted to bring the likely response more in line with the known goals. Repeated many times, with a sufficiently large number of inputs, proper backpropagation designs, and a lot of additional data for testing and comparison, this leads to a network that can match patterns and predict outputs to an impressive degree. "Many times" in the case of current AIs is a very, very, very large number.
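For anyone who wants to see that loop in miniature, here is a toy sketch in Python. It is emphatically not how any real LLM is built: the single "neuron", the `train` function, the learning rate, and the four example pairs are all hypothetical, made up purely for illustration. It only shows the present-compare-adjust cycle described above.

```python
# Toy sketch only: one "neuron" with a weight and a constant (bias).
# All names and numbers here are hypothetical illustrations.

def train(examples, steps=2000, learning_rate=0.05):
    """Repeatedly present inputs, compare to the expected response, adjust."""
    weight, bias = 0.0, 0.0
    for _ in range(steps):
        for x, expected in examples:
            response = weight * x + bias         # the network's response
            error = response - expected          # compare to expected response
            weight -= learning_rate * error * x  # adjust the weight
            bias -= learning_rate * error        # adjust the constant
    return weight, bias


# Made-up training pairs that follow the pattern y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train(examples)

# The trained "model" is just these two numbers; they should land close
# to 2 and 1. The example pairs themselves are not stored in it.
print(weight, bias)
```

What comes out of `train` is only the adjusted numbers, which is the basis of the "no copy is retained" argument. Those numbers exist only because of the examples they were fitted to.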

So the AI does not include the words, images, and such that it was trained on; just an immense network of neural network units connected in specific manners and layers, with weights and possibly other elements determined during training.

But "I copied your book and sold it" is only ONE type of theft of intellectual property. The current AIs were all trained on hundreds of thousands of novels, most of which the companies KNEW were not theirs to use for commercial purposes. Internal documentation shows they did, in fact, know this, and made a deliberate decision to go ahead anyway, using not even commercially purchased copies of said books but an already illicit "pirate" archive of books. This shows that (A) the companies were aware that what they were doing was, at the least, ethically questionable, and (B) they thought there was at least some chance that it was legally questionable, because they did debate paying for the materials -- but decided not to, in order to save on expenses.

The Anthropic settlement adds weight to this. No company decides to part with 1.5 billion dollars if they think they have an airtight case for the defense. The settlement implies that Anthropic felt they had a good chance of losing MORE than that if it came to an actual final trial and ruling.

What the current AIs are doing is the larger equivalent of someone reading one of my books, and then building, say, an amusement park of "fictional locations", and within that having a "Zarrathan" section where the "Saurian King" rules "Zarrathanton" and the layout of "Zarrathanton" and its appearance are also visually and in textual description extremely close to my own, and wherein characters like "Kyrie Vantage" and "Tomimar Sylverrun" pursue adventures startlingly similar to those of Tobimar and Kyri in "Phoenix Rising" and sequels. 

This would inarguably be intellectual property theft. The minor variations and distortions do nothing to obscure the fact that, while the precise text may not be present, the concepts and characters and locations are being effectively copied without permission or authority. Try making a (straightforward, not parody) comic about Ricky Mouse and his girlfriend Rinnie Mouse and their friends and see how long it takes for Disney to inform you of the error of your ways.

And this is what the current AIs do -- just extended over and mixed with many, many other sources. Nonetheless, the basis of the sources has been absorbed into the systems; this is why many of the image-focused AI systems can easily produce images that appear to be duplicates of well-known media properties, even though there are always small detectable differences.

This is an attempt to take authors' and artists' works and make money off of them without crediting, paying, or even asking for permission to do so. When it was PURELY FOR RESEARCH, that was a different situation, but when they went from "huh, can I make this work?" to "well, can our company make money from this?", at that point it was incumbent upon them to either (A) junk the experimental model and build a new, commercial AI on firm legal and ethical grounds, or (B) sort through all the training materials and get ready to pay each and every one of the creators.

Instead, they just took their research and turned it into a product, evading the responsibility for exactly how they'd built the thing and why.  

EVERY SINGLE RESPONSE THESE SYSTEMS MAKE IS BUILT UPON THAT IMMENSE MASS OF STOLEN DATA. They are making commercial use of a dozen or more of my books, and hundreds of thousands of other books, every second of every day, without permission or payment. 

And that is IP theft, no matter how they try to spin it. 

Date: 2026-06-01 07:15 pm (UTC)
kengr: (Default)
From: [personal profile] kengr
Yep, definitely "derivative work" in that without the original work, *their* work either wouldn't exist or would be notably different.

you said it

Date: 2026-06-03 05:58 am (UTC)
callibr8: icon courtesy of Wyld_Dandelyon (Default)
From: [personal profile] callibr8
Completely agree with both description and conclusion. The question is whether it's possible to get enough people to comprehend this, to move the needle. I'd like to think that's possible... but it's hard to hold out that much hope in the current hellscape perpetrated by the badministration.