Who owns the code Claude wrote?

Oh ick!

A random thought: GitHub uses your stuff to train AI. Let's say I opt out of the AI stuff, but someone who didn't opt out forks my repo. Does that mean AI still digests/uses my stuff? ...but as I type that, it makes more sense that GitHub would say "no matter what you say, we will use your input to train our AI," so it's a moot point actually. But it's still such a tangled web that it hurts my little brain.
 
You could therefore argue that Anthropic is stealing code, not the AI itself. But Ralph explained why relying on the general body of available knowledge isn't theft. I believe the same is true for code. And again, anything created by AI can be reproduced without it. AI is merely a tool that facilitates and shortens the path to the goal. Just my 2 cents.
The issue is that nobody is going through and tracking where the stuff that isn't general is coming from. It's one thing to take the typical way of opening and writing to a file from an AI; there's likely nothing creative in that at this point anyway. The problems come when things are being done in non-standard ways, or rise to a level that entitles them to protection, and you have to work out when you've crossed into stealing actually copyrightable material.
 
I am not a lawyer, but I believe the definition of stealing or theft is something like "taking something away from someone with the intention of depriving them of it for ever." In other words, it isn't defined by getting something for nothing, but by depriving the victim of it. That is why copying something without permission is not theft. It infringes the victim's right to control who benefits from the effort, but it does not take away anything the victim originally had. It just deprives the victim of a reasonable expectation of compensation for work, which is a slightly different thing. That doesn't mean unauthorised copying isn't wrong, just that it's not theft, because nothing is removed. It still causes loss of a sort to the victim, for which the victim is entitled to compensation, but it's not theft. It is similar to not paying the fare on a bus or train. The person running the bus or train has not lost anything other than the expectation of a return on their investment in running the service. On the other hand, jumping into the driving seat and taking the bus away would be theft.
 
I am not a lawyer, but I believe the definition of stealing or theft is something like "taking something away from someone with the intention of depriving them of it for ever." ...
Yeah, I see how that would be true with proprietary / closed source software.

I think there is a larger issue with the lack of attribution and license information when it comes to free/open software. The idea of being able to freely share, examine, and contribute to code is lost when the code is scraped and reused without attribution. This does not encourage technical learning, collaboration, or development when you deprive people of author information, of the ability to contribute, and of the license, which spells out the way the software is meant to be used and/or distributed; that can lead to issues down the road.

There is also the ethical component which is outlined in some circles as being the rights of computer users.
 
I am not a lawyer, but I believe the definition of stealing or theft is something like "taking something away from someone with the intention of depriving them of it for ever." ...
Technically yes, but the line gets to be a bit fuzzy with this sort of thing: recreating somebody else's work using an AI that trained on it has the effect of taking the benefit of ownership away.

But, whether it's infringement or theft, anybody using these professionally had better know where the line is between boilerplate stuff that is the way it is because it has to be and stuff that is covered by copyright.
 
Well yes, I think we all agree the discussion about theft v infringement is a legal technicality, but reducing the benefit of ownership doesn't take the ownership away. In the case of FOSS the benefit of ownership is minimal, since copying is permitted. Failing to give credit when the licence requires it breaches a contract, but that is also only a technicality, since the credit has no tangible value. However, it does demonstrate that legal technicalities matter.

What we are dealing with are unscrupulous people who do not respect rules because they can get away with it. Their scrapers do not respect the robots.txt protocol which also causes massive headaches for website owners. In Internet/FOSS terms, these people are simply ignoring netiquette because they know most people can't afford to sue them and their own pockets are so deep it probably wouldn't matter if someone did. The real problem here is an outbreak of lawlessness and a direct challenge to the social rules which have been the basis of civilisation for the last 300 years by people rich enough not to care.
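For anyone unfamiliar with it, robots.txt is a purely advisory plain-text convention: a site lists which crawlers may fetch which paths, and well-behaved bots check it before crawling. A site trying to keep AI training scrapers out might publish something like the sketch below (the user-agent tokens shown, such as GPTBot and CCBot, are the ones those crawlers publicly document; compliance is entirely voluntary, which is exactly the problem being described):

```
# robots.txt at the site root, e.g. https://example.com/robots.txt
# Ask known AI training crawlers to stay out entirely.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else may crawl everything.
User-agent: *
Disallow:
```

There is no enforcement mechanism in the protocol itself: a scraper that ignores the file faces no technical barrier, only social and (perhaps) legal ones.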

Frankly, I wouldn't trust any AI-generated code to do, and only do, what I want it to do. LLMs collate, assemble and regurgitate patterns of language which replicate the kind of language humans use to express concepts, but because they don't actually understand the concepts they are unlikely to be reliably accurate. Of course, human beings can introduce biases or spread misinformation, but generally human beings can check and reject information which proves to be false. LLMs can't. They have no concept of correct or incorrect when it comes to facts, so they have no means of checking. They can go for a consensus but cannot assess the accuracy or relative trustworthiness of sources. That would need another level of analysis above the LLM. It will come in time, of course.

When it comes to code, it's either right or wrong for the intended purpose. If it's derived from the work of others it will only be right if it substantially matches the original. A program written by combining sections of code from others is hardly likely to be meaningful unless there's a rationale behind combining them, and an LLM is more likely to produce something which looks like a program but achieves nothing useful than something genuinely novel which solves a real problem. To be useful at code generation the AI would need to analyse the problem and generate code to implement an algorithm to solve it. That code would then be original rather than simply derivative as the training data would only have been used to teach the machine how to code rather than being directly used to generate output. Presumably it would then be owned by the owners of the AI model or their customer depending on the contract terms under which the service was used. But given the way they treat law, that might be pretty useless.
 
Oh ick!

A random thought. GitHub uses your stuff to train Ai. Let's say I opt-out of the Ai stuff ...
As you said. It's super easy: you published your code. Which means anyone can read it. Including AIs. People can't copy it or use it without following whatever license is on it, but anyone can read it. That's the whole point of publishing your code: you want people to be able to read it.

If you don't want your code to be read (for example by an AI), then don't publish it.
 