Who owns the code Claude wrote?

Oh ick!

A random thought: GitHub uses your stuff to train AI. Let's say I opt out of the AI stuff, but someone who didn't opt out forks my repo — does that mean AI still digests/uses my stuff? ...but as I type that, it makes more sense that GitHub would say "no matter what you say, we will use your input to train our AI," so it's a moot point actually. But it's still such a tangled web that hurts my little brain.
 
You could therefore argue that Anthropic is stealing code, not the AI itself. But Ralph explained why relying on the general body of available knowledge isn't theft. I believe the same is true for code. And again, anything created by AI can be reproduced without it. AI is merely a tool that facilitates and shortens the path to the goal. Just my 2 cents.
The issue is that nobody is going through and tracking where the stuff that isn't general is coming from. It's one thing to take the typical way of opening and writing to a file from an AI; there's likely nothing in that which is at all creative at this point. The problems come when things are being done in non-standard ways, or rise to a level entitling them to protection, and you have to work out when you've crossed into stealing actually copyrightable material.
 
I am not a lawyer, but I believe the definition of stealing or theft is something like "taking something away from someone with the intention of depriving them of it for ever." In other words, it isn't defined by getting something for nothing, but by depriving the victim of it. That is why copying something without permission is not theft. It infringes the victim's right to control who benefits from the effort but it does not take anything away the victim originally had. It just deprives the victim of a reasonable expectation of compensation for work, which is a slightly different thing. It doesn't mean unauthorised copying isn't wrong, just that it's not theft because nothing is removed. It still causes loss of a sort to the victim for which the victim is entitled to compensation, but it's not theft. It is similar to not paying the fare on a bus or train. The person running the bus or train has not lost anything other than the expectation of a return on their investment in running the service. On the other hand, jumping into the driving seat and taking the bus away would be theft.
 
I am not a lawyer, but I believe the definition of stealing or theft is something like "taking something away from someone with the intention of depriving them of it for ever." ...
Yeah, I see how that would be true with proprietary / closed source software.

I think there is a larger issue with the lack of attribution and license information when it comes to free / open software. The idea of being able to freely share, examine, and contribute to code is lost when the code is scraped and reused without attribution. It does not encourage technical learning, collaboration, or development when you deprive people of author information, the ability to contribute, or the license, which spells out the way in which the software is meant to be used and/or distributed — and that can lead to issues down the road.

There is also the ethical component which is outlined in some circles as being the rights of computer users.
 
I am not a lawyer, but I believe the definition of stealing or theft is something like "taking something away from someone with the intention of depriving them of it for ever." ...
Technically yes, but the line gets to be a bit fuzzy with this sort of thing: recreating somebody else's work using an AI that trained on it has the effect of taking away the benefit of ownership.

But, whether it's infringement or theft, anybody using these professionally had better know where the line is between boilerplate stuff that is the way it is because it has to be and stuff that is covered by copyright.
 
Well yes, I think we all agree the discussion about theft v infringement is a legal technicality, but reducing the benefit of ownership doesn't take the ownership away. In the case of FOSS the benefit of ownership is minimal, since copying is permitted. Failing to give credit when the licence requires it breaches a contract, but that is also only a technicality, since the credit has no tangible value. However, it does demonstrate that legal technicalities matter.

What we are dealing with are unscrupulous people who do not respect rules because they can get away with it. Their scrapers do not respect the robots.txt protocol which also causes massive headaches for website owners. In Internet/FOSS terms, these people are simply ignoring netiquette because they know most people can't afford to sue them and their own pockets are so deep it probably wouldn't matter if someone did. The real problem here is an outbreak of lawlessness and a direct challenge to the social rules which have been the basis of civilisation for the last 300 years by people rich enough not to care.

Frankly, I wouldn't trust any AI-generated code to do, and only do, what I want it to do. LLMs collate, assemble and regurgitate patterns of language that replicate the kind of language humans use to express concepts, but because they don't actually understand the concepts, they are unlikely to be reliably accurate. Of course, human beings can introduce biases or spread misinformation, but generally human beings can check and reject information which proves to be false. LLMs can't. They have no concept of correct or incorrect when it comes to facts, so they have no means of checking. They can go for a consensus but cannot assess the accuracy or relative trustworthiness of sources. That would need another level of analysis above the LLM. It will come in time, of course.

When it comes to code, it's either right or wrong for the intended purpose. If it's derived from the work of others it will only be right if it substantially matches the original. A program written by combining sections of code from others is hardly likely to be meaningful unless there's a rationale behind combining them, and an LLM is more likely to produce something which looks like a program but achieves nothing useful than something genuinely novel which solves a real problem. To be useful at code generation the AI would need to analyse the problem and generate code to implement an algorithm to solve it. That code would then be original rather than simply derivative as the training data would only have been used to teach the machine how to code rather than being directly used to generate output. Presumably it would then be owned by the owners of the AI model or their customer depending on the contract terms under which the service was used. But given the way they treat law, that might be pretty useless.
 
Oh ick!

A random thought: GitHub uses your stuff to train AI. Let's say I opt out of the AI stuff ...
As you said. It's super easy: you published your code. Which means anyone can read it. Including AIs. People can't copy it or use it without following whatever license is on it, but anyone can read it. That's the whole point of publishing your code: you want people to be able to read it.

If you don't want your code to be read (for example by an AI), then don't publish it.
 
As you said. It's super easy: you published your code. Which means anyone can read it. Including AIs. ...
I doubt a large number of developers publish code for the purpose of "AI" being able to read it. If you don't want your code to be read by "AI", there are likely technical solutions to keep the virus known as "AI" from reading the code. Or there is always the old-school way of sending programs via the post. :)

Edit: It shouldn't be called artificial intelligence. It should be called mechanized plagiarism.
 
May be of interest:
I've been thinking about this and I've added (am adding) the following to my readme.

I know I'm not going to win a fight against people using AI, but hopefully I can limit the impact on me. After speaking with at least one professional developer and hearing how "difficult" it can be dealing with AI in their job, I wanted a simple "guard" in place at least. I get that the below isn't anything binding, but at least I'm on the record as saying I don't want large chunks of AI-regurgitated code. Comments/suggestions are, of course, welcome.

I should add:
My other thought on this topic was maintainability, because I know in a project it must be difficult for developers to accept a large refactor, since it would be them that has to maintain the code. And me not being a professional, I don't think I'd have the ability to maintain large code contributions (AI makes it easy to generate code, not maintain it, in my opinion).

Code:
### Use of AI
Code produced with AI assistance may be accepted only if it is:
- Trivial and not copyrightable (e.g., single-line fixes, basic refactors), or
- Accompanied by a public statement from the AI provider that they do
  not assert copyright over the generated work.

AI-assisted code is often harder to maintain and, for that reason,
human-authored patches are preferred for larger changes. Large-scope
refactors or additions that appear to be AI-generated can be closed
without explanation.
 
I doubt there are a large number of developers publishing code for the purpose of "AI" being able to read it.
The purpose for which you do something usually doesn't matter much to the result.

If you don't want your code to be read by "AI" there are likely technical solutions to mitigate the virus known as "AI" from reading the code.
There are technical solutions that make it more difficult for AIs. For example, I have on my to-do list to spend a few hours on my robots.txt and .htaccess files to reduce the amount of bandwidth wasted by AI web scrapers on my personal web site. Not because I hate AI (I really do!), but because I pay for bandwidth, and to me there is no purpose in AIs reading my web site.
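For the robots.txt side, a minimal sketch might look like the following. The user-agent tokens are ones the major AI vendors have published for their crawlers (GPTBot for OpenAI, ClaudeBot for Anthropic, CCBot for Common Crawl, Google-Extended for Google's AI training); the list goes stale quickly, and compliance is entirely voluntary on the crawler's part.

```
# robots.txt — ask known AI training crawlers to stay out of everything.
# These token names are published by the vendors; new bots appear often,
# and nothing here is enforced — well-behaved crawlers simply honour it.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

This file goes at the web root (e.g. `https://example.com/robots.txt`); `Disallow: /` under a specific `User-agent` asks only that bot to skip the whole site, leaving ordinary search crawlers unaffected.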

But if you publish something with the intent that humans can read it, then eventually, with enough effort, an AI will be able to read it too. For example, by hiring lots of humans who look at my pages on GitHub and read the code aloud into a microphone with voice recognition that transcribes it for the AI.

Or there is always the old school way of sending programs via the post. :)
Which is sort of what I do. My code is on my home internal network, in a source control system (not git but hg, same concept). I sometimes e-mail it to myself, and in some cases to friends.

Edit: It shouldn't be called artificial intelligence. It should be called mechanized plagiarism.
No, it is not always plagiarism. An AI that reads the Encyclopedia Britannica is no better and no worse than a human who reads it. My problem with it is encapsulated in the insult I use on it: Artificial Stupidity.
 
I have on my to-do list to spent a few hours on my robots.txt and .htaccess files to reduce the amount of bandwidth used wasted by AI web scrapers on my personal web site.
I've always found and read that robots.txt doesn't stop anything, because few systems and surfers even care to look at it.
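Right — robots.txt is purely advisory, so any actual blocking has to happen server-side. On Apache, a sketch of an .htaccess rule that refuses requests whose User-Agent string matches a known AI crawler (this assumes mod_rewrite is enabled; the bot names are the published ones, and of course any scraper can trivially fake its User-Agent, so this only deters the honest ones):

```
# .htaccess — refuse requests from self-identified AI crawlers.
RewriteEngine On
# [NC] = case-insensitive match; [F] = respond 403 Forbidden; [L] = last rule.
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot|Bytespider) [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced by the server itself, but it still only works against crawlers that identify themselves honestly.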
 