Who owns the code Claude wrote?

TL;DR

Agentic coding tools such as Claude Code enable humans to direct agents in multi-step planning and execution on complex coding tasks. Copyright generally does not protect purely AI-generated material. Copyright can protect original human-created expression within a work even if the work also includes some AI-generated material. The key is documenting creative human contributions throughout the software development lifecycle.

If you are using Claude Code, Codex, Cursor, etc. and you do not know whether the code you are shipping is copyrightable, whether you own it, or whether your employer owns it, this one is for you.
 
Thanks, good stuff. "Who owns the code" has always been a fun question: you develop specific IP that could be patented, for a work problem, but you did it outside of work on your own resources. Who gets the patent, you or the company?

Toss in AI that you may collaborate with outside of work on a work problem.
 
Who gets the patent, you or the company?
It depends. At least in the Netherlands. If you were tasked with writing the code, then it doesn't matter if you wrote that code in your spare time or not, it's still owned by the company.
 
It depends. At least in the Netherlands. If you were tasked with writing the code, then it doesn't matter if you wrote that code in your spare time or not, it's still owned by the company.
That's generally the normal thing for permie/contract work in the UK too, although some contracts allow you to escape the 'they own what you dreamed up in the shower' thing; if for example you're on a retainer with royalties contract (not very common nowadays) as opposed to a normal paid-hourly contract.
 
It depends. At least in the Netherlands. If you were tasked with writing the code, then it doesn't matter if you wrote that code in your spare time or not, it's still owned by the company.
Same in Germany.
While you are employed, the employer owns any creation you make (an invention, or anything that can be patented, copyrighted or otherwise protected) and holds the privilege: the employer decides whether to use it, sell it, allow you to use it, or prohibit its use entirely, by you or anyone else.
It doesn't matter whether the creative work behind the idea happened on the job or in your spare time.
 
It is very common (at least here in Norway) for software developers that contribute to open source projects in their spare time to have an agreement about that - any work done outside of work and unrelated to work (in other words: not a competing project) is not included in the "employer owns code" part.
 
Another aspect of the original post that's very relevant for the *BSDs is that an LLM trained on GPL and other copyleft code can end up reproducing that code - even if not verbatim, close enough to be legally problematic.

OpenBSD's justification for not accepting AI code is, as far as I understand it (but IANAL), somewhat inaccurate from a legal PoV. For those who haven't seen it, read https://marc.info/?l=openbsd-tech&m=177425035627562

A lot of amateur commentary complains about code being "stolen" by AIs and Theo quite correctly says that's a misconception from the legal perspective. But when he says "Noone can put a Copyright onto that output, because under the current laws that requires a human to have created the output" (which is the reason he bans AI-generated code in OpenBSD) I think he makes a mistake himself. Human creative input can make the difference here, just like it can with other tools. Which is why good practice in AI use includes keeping a transcript of prompts.

For comparison: you don't have to produce a portrait by painting it with your own hand to obtain copyright. Taking a portrait photo is also copyrightable even though it's a machine (the camera) which "creates" the image. Yet using a camera or image scanner to make a faithful copy of some 2D public domain artwork is not eligible for copyright protection, despite the technical skill and effort involved in doing so. The point is not the involvement of the machine, but the lack of originality and creative input.

Funnily enough, the thing Theo says is "not the real problem" is the regurgitation. But if the regurgitated code has a contaminated licence then that is a real problem, particularly for *BSDs. Part of the review process for the suitability of new code for inclusion is going to have to be checking whether it has substantial similarities to any impermissibly licensed code that could have been used as training data. (And I think it would be wise to treat all code contributions as potentially AI-assisted even if contributors are willing to sign off claiming otherwise - the tooling is becoming so widespread that I think blanket anti-AI rules are simply going to be ignored by at least some contributors, and projects shouldn't rely on self-declarations to the contrary.)
 
"A lot of amateur commentary complains about code being "stolen" by AIs"

It is "stolen" by AI. The programs used do not include license or attribution when sourcing other programs, which is a breach of most licenses.

Using a program to breach a licensed work does not make it acceptable.

AI is used as a tool to steal obviously.

AI is alright when you use your own data, but not when you do unethical things with other people's work by ignoring the license they have chosen.
 
It is very common (at least here in Norway) for software developers that contribute to open source projects in their spare time to have an agreement about that
Things get a lot more dicey when you're not contracted to write code (not a programming/coding job) but still end up writing code anyway (devops, puppet, ansible, shell scripts, custom tools, etc.)
 
"A lot of amateur commentary complains about code being "stolen" by AIs"

It is "stolen" by AI. The programs used do not include license or attribution when sourcing other programs, which is a breach of most licenses.

Using a program to breach a licensed work does not make it acceptable.

AI is used as a tool to steal obviously.
As you say, it could (and often will) constitute a licensing breach. But it's not, in legal terms, "stealing". That's a completely different area of law. I understand why people use the word "stealing" here in a figurative and rhetorical manner. I'm sure Theo de Raadt takes a very dim view of much of AI firms' behavior even if he points out that it is not, in fact, "stealing". I didn't interpret it as him defending them in the slightest.

But when people want to discuss the legal position and an appropriate policy response, which is the point of the legal consultant's article in the original post and what de Raadt was doing when he set out the OpenBSD project's position, then such sloppy misuse of terminology is unhelpful. Which is why de Raadt rightly criticized it: law is one of those fields where "words have meanings" and there can be severe costs to getting those meanings wrong. Twisting meanings for polemical purposes might have its time-honored place on the campaign trail but doesn't belong in any serious, professional policy analysis where precise language is vital. Understanding legal risks is a huge part of the decisions organisations face on the use of AI, and they need to get the law right.

I'm not saying "don't say it". You've every right to view it as morally stealing, even if that's an incorrect analysis legally. Just be aware that the people with decision-making power or professional training in the field see a red flag from statements like "AI steals" or more generally "breach of license is theft". It instantly tells them that at best they're dealing with an amateur whose points they do not need to seriously engage with, and at worst it's someone deliberately spreading misinformation. The legal error here is similar to why the old scare tactic of "watching pirated videos is stealing" was also nonsense.
 
It is "stolen" by AI. The programs used do not include license or attribution when sourcing other programs which is a breach of most licenses.
Not so fast, young man.

Let me give you an example that is not from code, but from my area of expertise. Say I want to write a new physics textbook. I could do that by copying whole paragraphs, pages and diagrams from the standard textbooks (Resnick-Halliday, Landau-Lifshitz, Goldstein, Jackson ...). That would be violation of copyright, what you call "stealing". I can instead read all the standard textbooks in this field (I have several meters of them in my bookshelf), understand them, and write the same ideas using my own words. The facts won't change, and even the formulas will be the same: E=mc² is the same, whether I learn it from two dozen books, or whether I photocopy it from one book.

And here lies the problem with applying traditional copyright law to LLMs. They don't take whole sections of code or whole pages from any one source. They read lots and lots of things (code), and then synthesize from it, at the level of sequences of short tokens, and probabilities for these sequences to follow each other. So if you look at their output, it is impossible to tell which of their myriad of sources that one line or paragraph of code was taken from.

Here is an interesting thought experiment: Take a blank (virgin) LLM. Feed it only GPL'ed code. Ask it to write a new piece of code, say to calculate the next 10 digits of pi. Will the resulting output be under GPL? Yes, because it must logically have come from only GPL'ed code. No, because it didn't come from any one piece of code under the GPL. So which one is it, yes or no?
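For concreteness, here is the kind of routine the thought experiment imagines - a minimal Python sketch (my own illustration, not the output of any model) that computes digits of pi with Machin's formula. Code this generic could plausibly be derived from any of thousands of training examples, which is exactly the attribution problem at issue:

```python
# Hypothetical example of the "next 10 digits of pi" program from the
# thought experiment: Machin's formula with plain integer arithmetic.

def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale using the Gregory series (integer math)."""
    total = term = scale // x
    n, sign = 3, -1
    while term:
        term //= x * x                  # next power of 1/x^2
        total += sign * (term // n)     # alternating series term
        sign, n = -sign, n + 2
    return total

def pi_digits(ndigits: int) -> int:
    """Return floor(pi * 10**ndigits) via pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    guard = 10 ** 10                    # guard digits absorb truncation error
    scale = 10 ** ndigits * guard
    return (16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)) // guard

print(pi_digits(10))  # → 31415926535
```

There is nothing in the output that points back at any single source, which is what makes the "yes or no" question above so hard to answer.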
 
ralphbsz good stuff, very interesting POV. I know what I don't know and I don't know enough on this but what if legally, it was recognized that everything AI/LLM generated was automatically Public Domain, no copyright, no ownership.
Simplistically, to me, that is an answer. "The Answer" probably not, but basically everything generated by AI/LLM is free/open/whatever.
 
... but what if legally, it was recognized that ...
But it hasn't been. To my knowledge, there has been no law or court decision that speaks to how the copyright of the output of an LLM relates to its input. As a matter of fact, I don't even think it's clear who holds the copyright on the output of an LLM (claims notwithstanding; anyone can make claims or write "Copyright XXX" into the code or check-in message).

We can speculate till we're blue in the face, but I have better things to do with my time.
 
ralphbsz good stuff, very interesting POV. I know what I don't know and I don't know enough on this but what if legally, it was recognized that everything AI/LLM generated was automatically Public Domain, no copyright, no ownership.
Simplistically, to me, that is an answer. "The Answer" probably not, but basically everything generated by AI/LLM is free/open/whatever.
At the moment, AI firms generally state that they do not stake any claim of their own on the output of their models. I wouldn't bet against future legislation to firm this position up* and ensure AI firms don't change their mind later, producing an enormous legal mess given all the works that AI-created or assisted content ends up in.

* (Note that in some jurisdictions, it's not possible to give up all claims over creative output even voluntarily - e.g. author's rights are inalienable in Germany so you can't simply declare your own output is public domain there.)

As ralphbsz says, the issues are subtle and I recommend reading some serious analyses on the subject like the one in the original post. Unfortunately LLMs are sometimes able to reproduce non-trivial chunks of their training data, near enough verbatim. This is why your suggested solution isn't realistic. A rule of "I can use this how I like without worrying about anyone else's claim of ownership, because it popped out in my LLM output" would be very dangerous territory, especially for output that closely resembles original training data.
 
Not so fast, young man.

Let me give you an example that is not from code, but from my area of expertise. Say I want to write a new physics textbook. I could do that by copying whole paragraphs, pages and diagrams from the standard textbooks (Resnick-Halliday, Landau-Lifshitz, Goldstein, Jackson ...). That would be violation of copyright, what you call "stealing". I can instead read all the standard textbooks in this field (I have several meters of them in my bookshelf), understand them, and write the same ideas using my own words. The facts won't change, and even the formulas will be the same: E=mc² is the same, whether I learn it from two dozen books, or whether I photocopy it from one book.

And here lies the problem with applying traditional copyright law to LLMs. They don't take whole sections of code or whole pages from any one source. They read lots and lots of things (code), and then synthesize from it, at the level of sequences of short tokens, and probabilities for these sequences to follow each other. So if you look at their output, it is impossible to tell which of their myriad of sources that one line or paragraph of code was taken from.

Here is an interesting thought experiment: Take a blank (virgin) LLM. Feed it only GPL'ed code. Ask it to write a new piece of code, say to calculate the next 10 digits of pi. Will the resulting output be under GPL? Yes, because it must logically have come from only GPL'ed code. No, because it didn't come from any one piece of code under the GPL. So which one is it, yes or no?
You're comparing two very different things. A person is not a program. A program that directly uses data for a model and then uses that data to generate output is simply stealing. The program has no analytical ability or comprehension ability like you. It's simply processing the works of people to generate something. This is called theft.
 
As you say, it could (and often will) constitute a licensing breach. But it's not, in legal terms, "stealing". That's a completely different area of law. I understand why people use the word "stealing" here in a figurative and rhetorical manner. I'm sure Theo de Raadt takes a very dim view of much of AI firms' behavior even if he points out that it is not, in fact, "stealing". I didn't interpret it as him defending them in the slightest.

But when people want to discuss the legal position and an appropriate policy response, which is the point of the legal consultant's article in the original post and what de Raadt was doing when he set out the OpenBSD project's position, then such sloppy misuse of terminology is unhelpful. Which is why de Raadt rightly criticized it: law is one of those fields where "words have meanings" and there can be severe costs to getting those meanings wrong. Twisting meanings for polemical purposes might have its time-honored place on the campaign trail but doesn't belong in any serious, professional policy analysis where precise language is vital. Understanding legal risks is a huge part of the decisions organisations face on the use of AI, and they need to get the law right.

I'm not saying "don't say it". You've every right to view it as morally stealing, even if that's an incorrect analysis legally. Just be aware that the people with decision-making power or professional training in the field see a red flag from statements like "AI steals" or more generally "breach of license is theft". It instantly tells them that at best they're dealing with an amateur whose points they do not need to seriously engage with, and at worst it's someone deliberately spreading misinformation. The legal error here is similar to why the old scare tactic of "watching pirated videos is stealing" was also nonsense.
It is direct theft. There is no comparison to piracy; it's not even slightly the same. You can use as many words to refute that as you like, but it doesn't change the fact that "AI" is a theft bot.
 
It is direct theft. There is no comparison to piracy; it's not even slightly the same. You can use as many words to refute that as you like, but it doesn't change the fact that "AI" is a theft bot.
"Theft" and "stealing" are legal terms. And in law, this is neither theft nor stealing. You can argue it should be ... but it isn't. I wouldn't normally push the point, but you're replying to a thread that literally concerns the legal status of the code produced by Claude. There are huge uncertainties that need to be navigated about this legal situation, particularly when it comes to intellectual property and licensing. Experts are being paid big money to provide whatever clarity is possible, though ultimately it's going to need some court cases and probably some legislation before it's really clear what's going on. But one thing that absolutely everybody with any relevant expertise agrees is that there's no way it is "stealing".

I made the comparison to piracy because that's another situation where infringement on intellectual property was frequently colloquially referred to as "theft" despite this being legally nonsensical. See for example https://en.wikipedia.org/wiki/Piracy_is_theft and https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985)

The phonorecords in question were not "stolen, converted or taken by fraud" for purposes of [section] 2314. The section's language clearly contemplates a physical identity between the items unlawfully obtained and those eventually transported, and hence some prior physical taking of the subject goods. Since the statutorily defined property rights of a copyright holder have a character distinct from the possessory interest of the owner of simple "goods, wares, [or] merchandise," interference with copyright does not easily equate with theft, conversion, or fraud. The infringer of a copyright does not assume physical control over the copyright nor wholly deprive its owner of its use. Infringement implicates a more complex set of property interests than does run-of-the-mill theft, conversion, or fraud.
 
"Theft" and "stealing" are legal terms. And in law, this is neither theft nor stealing. You can argue it should be ... but it isn't. I wouldn't normally push the point, but you're replying to a thread that literally concerns the legal status of the code produced by Claude. There are huge uncertainties that need to be navigated about this legal situation, particularly when it comes to intellectual property and licensing. Experts are being paid big money to provide whatever clarity is possible, though ultimately it's going to need some court cases and probably some legislation before it's really clear what's going on. But one thing that absolutely everybody with any relevant expertise agrees is that there's no way it is "stealing".

I made the comparison to piracy because that's another situation where infringement on intellectual property was frequently colloquially referred to as "theft" despite this being legally nonsensical. See for example https://en.wikipedia.org/wiki/Piracy_is_theft and https://en.wikipedia.org/wiki/Dowling_v._United_States_(1985)
That's great that everybody agrees. It is stealing, and it's unethical as well. So I think I will agree to disagree with the "absolutely everybody" that you are posting on behalf of. If you could pass the message on to absolutely everybody that there is a singular person who disagrees with absolutely everybody, I would appreciate it.

Edit: Perhaps this thread should be closed; the great and powerful "absolutely everybody" has spoken on the matter. Solved?
 
But you know that day will come when Big Company A will challenge someone in court as a test case.
  1. They haven't tried yet.
  2. It might be a really dumb idea to try, since it might destroy their business model. If the output of an LLM is copyrighted by the AI company, most people can not use that output. Now, one could object that they don't have a business model to begin with, and the AI bubble will implode any moment.
  3. And even if they try, it's not clear what the courts (and/or legislators) will actually do. That depends more on the current political climate at the time.
So let's not speculate, and create doom scenarios. It might be a better idea to (a) educate our legislators, perhaps by educating the public, and (b) work towards the AI bubble popping sooner rather than later.
 
That's great that everybody agrees. It is stealing, and it's unethical as well. So I think I will agree to disagree with the "absolutely everybody" that you are posting on behalf of. If you could pass the message on to absolutely everybody that there is a singular person who disagrees with absolutely everybody, I would appreciate it.
Like I said, lots of people make the mistake of mixing up the law on theft and the law on intellectual property infringement. It's very common, and some people even conflate them deliberately to make the latter sound as scary or clearcut as the former. It's a useful rhetorical device, as the "piracy is theft" campaign showed - and more recently with content creators complaining that AI firms have "stolen" their property.

I only said "absolutely everybody with any relevant expertise agrees". Say, somebody who is familiar with Dowling v. United States or their jurisdiction's equivalent. The area of law that experts are worried about here is not the law of theft or conversion (whether AI firms and/or their end users could be prosecuted or sued for stealing), but the law surrounding intellectual property. To anybody who does have expertise, people who insist that the legal situation is "theft" immediately look unserious. If it's important to you that decision-makers pay attention to your views, then you'd be better off reading up on the law and not writing nonsense, as that would make it much harder for them to dismiss what you're saying. There are very powerful arguments that can be made against AI which don't completely mangle the legal situation. If you just want to vent your views and don't mind whether people take them seriously then feel free - but don't expect to go uncorrected if you write legal misinformation.

I don't think the point of Dowling v. United States is all that hard to understand and it is an interesting area of law that impinges on a lot of people's professional lives, so personally I think it's worth a little reading up. But YMMV.
 
A person is not a program. A program that ...
A program is an extension of a person. Clearly, I can not sue a program in court. But I can sue the person who wrote the program, or who sold it, or who uses it. But in all cases, the basic causal relationship of "what made the crime happen" is the program.

The program has no analytical ability or comprehension ability like you. It's simply processing the works of people to generate something. This is called theft.
Nonsense. Example: I write a program that analyzes the pressure in my home water system and how often the pumps have to run, correlating them with the outside temperature, the water level in the well, and the phase of Aquarius. I will find certain results (irrigation uses a lot of water, it takes longer to pump water further uphill). I control the input data, which I have carefully collected over the last few years. To calculate the positions of Jupiter and Aquarius, I use well-known public domain formulas from an astronomy textbook.

Where exactly is the theft here? So your assertion that all programs that process data commit theft is balderdash.

Next: The works of Shakespeare are in the public domain (he's been dead for a while). I use an LLM to write the perfect love sonnet, and use it in an attempt to seduce my wife into cooking a romantic dinner (we've been married for over 30 years). Am I stealing from Shakespeare? Note that my wife would probably slap me, and if I tried it three times in a row, she'd even use a heavy and large object to hit me. Anyway, other than the very serious crimes of literary stupidity and domestic violence, where is the theft here?

People publish things all the time. Shakespeare published his works. Einstein published E=mc². Linus T publishes the Linux kernels. They all have some copyright. But I'm allowed to read all of these published works, and make myself a smarter human, and then use that smartness for my own purposes (like getting a nice dinner). We have copyright laws to make sure the balance of equity between the people writing and publishing things, and the people reading and using things remains fair and equitable. Those laws work (pretty badly), but they are the best we have, and they keep things somewhat in order.

LLMs don't change the concept behind that. They read all these works that are published and sometimes copyrighted, and they "make themselves smarter". The difference is that the economics of the system is completely changed: LLMs learn faster and broader, and create results more cheaply (alas also usually unreliably). For that reason, the existing "social contract" between author and user may not be appropriate any longer. But right now, we don't have a new set of laws to govern this, and using the existing ones in the new situation is difficult and may give counter-intuitive results.
 