Will FreeBSD adopt a No-AI policy or such?

I wouldn't say that would be the case. It depends on the perspective of the user.

But I think there is also a legitimate liability concern due to licensing. Projects heavily incorporating "AI" output into their programs may be sued into oblivion at some point. And then they are simply not an option. I would assume, given the history of both NetBSD and FreeBSD, this outcome would be one to avoid. But I can't speculate on the matter. Just have to wait for the core team to release their policy.
In the worst case scenario, the code belongs to the Public Domain. That's why we shouldn't just accept whatever is copy/pasted but heavily modify it after review.

LLMs are just another tool, like compilers.
 
The worst case scenario is a judgement against "AI" companies for making verbatim copies of licensed projects, one that implicates software projects using "AI" to write programs. And, as NetBSD has so wisely pointed out, that would mean spending time removing "tainted" code from programs or facing lawsuits.

To clarify, it wouldn't be an open source project winning a lawsuit over copyright infringement. It would likely be a massive multinational corporation claiming ownership over the code bases of a huge number of open source projects.
 
The chance of this happening is close to nil.

Why isn't NetBSD afraid of Oracle lawsuits over ZFS then?
 
That's a good question for the NetBSD mailing list. I wouldn't have any insight on that matter.

But on the matter of the "AI" stuff, I would say it's likely to happen. The "AI" bots watermark their code. So it's not like it's public domain per se. It's clearly generated. The part I dislike is the fact that it's licensed work being used without conveying that to users. :/ But there is likely going to be a legal issue in the future.

On the likelihood of a massive legal issue, I believe there are most certainly going to be many lawsuits filed. The majority of the current legal disputes appear to be about copyright in media. But who knows. Perhaps you're right? But that still doesn't make the process ethical.
 
That's a good question for the NetBSD mailing list. I wouldn't have any insight on that matter.

But on the matter of the "AI" stuff, I would say it's likely to happen. The "AI" bots watermark their code.
This is not true.

So it's not like it's public domain per se.

I'm not a fan of Wikipedia but here are some pointers:

It's clearly generated. The part I dislike is the fact that it's licensed work being used without conveying that to users. :/ But there is likely going to be a legal issue in the future.

On the likelihood of a massive legal issue, I believe there are most certainly going to be many lawsuits filed. The majority of the current legal disputes appear to be about copyright in media. But who knows. Perhaps you're right? But that still doesn't make the process ethical.
They won't go after NetBSD. Lawsuits are all about money.
 
One fact is clear (at least to me): if any GPL'ed material is included in the "training data" of a specific LLM, then the whole LLM and anything included in its "generated" output SHALL be covered (infected!) by the GPL, even if other "proprietary", "patented", "top secret of governments/military forces or private info of presidents", "permissively licensed like BSD" or "public domain" works are also included. There CANNOT be any exceptions, given the nature of the GPL.

This is why I object to including AI/LLM-generated "code", but do NOT object to AI/LLM use limited to finding bugs.

Asking "why is this vulnerable?" would be OK, but asking "then how can we fix it?" is a NO-GO, unless it is clearly stated and validated that no BSD-incompatibly licensed material was used to train the LLM (that is, only compatibly licensed or public domain material).
 
GPL code in training doesn't make any generated code GPL. Good luck to the FSF arguing that in court, though IANAL.

GPL infection is done via linking and libraries may contain the linking exception or use the LGPL.
 
Well, this is a good topic. The only thing we don't see is the policy from the core team.

"The suspense is terrible, I hope it'll last."
The wise words of Willy Wonka.
 
GPL code in training doesn't make any generated code GPL.
Even if some countries hold that the GPL cannot affect an LLM even when GPL'ed material is used in its training data, the UN still has NOT released international law to clarify it.

Until this is completely finished and ALL COUNTRIES UNDER THE UN TAKE IT INTO THEIR OWN COUNTRY-SPECIFIC LAWS, there is still an unpredictable and tooooooooo huge risk to take.

Again, international law accepted by all countries is mandatory to be safe.
 
Anyone keeping up with claw-code? The Public Domain nonsense is just a way to avoid lawsuits for these companies. Eventually they'll come to collect. 😃
 
GPL code in training doesn't make any generated code GPL. Good luck to the FSF arguing that in court, though IANAL.

GPL infection is done via linking and libraries may contain the linking exception or use the LGPL.

If Claude was trained on the LGPL-licensed codebase and its output reproduces substantial verbatim portions of that code, can the output be treated as license-free? The chardet dispute did not resolve cleanly and no court has issued a definitive ruling on this specific question. What is settled is that verbatim copying of GPL code violates the license regardless of how it was produced. What is unsettled is whether AI-generated output that reproduces training data patterns counts as verbatim copying. The working assumption among lawyers advising companies through M&A is that it probably does, and that assumption is now showing up as a standard condition in acquisition due diligence.

The notion that LLM-derived code should not be subject to licensing hangs by a thin thread: AI is taken at its word, as something intelligent, and then human 'laws' can apply to it. You saw the code, you understood it, you can reproduce it.

But an LLM does not see, understand, or reproduce; it's merely a statistical text autocompletion machine.
 
Due to the nature of the beast, an LLM cannot trace an output back to its sources in the initial training data.

Consider this in a classical tooling scenario. You have deterministic tools that strip code blocks out of LGPL software, rename the symbols and then inject the result into your codebase. For the sake of example, the tool does not and cannot log these inputs; it cannot tell you, "I've used libX and Y for this, so I'm copying their respective LICENSE files over."

What you'd have to do to stay legally clean is include the license file of every LGPL project your tools have access to.
It's down to a basic legal principle: if you use the code but don't include the license, you're in trouble. If you don't use it but include the license, you're not.

An LLM service can use LGPL/BSD/MIT training data in that way.
However, the authors of the code used will soon find out that the LLM service is using it, as the emitted 'product' will carry their licenses. All of them.

If he wishes, an author can then modify the license for subsequent versions to explicitly forbid using the code as LLM training data.

It's actually pretty clear why AI bros are muddying the waters all the time, speaking about AGI, singularity, and all of that bullshit: they want to anthropomorphize software so it gets human rights, as those human rights will ensure they can continue infringing on intellectual property. They want to paint a false big picture in which anyone who regulates LLM affairs is red-taping progress, so that competitors will reach this revolutionary, never-seen-before quantum-leap thing before they do.

It's all bollocks. An LLM is not human, and all it takes is a few good lawsuits to firmly bracket it as just software or a software service. The internals don't matter. It takes intellectual property in and produces a result out of it: a derivative work.

May I remind you all: when we speak about cases such as music, where some laws say this many beats or this many notes need to be different, etc., that is just an arbitrary mechanism for handling the presumption of innocence. The court, in many cases, needs to presume I have not heard the song someone claims I ripped off, because they cannot prove I did. But even if they cannot, they still protect the rights holder in case I stumbled on the same idea he had before me.

What the court says here is not 'you need to minimally derive a riff three times before we let you go', but rather: if I cannot prove you heard it, and it's different in three aspects, I will let you go. The three-aspects test doesn't apply without the first condition. So it's firmly out of the window in the case of an LLM: it can transform the input beyond recognition, the depth doesn't matter, it is a derivative work.
 
No. We're all much better than that. Arguing is a petty-bourgeois vice.
I'm basing my assumptions on the blog post linked a few weeks ago, "who owns the code claude wrote".
I believe the author mostly refers to court cases in the US.

The way different jurisdictions tackle this could definitely be a problem.
 
It seems that some have strong views on this topic. I have strong speculation. 😄

I mean it's a problem of civil governments not having a bloody clue about "IT" although we've been firmly in the computer age for 25 years.

In this time they haven't even defined what software is. Is it a product, is it a service, is it a piece, like a piece of art?

The biggest software vendors specifically prohibit reverse engineering. In places such as the USA, you can end up in prison for the rest of your life for data crimes. Seems to me like this data thing is very important, so whoever owns it should be able to inspect what it goes through, right? Imagine not having the right to disassemble a water pump used for your home's running water.

A guy buys 3D Studio as a boxed, physical product circa 2007 for a lot of cash and puts it on eBay in 2012. Autodesk sends a cease and desist to eBay, eBay pulls it down, some European consumer rights body says that's wrong, and eBay says OK, but we have a right of discretion so we don't have to allow it anyway.

This is where we are...nobody cared, nobody cares. All they care about is that IT/ICT generates a lot of money for kickbacks and investments. The lack of any sort of direction or effort in lawmaking stems from that.
 
Yes, I agree completely. Legislators are not typically computer scientists, engineers, or software/program devs.

This is a great rant thread on the topic. But let's just wait and see what the core team decides on the matter. They're all pretty informed intellectuals.

I guess it is customary for NetBSD to go first and FreeBSD to take their time and release things later though. 😂
 
I am currently installing NetBSD on a system in preparation for the possibility that the FreeBSD core team “AI” policy does not closely reflect section 2 of the NetBSD commit policy.

I figure I should learn NetBSD now so that moving all my systems over goes as smoothly as possible.

People submitting AI generated code will just not declare the code as such. But submit anyway. I'm with Torvalds on this one.
 
Eew haha <-- to Torvalds. The NetBSD Commit Guidelines do attempt to have accountability for submissions including a back trace on source origin and authors. So, that is always good. But man, hopefully people are not that malicious.
 
Even if some countries hold that the GPL cannot affect an LLM even when GPL'ed material is used in its training data, the UN still has NOT released international law to clarify it.

Until this is completely finished and ALL COUNTRIES UNDER THE UN TAKE IT INTO THEIR OWN COUNTRY-SPECIFIC LAWS, there is still an unpredictable and tooooooooo huge risk to take.

Again, international law accepted by all countries is mandatory to be safe.
The UN has no such powers.
 
People submitting AI generated code will just not declare the code as such. But submit anyway. I'm with Torvalds on this one.
In many ways I hope people don't declare the AI they used. It opens the code up for potential later legal issues if "people" do decide that GPL virality does extend to LLM training data.

The notion that LLM-derived code should not be subject to licensing hangs by a thin thread: AI is taken at its word, as something intelligent, and then human 'laws' can apply to it. You saw the code, you understood it, you can reproduce it.

But an LLM does not see, understand, or reproduce; it's merely a statistical text autocompletion machine.
I think you describe the situation and the problem very well. In time, lawyers will have a good argument based on technical facts (the LLM algorithms themselves). The code an LLM "autocompletes" and generates from a license encumbered database of training weights can be license encumbered.

The precedent will one day be stated:
  • This is not like using a GPL licensed compiler like GCC to compile code. It is generating the code itself from a GPL incorporated lossy database
  • This is not like a human (brain) using their knowledge of GPL code to clean-room redevelop software. The algorithm is provably mechanical.
  • This is not like using splint/sanitization tools for checking code if it also generates the "fix" from an encumbered database.
 
But it's what is really, really needed now! Without it, risks cannot be lowered.
Not limited to AI/LLM confusion, though. Quite an unfortunate fact.
LLMs are here to stay and we have to learn to live with them. I was also skeptical at first but the technology has potential.

We have to find a middle ground between the AI optimism and the doomerism. The latter only contributes to AI exceptionalism, and thus to the hype. It all depends on the model, the data it was trained on, and the use we make of it. Just like with any other tech.
 
I would disagree; the idea of forcing an idea or a program onto others is not ethical. Implying that one must accept something also typically inspires "doomerism".

Artificial Intelligence research and applications have been around for many decades, represented in both hardware and software. The unregulated abuse of the open web, I believe, should not be tolerated. Technology should not be programmed to disregard the rights of the users, the license of different works, or the ability to choose not to participate. The alternative is force, and force always meets resistance.

Deciding what "we" do with it implies that "we" have a right to use the training data in these programs. And that is very questionable if not outright an abuse of the open internet.

EDIT: What you said about accepting LLMs for some reason made me think of this epic tune. :D

View: https://youtu.be/1ldQggHj78I?si=GtUoh377sFokqRdJ
 