HEADS UP: FreeBSD changing from Subversion to Git this weekend

bsdimp · Dec 17, 2020

Greetings,

The FreeBSD project will be moving it's source repo from subversion to git
starting this this weekend. The docs repo was moved 2 weeks ago. The ports
repo will move at the end of March, 2021 due to timing issues.

The short version is that we're switching the version control we're using.
This switch will preserve much of the current FreeBSD development workflow.
After the switch, the subversion repo will become almost read-only. All
future work will be done in git, however as a transition aide we'll be
replaying the MFCs to stable/11, stable/12 and the related releng branches
for the life of those branches.

For more detailed information, please see
https://github.com/bsdimp/freebsd-git-docs/ for the current documentation.

Please see https://wiki.freebsd.org/git for the latest detailed schedule
(please note that this schedule is subject to change).

Warner

a6h · Dec 17, 2020

Regarding the non-Committer aka read-only checkout for normal users:
1. Should we delete our SVN-based directories e.g./usr/doc right now,
and checkout new repo from the GIT, or the SVN is still working for us?
2. Is there an estimated deadline for the end of the readonly SVN repo?

a6h · Dec 17, 2020

Does FreeBSD GitHub repo replicate in other servers outside of the U.S., or rest of us are subject to Microsoft/U.S. possible restrictions laws/policies in the future?
For example, if for whatever reason, Microsoft/GitHub decides impose regional/IP-range restrictions on GitHub servers, that implies no FreeBSD source checkout!
I'm just speculating and don't imply anything, but these corporations have a history of imposing blanket ban (service/medium) on individuals/regions in the past.

msplsh · Dec 17, 2020

My understanding is that github is not the master, just a mirror, so you'd be pulling from FreeBSD, not them.

rigoletto@ · Dec 17, 2020

vigole said:
Does FreeBSD GitHub repo replicate in other servers outside of the U.S., or rest of us are subject to Microsoft/U.S. possible restrictions laws/policies in the future?
For example, if for whatever reason, Microsoft/GitHub decides impose regional/IP-range restrictions on GitHub servers, that implies no FreeBSD source checkout!
I'm just speculating and don't imply anything, but these corporations have a history of imposing blanket ban (service/medium) on individuals/regions in the past.

The 'sources of the truth' are the FreeBSD own servers then mirrored to both GitHub and GitLab. Btw, GitLab download considerably faster than GitHub for me.

chessmaster · Dec 20, 2020

So git clone would be the new way to get sources? Reason I ask is there was no mention. Of subversion becoming the "obsolete", way of checking out the default "/usr/src".

msplsh · Dec 20, 2020

Probably not going to make any difference until after the handbook is updated.

PMc · Dec 20, 2020

It certainly makes a difference: it does away with the very reason why I dumped linux and switched to FreeBSD in 1995 (order and discipline).

ralphbsz · Dec 20, 2020

Can we perhaps discuss the technical aspects (how to switch) in this thread, and leave the activism and politics of it to a thread in off-topic?

shkhln · Dec 20, 2020

Poor, poor SVN.

PMc · Dec 20, 2020

ralphbsz said:
Can we perhaps discuss the technical aspects (how to switch) in this thread, and leave the activism and politics of it to a thread in off-topic?

I didn't see any non-technical aspects mentioned here. And how to switch is simple: just throw away your entire deploy toolchain that was grown over the last 12 years (if it is based on monotonously increasing version numbers) and write a new one from scratch.

shkhln · Dec 20, 2020

PMc said:
just throw away your entire deploy toolchain

Let's not be overly dramatic.

PMc · Dec 20, 2020

shkhln said:
Let's not be overly dramatic.

I don't remember You ever mentioning that You would even utilize a deploy toolchain of Your own make...

shkhln · Dec 20, 2020

Can you describe the technical side of your issue? What do you do with those increasing revision numbers?

PMc · Dec 20, 2020

Well, on the large scale, I do most of what tools like e.g. poudriere offer (but by utilizing zfs snapshots), plus including rollouts of the base distribution (buildworld & friends) into the scheme.
On the small scale, these are monotonously increasing revision numbers. Recently there was a switch from CVS to SVN (and that switch had a technical justification), and the monotonously increasing revision numbers were one of the gifts provided by that switch (at that time things were still changing for the better). CVS, if You mind to recall, would have had monotonously increasing revision numbers also, but these would change independently and separately for each and every file, which is quite unmanageable without utilizing tags. SVN, on the contrary, does not need tags, because the revision number is enough to uniquely specify the entire distribution, and is also suitable to compare which one is newer. (Tags cannot be compared numerically.)

rigoletto@ · Dec 20, 2020

Could someone explain me why sometimes I run 'git pull' on some repository absolutely no local change (e.g. src happened today) and I receive a total mess of conflicts, and git ask to commit my local changes (or something like that)?

ralphbsz · Dec 21, 2020

rigoletto@ said:
Could someone explain me why sometimes I run 'git pull' on some repository absolutely no local change (e.g. src happened today) and I receive a total mess of conflicts, and git ask to commit my local changes (or something like that)?

Never seen something like that.

To PMc's questions: Your complaint about CVS having independent version numbers for each file is closely tied to the fact that CVS doesn't have "change sets". That means: one commit can change several files at once, and the whole commit is either applied or undone or merged between branches as a unit. I think that's a vitally important functionality, and I think all modern version control systems have it.

The big problem with version numbers not being monotonic is due to the distributed nature of git, and is pretty much unavoidable in a distributed source control system. Say the current release of some artifact is 42. Alice checks it out, modifies it, and commits it to some git archive somewhere. Now it is version 43. Bob also gets a copy of the version 42, and also changes it. Can he make it version 43? No, then we would have ambiguity. Can he find out that Alice has already created version 43 and go to 44? No, because in a distributed system, there is no way for him to know that Alice exists, or contact her repository. Matter-of-fact, the impossibility of assigning increasing version numbers is an example of the FLP and CAP theorems of distributed computing: to get consensus on version numbers, one would either need a central server (like what CVS and Subversion use), or each participant in the protocol needs to know who all the other participants are (and then one can use something like a Lamport clock or Paxos or Chandra-Toueg). But that would restrict git too much, for its intended usage pattern (of completely independent developers). I don't think it's even possible in general to determine whether two versions of a file are "earlier" or "later": Say one file has two copies, one of which has changes 1, 2 and 4, and the second has changes 1, 3 and 4: which one should have the higher version number? That question just doesn't make sense. With independent changes, files are not like numbers, where it is true that a<b, a=c or a>b is always true. The question one can ask: does one version of a file contain all the changes of another version? With that, one can get a partial ordering, I think, but definitely not a complete ordering.

So in a nutshell: Any workflow that relies on increasing version numbers is toast. We could see that one coming a mile away. By the way, I've gone through the same pain. I used to have CVS version numbers in all my source files, and the main program would typically know how to collect them all and print them, so you could see what all the parts were. What I instead do today is to use the time last changed as the "best guess" at a version number, and then also print the git/mercurial revision string (which is a random-looking but unique 48 or 64-bit hex number). With ntp pretty reliable today, and my source code not changing very quickly, the "time last changed" is a pretty reliable indicator.

PMc · Dec 21, 2020

ralphbsz said:
The big problem with version numbers not being monotonic is due to the distributed nature of git, and is pretty much unavoidable in a distributed source control system. Say the current release of some artifact is 42. Alice checks it out, modifies it, and commits it to some git archive somewhere. Now it is version 43. Bob also gets a copy of the version 42, and also changes it. Can he make it version 43? No, then we would have ambiguity. Can he find out that Alice has already created version 43 and go to 44? No, because in a distributed system, there is no way for him to know that Alice exists, or contact her repository.

Well, they might just talk to each other. At least, in my youth that was a very common thing to do - specifically if one would work together on something.

So, lets subsume: The purpose of distributed revision control is that people can work simultaneously on the same piece of code, while being properly cage-kept and maintaining communications ban, and/or not even knowing that the other worker would exists.
That is indeed an interesting concept of "cooperation".

ralphbsz said:
Matter-of-fact, the impossibility of assigning increasing version numbers is an example of the FLP and CAP theorems of distributed computing: to get consensus on version numbers, one would either need a central server (like what CVS and Subversion use), or each participant in the protocol needs to know who all the other participants are (and then one can use something like a Lamport clock or Paxos or Chandra-Toueg). But that would restrict git too much, for its intended usage pattern (of completely independent developers). I don't think it's even possible in general to determine whether two versions of a file are "earlier" or "later": Say one file has two copies, one of which has changes 1, 2 and 4, and the second has changes 1, 3 and 4: which one should have the higher version number? That question just doesn't make sense.

That is not the question I would ask. The question I would like to as is rather:
When we ever can come to such a point, who then was in charge and has verified that changes 2 and 3 work properly together?

shkhln · Dec 21, 2020

PMc said:
SVN, on the contrary, does not need tags, because the revision number is enough to uniquely specify the entire distribution, and is also suitable to compare which one is newer. (Tags cannot be compared numerically.)

For deployment? Why do you want to compare them anyway? What if there is a regression in any particular revision?

rigoletto@ said:
Could someone explain me why sometimes I run 'git pull' on some repository absolutely no local change (e.g. src happened today) and I receive a total mess of conflicts, and git ask to commit my local changes (or something like that)?

That means somebody used git push, thus partially rewriting remote history. Git sees different hashes (diverging histories) and refuses to pull those changes automatically. You should probably git reset --hard a few local commits and try git pull again.

Lamia · Dec 21, 2020

How soon can we expect this page - https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ports-using.html - to be updated? Svn checkout does not yield ports update. And we should be reading git {clone, pull, etc} from it by now?

Is the anyone there that their poudrière fetches updates?

No updates from here - svn://96.47.72.69/ports/branches/2020Q4.

ralphbsz · Dec 21, 2020

PMc said:
Well, they might just talk to each other. At least, in my youth that was a very common thing to do - specifically if one would work together on something.

So, lets subsume: The purpose of distributed revision control is that people can work simultaneously on the same piece of code, while being properly cage-kept and maintaining communications ban, and/or not even knowing that the other worker would exists.

Actually, fully distributed source control means that people can make progress exactly WITHOUT talking to each other. And that's because in some open source development models, there is no central authority, no administration. For example, say there is a piece of code that knows how to count walking elephants. One programmer might enhance it for their own purposes, to also count flying elephants. Another programmer might start from the original base, and also count walking rhinos. Both can publish their changes. A third programmer might then pull both changes in, and create code that can count all pachyderms, walking or flying, without having to coordinate with any of them. None of them need to cooperate. It does create a lot of freedom, without any paranoia.

When we ever can come to such a point, who then was in charge and has verified that changes 2 and 3 work properly together?

Why does anyone has to be in charge? And from a software quality point of view: I would expect that anyone who writes any change follows good software engineering practices, writes a clear requirements document, reviews all artifacts, and test their code after it is done. The fact that you can mix-and-match changes from many sources doesn't have to reduce quality, if the developers have a mindset of creating high quality software.

Lamia · Dec 21, 2020

Lamia said:
How soon can we expect this page - https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ports-using.html - to be updated? Svn checkout does not yield ports update. And we should be reading git {clone, pull, etc} from it by now?

Is the anyone there that their poudrière fetches updates?

No updates from here - svn://96.47.72.69/ports/branches/2020Q4.

Here is the information - http://bsdimp.blogspot.com/2020/10/freebsd-git-primer-for-users.html?m=1 - should anyone be interested.

msplsh · Dec 21, 2020

Ports isn't ready, date is March.

PMc · Dec 21, 2020

ralphbsz said:
Actually, fully distributed source control means that people can make progress exactly WITHOUT talking to each other. And that's because in some open source development models, there is no central authority, no administration. For example, say there is a piece of code that knows how to count walking elephants. One programmer might enhance it for their own purposes, to also count flying elephants. Another programmer might start from the original base, and also count walking rhinos. Both can publish their changes. A third programmer might then pull both changes in, and create code that can count all pachyderms, walking or flying, without having to coordinate with any of them. None of them need to cooperate. It does create a lot of freedom, without any paranoia.

Yes, thats a nice idea about freedom. But there is a misconception: writing an OS is not a means in itself, done by the pure ambition of self-fulfillment no matter the outcome; it is instead a means to an end: to create something that actually works.

Now I perfectly understand that our ivory-tower league, namely the developers, are mainly interested in their self-fulfillment - and that's perfectly alright. But then, issues of freedom and paranoia have no place in engineering, and should actually rather be discussed with a therapist.

Anyway, we did already have exactly that, with Linux, in 1995 (and from what I learned, it has not changed in the meantime):
[the following is all practical, real and authentic experience of my own - it is in no way made up]

Act 1.
It begins with the code not doing the expected thing. You read the source and you figure, it should do the expected thing! Finally you figure out: the source is not from what the object was built! Somewhere there was a change - nobody knows where, nobody knows what, nobody knows why - and version numbers are a chaotic heap, so you never know what you're actually running.

Act 2.
Then, if you finally go and find some source to compile it yourself, to at least get an object that matches your source - there is no means to figure out if this source is the appropriate one matching to the rest of the system. Because it is a bazaar: there are lots of sources you can choose from. There are lots of versions of these. And, specifically, there is no monotonous numbering, so you cannot just read the commitlog in sequence, to understand what has developed and how we have gotten here.

Act 3.
During that process of looking into the source, you practically always find a bunch of bugs, mistakes and coding errors on the wayside. Some are obvious mistakes, and could just be corrected. But most are related to and interdependent with other functionality - so to solve the matter, one would first need to talk to the auther, to evaluate what they actually thought when writing it that way.
But, as You explained, we do not do this. We don't talk to each other anymore.

Act 4.
Finally, you may find a commit log that actually identifies the auther of something. But that doesn't help you in any way. Because all you get is a cipher under which the author writes. The actual contact data is protected, and is only uncloaked to customers of the github corporation.
But even then, if you manage to find some customer of github, and manage to have some message dispatched to the author, you may most likely not get a reaction.
Because, as You explained, we do not do this. We don't talk to each other anymore.

Exactly this is the reason why I dumped linux. And when I was pointed to FreeBSD, it became immediately obvious that here the things were done in the right way. There was a consistent codebase. The code in /usr/src would always exactly represent the running object, because it was built from there: straight down the line.
And, most important: there were people! People who knew what they were doing - people with a skill level so that I still think I should rather call them demi-gods.[1] And these people were going fully open! They were visible in public discussions, and they had signatures like old-school Usenet, like scientists have: sometimes even with full address and phone!

Now, today, all this has already decayed. Gradually and slowly, but nevertheless. In the old time, if you were to send a bug report, it got processed. Sometimes quickly, sometimes one or two years later. But it got processed.
Now we have tool to store away the bugreports, so that nobody needs to bother reading them.
And obviousely, as You have explained, everybody is just throwing in their beloved features, without ever caring for anything else, and none of these nor anybody else feels responsible for the outcome. So who should be concerned about bugs, at all? Obviousely nobody.

Then, as Zirias described in his paper (paragraph 7.), FreeBSD once had a culture of fixing and improving things over time. But no longer.

Nowadays, things are just thrown over the fence, I mean, into the codebase - and then the author disappears again. Take, for instance, the ULE scheduler. Since the beginning, people were complaining that it does not work well under all conditions. And consequentially there was great engagement in bullying those complainers, and telling them they should just revert to the old scheduler and shut up. (There was no engagement in looking into the code and figuring out what actually goes wrong there.)
Then, I was hit by the malfunction. So I grabbed dtrace, and figured what is going wrong (plus at least one additional bug found on the wayside). Obviousely, nobody cares. I have patches for these - I do not know if they make the behaviour better over-all; they just fix the malfunction I was running into. Obviousely, nobody cares.

I finally managed to figure the e-mail of the original author, and he actually responded! But then, he seems to be adherent to the google code-of-conduct (bottomline: "we just want to be happy developers, and we do not want to talk about nasty and unpleasant things, specifically not about such abominable things as bugs and malfuctions"). So, as soon as the topic came to bugs, communication stalled.

[1] Strange story on the wayside: when later I got a job, and started to do consulting, i.e. building Unix client/server infrastructures and Internet functionality for major european banks and insurance companies, I was considered a "guru" by my fellow consultants - because I was almost the only one there who would ever have looked into the source, who might even dare to write a kernel driver if need arises. While the others were mostly focused on reading release notes and doing installations/configurations along the book.
OTOH there were these demi-gods of the Berkeley OS: people like e.g. HPSelasky, or Matt Dillon - there was a couple of dozens of those, and their skill was so many magnitudes above what I could imagine, I never even dared to talk to them.

ralphbsz said:
Why does anyone has to be in charge? And from a software quality point of view: I would expect that anyone who writes any change follows good software engineering practices, writes a clear requirements document, reviews all artifacts, and test their code after it is done.

Yes, I was already waiting for that test-crap coming up. This is indeed what seems to be the new mindset: lets write any crap we want, because we have tests in place, and the tests will tell us if the crap works or not, and lets finally do away with any attempt for logical verification (commonly termed: "thinking").
Proper engineering means to logically think through the stuff, to understand what it does, in relation to the other components and the system as a whole. And this certainly cannot be done when you don't even know the other components.
I know that this is the point where it hurts - because people get very violent when you try to make them think - thinking is painful to them, and they want to avoid it.

ralphbsz said:
The fact that you can mix-and-match changes from many sources doesn't have to reduce quality, if the developers have a mindset of creating high quality software.

It is all about the mindset - and the mindset is that we have tests in place as a rope to catch us when we fall. And you behave entirely different when you know that you are protected: you no longer strive to behave error-free; you create more crap, because you think nothing too bad can happen from it. High quality is already abandoned at that point.

But also, the problem is: tests don't catch you when you fall! Tests can only protect from re-introducing problem that are already known (and fixed). Because, a test needs to be written first, and it can only be written if somebody has thought about and knows that there can be a possible malfunction that should be tested against.

I know the agile culture came up with a pragma that there should be at least as many LOC of tests as of active code, and so they started to create their test code automatically. Very great - so there is zero skill going into the tests, and you could just leave them away and have the same.

But, over all, I think we must understand that the ambitions of the users and those of the developers point in exactly the opposite direction.
While the users choose FreeBSD for reasons like those I mentioned above, or those Zirias mentions in his paper, and many of them choose FreeBSD deliberately while running away from Linux (for reasons like those mentioned above), and therefore have no interest whatsoever in getting back the Linux workflow - to the contrary, the main interest of the developers is to put FreeBSD where it belongs, as just another Linux distribution - for very obviouse reasons: the more FreeBSD becomes Linux, the easier becomes porting.

zirias@ · Dec 21, 2020

I think you might be interpreting a bit too much in a tool, sorry to say that

In fact, GIT doesn't enforce a workflow, it's used in enterprises quite successfully (also where I work, we moved there from TFS which is much more comparable to SVN than GIT).

I think the move to GIT has just practical reasons. It's very fast, especially working with local branches is awesome, and getting reviews based on pull requests is also a pretty nice thing (don't know whether FreeBSD wants to use this).

I'm just in process of submitting an update to a port I maintain. I will open a PR with an attached diff. That's, well, okayish. For a simple change, it's all you need. I once had a bigger change where I prepared a git pull request, not for ever merging it, just as a nice review tool.

Well, in a nutshell, I'd argue the way the project works doesn't depend on the tool used. And IMHO, GIT is just the better tool nowadays.