Hi gang!
Disclaimer: I am honestly a little excited about recent developments so expect to find some (small) opinionated parts in this guide. Nothing excessive mind you, but I can sometimes get a little carried away and, despite what some believe, I never really plan guides like this.
Editorial
In April 2021 the FreeBSD project finished its full conversion from Subversion to Git, something which had been brewing for quite some time. As a result many people who are quite familiar with Subversion may have to re-learn a few things. And there will be some who don't see any advantages here: what's this big fuss all about? Trust me: there are many advantages, some of them huge in my opinion (but please keep my disclaimer in mind, thanks!), but whether that also applies to you is something I obviously can't say for sure.
And before I continue: there are also disadvantages, no question about it; I've seen them myself (and I'll share them too). See, just because these don't bother me doesn't mean I don't recognize that they can still bother others. We'll address them, no worries.
But honestly you guys.. I'm a little excited here. I just finished the conversion on my main server last night (I also stopped using portsnap in favor of git) and I like where this is going. Well, I hope that after reading my guide you may also find some advantages that can help you out. And of course I also hope the guide as a whole will be useful to you guys.
As always I'm going to try and cover the full deal, so if you come across a section which you already know about then you can easily skip it; I'll make sure to keep them (mostly) separated so they don't rely on each other.
What is Git? (and what should I do with it?)
Git is a so-called Version Control System ("VCS") which can help people to keep control over a project. A project can be anything: a single shell script, a group of shell scripts, the source code for a program (or many programs) and you can even maintain a kernel with it. Heck, Git doesn't even mind the inclusion of binary files (but now I'm getting a little ahead of myself).
Every time something changes in a project these changes can be documented and stored using the VCS. As a result the VCS will keep track of the project's history, which gives you full control over every change. See, it doesn't only document these changes; it also records them, meaning that you can easily go back in time and bring your project into a state it had before. So if it turns out that a certain change (or addition) was bad then it won't be a problem: just go back, remove the change and optionally re-do any later changes, and you've managed to clean up your project without having to re-create or re-write dozens of changes.
So how does this work? Well, the VCS maintains a database in which it stores information about all the changes and/or additions to the project. By default Git uses the .git directory for this: you'll find it in the root directory of almost every project that is under version control. If not a directory then it'll be a .git file which contains a pointer to that directory. And as I mentioned earlier this project could be anything: from a single file to a whole collection of files and/or directories. We refer to a project that is under version control as a repository.
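To make this a bit more tangible, here's a minimal sketch of that "go back in time" idea. The directory name, file name and commit messages are all made up for the example, and everything happens in a throwaway directory under /tmp so it won't touch anything real.

```shell
#!/bin/sh
set -eu

# Throwaway playground (hypothetical location, nothing to do with /usr/ports).
demo=$(mktemp -d "${TMPDIR:-/tmp}/gitdemo.XXXXXX")
cd "$demo"

# "git init" is what creates the .git directory (the database).
git init -q project
cd project
git config user.name "Demo User"
git config user.email "demo@example.org"

# First change: add a small script and record it.
echo 'echo hello' > hello.sh
git add hello.sh
git commit -q -m "Add hello.sh"

# Second change: one we are going to regret.
echo 'echo oops' >> hello.sh
git commit -q -am "A change we will regret"

# Undo that last change with a new commit; the history keeps all three.
git revert --no-edit HEAD > /dev/null
git log --oneline
```

Note that git revert doesn't erase anything: it adds a new commit which undoes the bad one, so the mistake (and its fix) both stay documented in the history.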
What makes Git stand out from the rest is its decentralized design. Or put differently: Git will always use a local database for the project. It doesn't matter if you started the project yourself or are merely copying one (in Git terms this is called "cloning"): you'll always end up with a fully functional project repository and all the (dis)advantages that come with it no matter what. See, one possible disadvantage is that this means you'll get the entire backlog of the project too and that is going to gobble up storage space. But a possible huge advantage could be the fact that you can easily share such a cloned project with others as well.
For example: my server is fully under "Git control", meaning that I use Git to maintain the Ports collection, the source tree and the FreeBSD documentation project. I also use a second (smaller) FreeBSD server as backup (backup MTA & DNS). So once I've updated the Ports collection (or source tree!) on my main server I don't have to waste precious bandwidth to do the same on my other server. Naah, since both my servers are part of a virtual LAN I simply clone the repository directly from my main server. Easy!
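Here's a rough sketch of that idea. To keep it runnable anywhere, two directories under /tmp play the roles of the main and the backup server (both are stand-ins I made up); over a real LAN the clone source would be a URL instead, something like ssh://main.example.lan/usr/ports (a hypothetical host name).

```shell
#!/bin/sh
set -eu

lan=$(mktemp -d "${TMPDIR:-/tmp}/landemo.XXXXXX")

# "main" plays the role of the repository on the main server.
git init -q "$lan/main"
git -C "$lan/main" config user.name "Demo User"
git -C "$lan/main" config user.email "demo@example.org"
echo "first" > "$lan/main/README"
git -C "$lan/main" add README
git -C "$lan/main" commit -q -m "Initial import"

# The backup server clones from the main server, not from the Internet.
git clone -q "$lan/main" "$lan/backup"

# Later on the main server gets an update...
echo "second" >> "$lan/main/README"
git -C "$lan/main" commit -q -am "Update"

# ...and the backup server simply pulls it in from the main server.
git -C "$lan/backup" pull -q --ff-only
```

The --ff-only is just me being careful: the backup copy should never have changes of its own, so anything other than a plain fast-forward would be a sign that something is off.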
So in my situation the (small) excess in storage space gets made up for with a (huge) reduction in bandwidth. Of course... if you only maintain one server all you're left with is that excess in storage space, I feel you... but trust me: there are still more advantages here.
Now that we roughly know what Git is all about let's take a look at how we can use this for FreeBSD systems administration.
The Ports collection
(brief) description/intro
(ey, I like to make my guides useful for newbies & veterans alike, just skip this part if you're a veteran)
The Ports collection is a collection of "blueprints" (as I like to call it) which can help you to install external software onto FreeBSD. Every Port contains a "blueprint" (a Makefile) which will tell the system how to obtain the software, how to prepare the software (usually this means compiling the source code) and finally how to install it (the system creates a package which is then installed using ports-mgmt/pkg). Keep in mind that the ports collection is most useful if you need (or want) to use the software with very specific customizations. For example: by default Git provides support for CVS, a GUI and even a web interface. If you don't need that functionality you could build Git using the Ports collection, de-select these options and then build & install Git. Now you'll have a Git version without that "bloat".
But if you don't need any of these customizations then you're likely much better off using FreeBSD's package manager. In other words, instead of using:
# make -C /usr/ports/devel/git install clean
you'd use:
# pkg install git
It'll also be much quicker!
Keep in mind: mixing ports and binary packages is a very bad idea! So always use one method or the other, not both. If you wonder why then this guide might be a good read.
Installing the Ports collection (with Git)
As of April 2021 the FreeBSD project is using a dedicated server for their Git repositories: git.freebsd.org. And all we need to know here is that Git can talk to a remote over "accessible" protocols such as HTTP and HTTPS (and SSH, FTP(S), and it even has its own git:// protocol!), or simply use a local path, which includes anything you've mounted over NFS or SMB. Another important detail is that it's customary to provide a repository using the .git extension, even though you're actually sharing a directory. You've got to admit it does make things easier to recognize.
This is honestly all we need to know (provided that we're familiar with Git of course).
So, to install a new (fresh) ports collection you'd use:
# git clone https://git.freebsd.org/ports.git /usr/ports
Done! This will install the Ports collection (with its entire backlog) into /usr/ports. Once this is done you can use the Ports collection as you always have. Need to update it? Easy, just use:
# git pull
within the /usr/ports directory to "pull" any available updates into your local repository.
Changing from Subversion to Git
This can become a little tricky, especially if you also maintain subdirectories like distfiles and packages and would like to keep these. But don't worry, that's why this guide exists!
The easiest approach here is to move these two directories out of the way, fully delete the contents of /usr/ports and then grab the Ports collection as I mentioned above. If you're using ZFS and haven't set up separate datasets for these two then now might be a good time to do so. Why? Well, for starters because this would allow you to unmount these datasets for now using:
# zfs unmount zroot/ports/distfiles
# zfs unmount zroot/ports/packages
and then continue with the above procedure.
But there's more to this story...
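ZFS Ports collection storage management
As you probably know the bulk of /usr/ports consists of (ASCII) (Make)files (and patches). And text files are, generally speaking, very easy to compress. Since ZFS provides you with file compression out of the box, why not use it? This is my setup: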
Code:
peter@vps:/usr/ports $ zfs list -r zroot/ports
NAME USED AVAIL REFER MOUNTPOINT
zroot/ports 6.68G 105G 1.05G /usr/ports
zroot/ports/distfiles 2.73G 105G 2.73G /usr/ports/distfiles
zroot/ports/packages 2.89G 105G 2.89G /usr/ports/packages
Code:
peter@vps:/usr/ports $ zfs get compression zroot/ports
NAME PROPERTY VALUE SOURCE
zroot/ports compression on local
All it takes is one command:
# zfs set compression=on zroot/ports
(keep in mind that you'd have to replace zroot/ports with your own dataset name). Also important to know: this won't magically compress every file which already exists, only data written to this dataset from now on. Therefore it makes sense to use this option when you create the new dataset (or directly afterwards).
Now, more important details: distfiles (and optionally packages) contain archives. distfiles is used to store the archive(s) which were downloaded by the system in order to provide you with the actual software which the port refers to. Remember: if you're building a port like, say, the Apache webserver then the system will start by downloading the source code from the Apache website, (temporarily) extract it onto your system and then build the port. And in order to prevent the system from having to download the source code again if you need to re-build the port it keeps the downloaded archive in the distfiles directory.
packages, on the other hand, is often used by ports-mgmt/portmaster (and some others). In short this can be the ideal location to set up your own package repository which you can then provide to other servers in your network. Portmaster itself uses this location to (optionally) store backup packages. When I upgrade a port on my system Portmaster will keep the previous version in this directory, so should something go wrong I can always go back to the previous (working) setup, no questions asked.
So the point here is that these two directories contain archives, and those are already compressed. It would be a waste of system resources if we'd let ZFS compress these files as well. And as you might know: ZFS properties propagate. In other words: properties set for a dataset are also inherited by its children, unless you override them there.
Therefore you should make sure to turn compression off for any child datasets (provided that you set things up in the same way as I did of course):
Code:
peter@vps:/usr/ports $ zfs get -r compression zroot/ports
NAME PROPERTY VALUE SOURCE
zroot/ports compression on local
zroot/ports/distfiles compression off local
zroot/ports/packages compression off local
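If you want to end up with the same layout as in the listing above, the commands could look something like the sketch below. It assumes the dataset names from my listing (zroot/ports and its two children) and, to be on the safe side, it only prints the commands by default. The run helper and the APPLY switch are my own invention for this example, not something ZFS provides.

```shell
#!/bin/sh
set -eu

# Assumption: same dataset names as in the listing above; change to taste.
ports_dataset="zroot/ports"

run() {
    # Dry run by default: only print the command.
    # Set APPLY=1 (as root, on a system with ZFS) to really execute it.
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

plan=$(
    # Compress the parent dataset (mostly text files)...
    run zfs set compression=on "$ports_dataset"
    # ...but not the children which only hold already-compressed archives.
    for child in distfiles packages; do
        run zfs set compression=off "$ports_dataset/$child"
    done
)
printf '%s\n' "$plan"
```

Once applied you can verify the result with the same zfs get -r compression command as shown above.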
But what if my /usr/ports directory isn't empty?!
So yeah, that's the ideal situation above, but as we all know situations are usually far from ideal. So there's good news and bad news for you...
The bad news is that Git will refuse to clone a repository into a directory which isn't empty:
Code:
peter@vps:/home/peter/temp $ mkdir -p myports/distfiles
peter@vps:/home/peter/temp $ git clone https://git.freebsd.org/ports.git myports/
fatal: destination path 'myports' already exists and is not an empty directory.
Now what?! Yeah, and to make matters worse (I'll spare you the time to study git-clone(1), for now): you won't find any override or "force" option to use with the cloning process. But don't worry, there's a solution!
Important: When I refer to a 'ports' directory which isn't empty I'm only referring to the possible existence of additional subdirectories (as mentioned above) or (hidden) file entries as used by the UFS filesystem. I strongly suggest you do not try this approach in order to save yourself some download time (for example by trying to clone the Git ports repository into an existing Subversion repository). I'm not saying it won't work (I honestly don't know because I never bothered trying) but I do foresee a lot of problems if you'd try. Feel free to prove me wrong though.
Now the good news is that we can "trick" Git into bypassing the cloning process altogether:
Code:
peter@vps:/home/peter/temp $ cd myports/
peter@vps:/home/peter/temp/myports $ git init
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /home/peter/temp/myports/.git/
peter@vps:/home/peter/temp/myports $ git remote add origin https://git.freebsd.org/ports.git
peter@vps:/home/peter/temp/myports $ git fetch origin
remote: Enumerating objects: 15730, done.
remote: Counting objects: 100% (15730/15730), done.
remote: Compressing objects: 100% (417/417), done.
Receiving objects: 4% (199650/4991230), 45.42 MiB | 10.09 MiB/s
[...]
* [new branch] 2020Q3 -> origin/2020Q3
* [new branch] 2020Q4 -> origin/2020Q4
* [new branch] 2021Q1 -> origin/2021Q1
* [new branch] 2021Q2 -> origin/2021Q2
* [new branch] main -> origin/main
* [new tag] 10-eol -> 10-eol
* [new tag] 7-eol -> 7-eol
* [new tag] 8-eol -> 8-eol
* [new tag] 9-eol -> 9-eol
* [new tag] pkg-install-eol -> pkg-install-eol
[...]
peter@vps:/home/peter/temp/myports $ git merge origin/main
peter@vps:/home/peter/temp/myports $ git branch --set-upstream-to=origin/main
Branch 'master' set up to track remote branch 'main' from 'origin'.
So to summarize:
- Create a new empty repository: git init
- Add the remote repository and call it origin: git remote add origin https://git.freebsd.org/ports.git
- Fetch all available information about the remote repository: git fetch origin
- Merge the remote repository's main 'into' our local branch: git merge origin/main
- Make our local (master) branch track origin/main: git branch --set-upstream-to=origin/main
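The steps above can also be put into a small script. What follows is only a sketch: to keep it runnable without downloading the whole Ports collection it sets up a tiny local repository under /tmp as a stand-in for https://git.freebsd.org/ports.git, and all directory and file names in it are made up.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d "${TMPDIR:-/tmp}/fetchdemo.XXXXXX")

# Stand-in for the remote: a tiny repository whose branch is called main.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo "PORTNAME= demo" > "$work/upstream/Makefile"
git -C "$work/upstream" add Makefile
git -C "$work/upstream" -c user.name="Demo User" -c user.email="demo@example.org" \
    commit -q -m "Initial import"

# A target directory which isn't empty, like /usr/ports with distfiles in it.
mkdir -p "$work/myports/distfiles"
echo "pretend archive" > "$work/myports/distfiles/foo.tar.gz"

# The actual steps; the real URL would replace "$work/upstream".
cd "$work/myports"
git init -q
git remote add origin "$work/upstream"
git fetch -q origin
git merge -q origin/main
git branch --set-upstream-to=origin/main > /dev/null

# Short status: shows the local branch and the upstream it tracks.
git status -sb
```

Notice that the script never mentions the name of the local branch: whatever git init picked (master in my case) is the branch that ends up tracking origin/main.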
Sort of... First we made our own local Git repository by using git init. And as I mentioned earlier: you always use a full-blown repository no matter what. It doesn't matter if you clone an existing repository or make a new one. And about that cloning... If you clone a repository you're effectively taking a few steps, steps which you can also do manually. First we fetched everything there is to know about the remote repository - but only that - by using git-fetch(1). This provided us with a lot of useful information about the remote; specifically, Git learned about new branches and tags (as shown above).
The only remaining problem was the two different branch names. If you look at the above output again you'll notice that our new local branch was called master, but when fetching the information we see mention of origin/main, so the remote branch is called main. When in doubt you can try to look up HEAD using:
git branch -r --contains @
(this only works after the 'merge').
So now we know that we have 2 separate branches: master (local) and origin/main (remote). As such, all that's left to do is to combine them, and we did that using git-merge(1). The only step remaining was to set an upstream, which is what git-branch(1) did for us.
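Since git branch -r --contains @ only helps after the merge, it can be handy to ask the remote directly which branch its HEAD points to. One way to do that (not the one I used above, so treat it as an alternative) is git ls-remote with the --symref option. The sketch below runs it against a small local stand-in repository under /tmp; the names in it are made up.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d "${TMPDIR:-/tmp}/symref.XXXXXX")

# Stand-in remote with main as its default branch.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" -c user.name="Demo User" -c user.email="demo@example.org" \
    commit -q --allow-empty -m "Initial commit"

# Ask the remote where its HEAD points; the line starting with "ref:"
# carries the full name of the default branch in its second field.
default_ref=$(git ls-remote --symref "$work/upstream" HEAD | awk '/^ref:/ {print $2}')
echo "remote default branch: ${default_ref#refs/heads/}"
```

Against the real thing you'd pass https://git.freebsd.org/ports.git instead of the local path, and this works before you have fetched or merged anything.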
Simply put: Git truly lives up to the Unix philosophy, and some of its commands (like "pull" or "clone") are basically nothing more than a series of other commands which get executed in sequence.
And if you look at the differences between a cloned repository and a "fetched" one you'll notice that it's a complete non-issue:
Code:
peter@vps:/usr/ports $ git status
On branch main
Your branch is up to date with 'origin/main'.
peter@vps:/usr/ports $ git remote -v
origin https://git.freebsd.org/ports.git (fetch)
origin https://git.freebsd.org/ports.git (push)
peter@vps:/usr/ports $ git branch
* main
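Here you see what it looks like when you simply clone the Ports repository. Notice the mention of origin/main up there? And after the second command you can clearly see that origin is a reference to https://git.freebsd.org/ports.git. Since main is our local branch... it should go without saying that origin/main is simply a reference to the remote branch. This remote branch is "connected" to ours and therefore Git always compares the status of our local branch with that of the remote. In Git terminology we refer to such a "connected" remote branch as an upstream.
Now let's compare this with the other methodology: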
Code:
peter@vps:/home/peter/temp/myports $ git status
On branch master
Your branch is up to date with 'origin/main'.
peter@vps:/home/peter/temp/myports $ git remote -v
origin https://git.freebsd.org/ports.git (fetch)
origin https://git.freebsd.org/ports.git (push)
peter@vps:/home/peter/temp/myports $ git branch
* master
As you can see it's roughly the same; the main difference is that instead of using main we're using master as our local branch. Which is actually not a bad thing because it might make it easier for you to keep track of your local branch and the remote one(s).
Now, you could be tempted to simply (ab)use this information and make sure that all your new local branches are automatically called main. Problem solved, eh? Well... no. See, the name of a main branch is fully up to the person who set up the repository. Mine always use 'master' because that's what I prefer. So by using the method I demonstrated above it doesn't matter what the remote branch is going to be named: you'll always be able to download & connect them, no matter what. Just remember to check up on HEAD (denoted by @) when in doubt.
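To show that the local name really is just a label, here's a small sketch which clones a stand-in repository (again a made-up one under /tmp, with main as its branch), renames the local branch to master and then checks what it tracks.

```shell
#!/bin/sh
set -eu

names=$(mktemp -d "${TMPDIR:-/tmp}/namedemo.XXXXXX")

# Stand-in remote with a branch called main.
git init -q "$names/upstream"
git -C "$names/upstream" symbolic-ref HEAD refs/heads/main
git -C "$names/upstream" -c user.name="Demo User" -c user.email="demo@example.org" \
    commit -q --allow-empty -m "Initial commit"

# A regular clone gives us a local branch named after the remote one.
git clone -q "$names/upstream" "$names/clone"
cd "$names/clone"

# Rename the local branch; the upstream setting moves along with it.
git branch -m master

local_branch=$(git rev-parse --abbrev-ref HEAD)
upstream_branch=$(git rev-parse --abbrev-ref '@{upstream}')
echo "local: $local_branch, upstream: $upstream_branch"
```

So whichever local name you prefer, @{upstream} will tell you what it's connected to.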
End of Part I (message too long (as usual!))