[Guide] About ports and (binary) packages

Hi gang!

Introduction (editorial section)

Every once in a while we see new(er) users who get utterly confused by the ports collection and the package management system, and as a result they often have a hard time understanding how the whole thing actually works.

Now, I honestly think that chapter 4 of the FreeBSD Handbook does a good job of explaining all this, but I also think we could go a little deeper. For example: "Packages are normally compiled with conservative options", so... what counts as conservative? And in all honesty: after reading section 4.2 I'm not quite convinced that it has clearly explained all the differences.

Now.. I realize that many people have already explained this whole thing in several forum posts, but the problem with those is that they're usually hard to find. I simply wanted a post in an easy to find location such as the guides section, so here goes...

FreeBSD package management

What do FreeBSD, Windows and Linux have in common (note: OS X could also be included here, but because I have very little experience with that I'm only mentioning it)?

They all use a package manager (or something very much alike) to keep an overview of all the installed software on the system. This allows the OS to track which files belong to which software (package), and that will allow the OS to ensure that whenever any installed software gets removed none of its files will be left behind on the system (that's the theory at least).

On Linux some very common names in package management are DPKG (used on Debian-based distributions) and RPM (used on Red Hat-based distributions).

In the past FreeBSD used the pkg_* tools (pkg_add, pkg_info, pkg_delete and friends), which look a lot like the pkgadd, pkginfo and pkgrm tools on Sun Solaris, but it has somewhat recently moved on to "pkgng", or pkg for short.

pkg(8) is basically an "all in one" package manager (as I like to call it), and I think that's exactly why plenty of users get confused at first.

Package managers vs. package installers

When looking at Linux again (not for too long, don't worry) you'll notice that there is a clear separation between a package manager and an installer. For example, the previously mentioned DPKG ("Debian GNU/Linux Package Manager") is only used to manage packages on the local system: to add, query or remove them. The moment you want to install something from a remote repository you'd be looking at apt-get, which is a separate program.

The same applies to RPM ("RedHat Package Manager"). While you'd use rpm to query, add or remove packages you're going to need yum when you want to install something from a repository.

FreeBSD on the other hand doesn't have this separation, because the pkg program can do all of the above. I think that's partly why some people get confused at first. If that applies to you then you might want to think of pkg as DPKG + apt-get combined. Or RPM + yum of course.

This is also why it is important to clearly distinguish between the different ways of installing a package on FreeBSD: you can add a package using pkg add, but you can also do this using pkg install. The main difference is that add installs ("adds") a local package file which (usually) already resides on your server, whereas install first downloads the package from a remote repository and then installs it. See pkg-add(8) and pkg-install(8) respectively (I'll also explain some more later on).
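To make that distinction a bit more tangible, here's a small sketch in Python which picks the right pkg subcommand for a given target. Keep in mind that this is just an illustration: the helper name pkg_command, the file name and the version number are all made up by me, and treating ".pkg" as a local archive suffix next to ".txz" is an assumption based on newer pkg releases.

```python
from pathlib import PurePosixPath

# Suffixes which mark a local package archive. ".txz" is the one this guide
# talks about; ".pkg" is an assumption covering newer pkg releases.
LOCAL_SUFFIXES = {".txz", ".pkg"}


def pkg_command(target: str) -> list[str]:
    """Return the pkg invocation for a local archive or a repository name."""
    if PurePosixPath(target).suffix in LOCAL_SUFFIXES:
        # A package file which is already on disk: add it directly.
        return ["pkg", "add", target]
    # Anything else is treated as a package name to fetch from a repository.
    return ["pkg", "install", target]


# Hypothetical examples: a local archive versus a plain package name.
print(pkg_command("./apache24-2.4.62.txz"))
print(pkg_command("apache24"))
```

The function only builds the command line, it doesn't run anything, so you can safely try it on any machine.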

Installing software on FreeBSD

The Ports collection

The Ports collection is commonly regarded as the "regular" way to install software. The main reason for this - in my opinion obviously - is that the Ports collection gives you complete control over how you want a port to be installed.

But what exactly is this "Ports collection" anyway? This is not the official description, but I always like to think of the Ports collection as one huge collection of blueprints. Each port basically consists of a directory containing one or more files which describe the port. The "blueprint" (or Makefile) contains instructions which tell the system how it should obtain the software (usually the location from which it can download the source code), how it should prepare the software (think about running configure, but also about optionally patching the source code to comply with the FreeBSD standards), how it should prepare the system when required (think about ensuring that any other software needed to either build or run the software we want to install is also present on the system) and finally how it should package and install the software.

That last part deserves some extra emphasis, and for a very good reason: people often wonder why all ports have at least one requirement (or dependency), namely that pkg (or: ports-mgmt/pkg) is installed.

The reason for that is simple: when you build a port all you basically do is tell the system to create a package which, when built successfully, will eventually be added to your system using pkg-add(8).

So: building a port is basically nothing more than setting up the software (which usually means compiling it), then creating a package which then gets added to your system using pkg.

Does this sound weird to you? No worries. Just look at the ports(7) manual page. This lists all the so-called build targets which you can use to administer your ports. Targets such as build (builds or compiles the port), extract (only downloads and extracts the software in its working directory) and... package. When using this target all that happens is that the port gets built and then packaged into a txz file, ready to be added to your system using pkg add.
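If it helps, you can picture those targets as steps on a ladder: asking for one target also runs everything below it. The Python sketch below models that idea. Please note that TARGET_ORDER is my own, deliberately simplified list (some intermediate steps are left out) and that the function name targets_run is made up; ports(7) remains the authoritative source for the real targets and their order.

```python
# Simplified ordering of common port targets (an assumption for illustration;
# see ports(7) for the authoritative list).
TARGET_ORDER = [
    "fetch",      # download the source
    "checksum",   # verify the download
    "depends",    # make sure required software is present
    "extract",    # unpack into the working directory
    "patch",      # apply FreeBSD-specific patches
    "configure",  # prepare the build
    "build",      # compile
    "package",    # create the package file
]


def targets_run(target):
    """List every modelled target which runs, in order, for `make <target>`."""
    index = TARGET_ORDER.index(target)  # raises ValueError for unknown targets
    return TARGET_ORDER[: index + 1]


print(targets_run("extract"))
print(targets_run("package"))
```

So in this model "make extract" stops after unpacking, whereas "make package" walks the whole ladder and ends with a package file.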

Binary packages

The other way to install software on FreeBSD is using binary packages. Instead of using the Ports collection you're now using pkg to grab the software from a remote repository. And although it may seem as if the Ports collection isn't being used at all, the truth is a bit different. A binary package is nothing more than a port which was built using the default configuration. In other words: it's as if someone ran make config in the port directory and immediately saved the preselected options without making any further changes.

As such the main difference between the two is that you can skip the building part because someone else has already done that for you. However, and this is an important detail, this also means that they've done the configuration part as well.

Mixing ports and packages: good or bad?

This is one of the main things which confuses a lot of people. Should you mix ports and packages? After all: if both are basically the same thing (I even said so myself) then surely mixing them together shouldn't be a problem at all?

Well... yes and no. In my opinion there's a simple rule here: if you have to ask yourself whether mixing ports and packages is good or bad, then don't bother and follow the general guideline, which says that mixing the two is a bad idea. Generally speaking it can easily mess up your system really badly.

Now, the main problem with this is that it's not very obvious at first. It's not as if your server will suddenly explode or trigger kernel panics after you installed a binary package and mixed it with software which you installed using the Ports collection. It's nothing as drastic as that. But it can definitely cause a lot of problems over time.

That is basically the main concern here: when you do run into problems will you still be able to determine their cause? I mean.. is it because you mixed these two sources some time ago, is it because there's a bug in the system somewhere or maybe you made a mistake yourself with configuring stuff?

So what's the big problem?

It all boils down to configuration.

One of the key strengths of the Ports collection is that you can basically configure the software any way you see fit. For example, on my servers the Apache webserver has no support for user directories or WebDAV at all. I didn't simply disable the extensions, I made sure that they weren't built in the first place.

However... a binary package is built using all the default options, and it will also assume the same thing for each of its dependencies.

So what would happen if I were to install a binary package which requires the Apache server to be present on my system? Note the context mentioned above: I already installed Apache through the Ports collection and customized its configuration.

Well, during the installation phase pkg will notice that the software which I'm installing depends on Apache. Then it will notice that Apache is already installed, so all that's left to do is to install the software as I requested.

So what would happen if this software depended on WebDAV?

Or what would happen if I were to install even more software which in turn depended on that package?

You'd basically create a whole chain of software (now referring to the dependencies) while there's a huge problem at the core (Apache in this case): missing functionality.

This is the major problem when mixing these two things together.
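Here's a toy model of that scenario in Python, just to show where things go wrong. Everything in it is hypothetical: the package name webdav-frontend and the option names SSL and WEBDAV are mine, and the name-only check is merely a model of the behaviour described above, not pkg's actual code.

```python
# Toy model: installed software, keyed by name, with the options it was built with.
# Apache came from the Ports collection and was customized: no WEBDAV.
installed = {"apache24": {"SSL"}}

# Hypothetical binary package, plus the Apache options it silently assumes
# because it was built against a default Apache.
new_package = {
    "name": "webdav-frontend",
    "depends": {"apache24": {"SSL", "WEBDAV"}},
}


def satisfied_by_name(pkg, installed):
    """Model of the check described above: is each dependency present by name?"""
    return all(dep in installed for dep in pkg["depends"])


def missing_options(pkg, installed):
    """Map each dependency to the options the package assumes but the installed build lacks."""
    gaps = {}
    for dep, needed in pkg["depends"].items():
        lacking = needed - installed.get(dep, set())
        if lacking:
            gaps[dep] = lacking
    return gaps


print(satisfied_by_name(new_package, installed))
print(missing_options(new_package, installed))
```

The first check passes because Apache is there, while the second one shows the gap which nobody looked at during installation. Anything you install on top of this package inherits the same gap.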

As mentioned before: it doesn't always have to result in problems. For example: if you install a binary ("pre-compiled") software package which doesn't depend on anything except for some libraries in the base system, and which isn't required by any other packages, then it's very unlikely that you'll run into problems.

But if you're installing something which has a lot of dependencies and which is also required by lots of other software then you could easily find yourself in a huge dependency mess.

Therefore, as a rule of thumb, it's much safer not to mix these two methods of installing. Either rely on the Ports collection or on binary packages, but don't use both. Especially not if you're planning on customizing the build options.

And there you have it. I hope this can help some of you guys out in clearing a few of these issues up.