Moving /var/log to another zfs pool

Mage · Feb 3, 2017

I need to move the /var/log to a non-root pool for performance reasons.

It’s not rocket science to set a mountpoint to /var/log. However, I would like to be 100% sure this won’t cause any issue. There are funny (sometimes binary) files in /var/log I have no idea about. I thought if they are already opened when the second pool is mounted, that would be less than ideal. The second pool is auto-mounted "normally" by the system as it’s properly imported.

I guess the log files created by software installed by pkg will be fine. What about the base system’s logs?

Before I opened the thread, I tested it. I have turned on all.log and console.log. I rsynced the old folder to the new one, renamed the old /var/log (using zfs rename -f), created an empty directory to have /var/log on the root pool, mounted the new file system from the other pool on it, restarted syslogd, and rebooted.
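For readers who want to follow the same steps, here is a rough dry-run sketch of the sequence described above. It only prints the commands and does not run them. The dataset names (zroot/var/log, zdata/log) and the temporary mountpoint /mnt/newlog are assumptions for illustration and do not come from the thread, so adjust them to your own layout and review every line before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for review, executes none of them.
# OLD_DS, NEW_DS and /mnt/newlog are hypothetical names, not from the thread.
OLD_DS="${OLD_DS:-zroot/var/log}"
NEW_DS="${NEW_DS:-zdata/log}"

plan_move() {
    old="$1"
    new="$2"
    # Create the new dataset at a temporary mountpoint and copy the logs over.
    echo "zfs create -o mountpoint=/mnt/newlog $new"
    echo "rsync -a /var/log/ /mnt/newlog/"
    # Rename the old dataset and stop it from mounting anywhere.
    echo "zfs rename -f $old ${old}_old"
    echo "zfs set mountpoint=none ${old}_old"
    # Leave an empty /var/log directory on the root pool as the mount target.
    echo "mkdir -p /var/log"
    # Point the new dataset at /var/log and restart syslogd so it reopens its files.
    echo "zfs set mountpoint=/var/log $new"
    echo "service syslogd restart"
}

plan_move "$OLD_DS" "$NEW_DS"
```

Other daemons that keep log files open (a web server, for example) would presumably need a restart as well, and a reboot afterwards, as in the test above, confirms that the mount order holds at boot.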

After the reboot, I mounted the /var as a nullfs to check its content. The /var/log on the root pool was empty. The console.log on the new pool starts the same way as another console.log starts on a "normal" system:

Code:

Feb  1 02:09:33 hostname kernel: Setting hostuuid: 1exxxx
Feb  1 02:09:33 hostname kernel: Setting hostid: 0x60xxx.
Feb  1 02:09:33 hostname kernel: Starting file system checks:

It seems to be okay. However, it would feel much safer to know for sure this is okay, before it goes to production.

Thank you.

ShelLuser · Feb 4, 2017

This should be no problem whatsoever.

Code:

breve:/home/peter $ zfs list -r zroot/var
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zroot/var            892M  7.49G  39.8M  /var
zroot/var/db         736M  7.49G   173M  /var/db
zroot/var/db/mysql   458M  7.49G   415M  /var/db/mysql
zroot/var/db/pkg    35.1M  7.49G  17.9M  /var/db/pkg
zroot/var/log       85.7M  7.49G  10.2M  /var/log
zroot/var/mail      77.5K  7.49G  63.5K  /var/mail
zroot/var/run        388K  7.49G  77.5K  /var/run
zroot/var/tmp         72K  7.49G    21K  /var/tmp

As you can see I use the same setup on my server and this has never caused any issues.

Mage · Feb 4, 2017

So, isn’t zroot your root pool?

ShelLuser · Feb 5, 2017

Mage said:
So, isn’t zroot your root pool?

Yes it is. I've got 2 pools on this server even, zroot and zdata.

Mage · Feb 5, 2017

I want to move it from the root pool to a non-root pool.

I’m not sure when exactly the non-root pools are mounted during the boot process. That’s why I’m concerned.
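One way to check the ordering on a given host is to look at where the rc.d scripts fall in the output of rcorder(8). The sketch below assumes, as I believe is the case on a stock FreeBSD system, that the script named zfs is the one that mounts the datasets of all imported pools and that syslogd is started by the script named syslogd. The helper name zfs_before_syslogd is made up for this example.

```shell
#!/bin/sh
# Reads an rcorder(8) listing on stdin, one script path per line.
# Exits 0 only if a script named "zfs" appears before one named "syslogd".
# zfs_before_syslogd is a hypothetical helper name, not a system tool.
zfs_before_syslogd() {
    awk '
        /\/zfs$/     { if (!z) z = NR }   # first line ending in /zfs
        /\/syslogd$/ { if (!s) s = NR }   # first line ending in /syslogd
        END { exit !(z && s && z < s) }
    '
}

# Usage on a FreeBSD host (not executed here):
#   rcorder /etc/rc.d/* 2>/dev/null | zfs_before_syslogd && echo "zfs runs first"
```

If the check succeeds, files opened by syslogd should land on the already mounted dataset. It says nothing about anything that writes to /var/log earlier than that, so checking the root pool's empty /var/log after a reboot is still worthwhile.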

ShelLuser · Feb 5, 2017

Oh, whoops. Sorry, my bad. Misread up there, I also assumed a bit too much it seems.

In theory it should be able to work. I mean, in a regular UFS based environment it also doesn't matter if /var/log sits on a different slice as long as it gets mounted during the boot process. And the ZFS pools are also mounted during that same period. Of course this is all theoretical and not the conclusive answer you're looking for.

Even so: if performance is that much of an issue here, I can't help but wonder whether you ever considered using a dedicated loghost. Simply tell syslogd to send everything to a remote host and let it deal with the whole thing. Not the answer you're looking for, but I figured I'd mention it.

Mage · Feb 5, 2017

Okay, here is the long version:

I’ve never used UFS. ZFS is why I switched to FreeBSD years ago. (Now I love FreeBSD for many other reasons too.)

I always had at least twenty file systems on the root pool. I created a shell script based on one of the "root on zfs" manuals for my first install ever. I don’t know what the FreeBSD installer looks like. I know the welcome page is blue. Then I select Live CD.

The script has everything from gpart create to reboot. (It’s nothing special, it’s just all the commands of the tutorial together, with slight changes and defaults for rc.conf.) I’ve always put /var and /var/log on separate file systems. But until now, they were always on the root pool.

Two or three months ago I partially moved to Google Cloud. The FreeBSD image is hidden there, yet it exists. It’s using UFS. I converted it to ZFS, using the layout I use everywhere. I used the existing image as I thought it had been optimized for the environment (except the file system and the wrong settings in various files).

In the beginning, it was excellent. Then a non-linear performance drop happened as the traffic grew. It took a while to figure out that the image size Google offers for your "boot" (root) disk, which is 21GB for FreeBSD, is so small that logging becomes an issue. Yes, I created a new image with ZFS but I used the same size. It’s not even 50% occupied.

In Google Cloud, the drive’s performance depends on the size of the drive. It’s more or less linear. I didn’t expect it to have such impact on the logging as it has.

Google recommends you to use several disks. I thought they knew their own system better than I did.

My logging needs aren’t extreme. Apache logs a few dozen lines per second. (Static files are on a CDN.) PostgreSQL only logs slow queries. That’s one or two entries per minute when the load is heavy. Not per second, per minute. This is the text log. The database and the binary log are not on that drive.

The second busiest log is the auth log thanks to our friends in certain countries who think they have an invitation over ssh but they forgot the password and they don’t want to disturb me by asking it. They try to figure it out. (There is a firewall provided by Google, and the password auth was turned off by default. I still might put on sshguard.)

Would you believe that a few dozen log entries per second can make the server handle 4 requests per second instead of 100+ requests per second, on a drive size that was somehow official and is also the largest of the images I list below?

Nothing of the web or the database was on the drive. It was only the logs.

Since I experienced this, I have no idea why on earth Google recommends using several drives instead of one or two. I moved the logs. Performance issues were gone. I have been watching it for days. It was really the logs.

And it’s only the FreeBSD image which is 21GB. The Debian and the Red Hat are 10GB. Suse root is 8GB. Okay, no one knows what happens when fsync is called on ext4, maybe it’s okay for those images, but as far as I know, fsync usually makes ZFS try to write data onto the disks.

I could have moved only the busy logs. I prefer FreeBSD over Linux because the main feature isn’t called chaos. Logs should be together. I was just not sure what would happen if some file gets opened after the file systems from the root pool get mounted but before the other file systems from the other pools get mounted.

After I moved the logs from the 21GB drive to a 200GB drive, the load went down from 4-7 to 1-1.5. Load itself tells little though. The page load time in a browser went down from 4-21+ seconds to less than 1 second.

I moved to Cloud Compute mainly for the transparent encryption and the flexibility. The latter means I don’t have to call anyone if I need a new server, and I don’t have to wait for days.

As for the encryption, I opened a thread earlier here. I asked whether anyone uses GELI in production. I didn’t feel safe about it.

A few days ago I put GELI on my "old" bare metal servers (two pools on the same disk, the root is not encrypted so I can ssh in and mount the other pool). It’s working fine. It’s cheaper and it performs better. It would be still cheaper to rent one or two more bare metal servers I don’t need at all. It would provide me with the flexibility if the need comes quickly.

I’m not bashing Google. I like them as a partner in many ways. I don’t use most of the features of the cloud but they can be good. And the Cloud comes with other kinds of flexibility that might make me stay.

It’s just that it hits the business hard for a few weeks while you are figuring out that you shouldn’t listen to the official manual at all.

Benchmarks didn’t show the issue with the logs.

There was no reason to use send | receive. I’m talking about 1.4GB of logs in total. (Mostly in .xz.) That’s almost everything since November 2016. Most of the lines in my newsyslog.conf files have 120 as "count" and X as the compression flag.
