ZFS kernel syncer accesses disk every ~2 seconds

Harry Stone · Jul 20, 2020

Does anyone know why this would happen? I'm not sure how long this has been going on; I recently moved this server next to my desk, and now I hear the disk activity all the time. This is on FreeBSD 12.1-RELEASE-p7 GENERIC amd64.

This is output from vfssnoop; every couple of seconds I see the syncer access a filesystem:

Code:

79190847400401        0     31 syncer           vop_fsync           - /var/log/<unknown>
79191847755431        0     31 syncer           vop_fsync           - /var/crash/<unknown>
79192848082406        0     31 syncer           vop_fsync           - /zdata/public/<unknown>
79193848465263        0     31 syncer           vop_fsync           - /zdata/tmp/<unknown>
79194848807380        0     31 syncer           vop_fsync           - /zroot/<unknown>
79195849166747        0     31 syncer           vop_fsync           - /dev/fd/<unknown>
79198850284597        0     31 syncer           vop_fsync           - /usr/home/<unknown>
79200850955339        0     31 syncer           vop_fsync           - /zdata/stuff/<unknown>
79202851654069        0     31 syncer           vop_fsync           - /usr/src/<unknown>
79203851960027        0     31 syncer           vop_fsync           - /<unknown>
79206893765289        0     31 syncer           vop_fsync           - /zdata/log/<unknown>
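As a rough check of the "every ~2 seconds" impression, the first column of the output above can be differenced. This is a minimal sketch that uses only the posted lines. It assumes the first column is a timestamp in nanoseconds, which is inferred from the roughly 10^9 spacing between lines rather than taken from vfssnoop's documentation, and the column names in the comment are likewise assumed.

```python
"""Estimate how often the syncer calls vop_fsync, from the vfssnoop lines above.

Assumes the first column is a timestamp in nanoseconds; that is inferred from
the ~1e9 spacing between lines, not from vfssnoop documentation.
"""
from statistics import mean

# Copied from the post; assumed columns: timestamp, uid, pid, process, call, size, path
LOG = """\
79190847400401        0     31 syncer           vop_fsync           - /var/log/<unknown>
79191847755431        0     31 syncer           vop_fsync           - /var/crash/<unknown>
79192848082406        0     31 syncer           vop_fsync           - /zdata/public/<unknown>
79193848465263        0     31 syncer           vop_fsync           - /zdata/tmp/<unknown>
79194848807380        0     31 syncer           vop_fsync           - /zroot/<unknown>
79195849166747        0     31 syncer           vop_fsync           - /dev/fd/<unknown>
79198850284597        0     31 syncer           vop_fsync           - /usr/home/<unknown>
79200850955339        0     31 syncer           vop_fsync           - /zdata/stuff/<unknown>
79202851654069        0     31 syncer           vop_fsync           - /usr/src/<unknown>
79203851960027        0     31 syncer           vop_fsync           - /<unknown>
79206893765289        0     31 syncer           vop_fsync           - /zdata/log/<unknown>
"""


def fsync_intervals(log: str) -> list:
    """Return the gaps, in seconds, between consecutive syncer vop_fsync rows."""
    stamps = []
    for line in log.splitlines():
        fields = line.split()
        # Keep only syncer/vop_fsync rows; skip blank or short lines.
        if len(fields) >= 7 and fields[3] == "syncer" and fields[4] == "vop_fsync":
            stamps.append(int(fields[0]))
    return [(later - earlier) / 1e9 for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    gaps = fsync_intervals(LOG)
    print("gaps (s):", [round(g, 3) for g in gaps])
    print(f"mean gap: {mean(gaps):.2f} s over {len(gaps)} gaps")
```

On these eleven lines the gaps come out as whole seconds (1, 2 or 3 s, mean about 1.6 s), with a different mount point each time. That would fit a syncer that wakes once per second and fsyncs one mount per pass, rather than a fixed 5-second ZFS txg interval, but this is an interpretation, not something vfssnoop reports directly.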

Mjölnir · Jul 21, 2020

sysctl kern.{meta,dir,file}delay? But those act on a scale of half a minute (28 to 30 seconds by default), not every couple of seconds. What is vfssnoop?

Harry Stone · Jul 21, 2020

vfssnoop is a dtrace script. This is even weirder: I was on the wrong server when I captured that, because I couldn't believe there could be disk activity that dtrace can't see. But there is. Every couple of seconds or so I hear disk activity, and dtrace can't see it. Halting FreeBSD makes it stop. Crazy.

I tried modifying kern.{meta,dir,file}delay, but there was no change. I also thought changing idle/standby via camcontrol might help. Nope. The 4 drives are brand-new 8TB WD Purples; zroot is on 2 WD Blues. Load cycle count is fine on all drives. Crazy.

Harry Stone · Jul 22, 2020

After a whole lot of hand-wringing, this is likely just the 5-second ZFS txg timeout.

PMc · Jul 22, 2020

Harry Stone said:
After a whole lot of hand-wringing, this is likely just the 5-second ZFS txg timeout.

Doubtful. By default that would happen every 5 seconds (vfs.zfs.txg.timeout), not every 2 seconds, unless it has been configured otherwise. And it only happens while there is some activity in the pool.

Harry Stone · Jul 22, 2020

I agree, but I finally timed it, and it is 5 seconds. I don't know why it's happening with no activity, though.

PMc · Jul 22, 2020

Harry Stone said:
I agree, but I finally timed it, and it is 5 seconds. I don't know why it's happening with no activity, though.

Well, that remains to be figured out. What I can say is this: all my pools except the one holding the base OS spin their disks down, and when a pool is really idle, there is reliably no activity.
But then, merely scanning a directory will by default update the atime, and that will push out some 256kB at the next txg timeout.

Jose · Jul 22, 2020

Periodic writes to a log file?

Harry Stone · Jul 22, 2020

dtrace and zpool iostat show no activity, but I hear the drives run for a fraction of a second. It's a mystery. The drives in one pool (not the root pool) are new, as is the SATA backplane, but otherwise the hardware is a known entity.

Does zfs txg normally show up in 'zpool iostat' output?

Harry Stone · Jul 22, 2020

Jose said:
Periodic writes to a log file?

It sure acts like it, but I can't find any.

PMc · Jul 22, 2020

Harry Stone said:
Does zfs txg normally show up in 'zpool iostat' output?

On 11.3, yes.

t1066 · Jul 23, 2020

Maybe it is writing metadata.

Mjölnir · Jul 23, 2020

Could it also be some periodic task run by the disks' firmware? Does this happen in single-user mode, too?

Harry Stone · Jul 23, 2020

mjollnir said:
Could it also be some periodic task run by the disks' firmware? Does this happen in single-user mode, too?

That's a good point, I'll test that.

t1066 · Jul 25, 2020

You can use dtrace to find out which program called vop_fsync.

Harry Stone · Jul 25, 2020

t1066 said:
You can use dtrace to find out which program called vop_fsync.

I found a dtrace script that tells me that; the output is in my original post. The kernel syncer process is calling vop_fsync. Changing the tunables mentioned in the syncer man page has no effect.

-- but --

When I saw that, I was on the wrong server. I'll get to that later. My server with the new disks and backplane has disk activity every 5 seconds that is invisible to dtrace. Every *5* seconds there is a tiny bit of disk activity. The drive lights don't come on, and dtrace shows nothing. When I halt the box, it stops.

PMc · Jul 25, 2020

Harry Stone said:
Every *5* seconds there is a tiny bit of disk activity. The drive lights don't come on, and dtrace shows nothing. When I halt the box, it stops.

If the drive lights don't show it, and gstat -p doesn't show it either, I would assume it is internal housekeeping of the disk. I know of a couple of brands that do such things; not exactly as you describe, but there may be others.
If there are a couple of minutes with no intended activity, I would put a disk into standby (spin it down) and see when it spins up again. The power state is visible with smartctl:

Code:

# smartctl -n standby /dev/ada0
smartctl 7.1 2019-12-30 r5022 [FreeBSD 11.4-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in STANDBY mode, exit(2)
# smartctl -n standby /dev/ada1
smartctl 7.1 2019-12-30 r5022 [FreeBSD 11.4-RELEASE-p1 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Device is in ACTIVE or IDLE mode
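The spin-down test suggested above can be automated by polling. This is a minimal sketch, assuming smartmontools is installed, the script runs as root, and smartctl prints the "Device is in ..." lines shown above on stdout. The 10-second poll interval is an arbitrary choice, and the script name in the usage comment is hypothetical.

```python
"""Report when a spun-down disk wakes up, by polling `smartctl -n standby`.

Sketch only: assumes smartmontools is installed, the script runs as root,
and smartctl prints its "Device is in ..." line on stdout.
"""
import subprocess
import sys
import time
from datetime import datetime

POLL_SECONDS = 10  # arbitrary choice; -n standby is meant not to wake the disk


def parse_power_state(output: str) -> str:
    """Map smartctl -n standby output to 'standby', 'active' or 'unknown'."""
    if "STANDBY mode" in output:
        return "standby"
    if "ACTIVE or IDLE mode" in output:
        return "active"
    return "unknown"


def power_state(device: str) -> str:
    """Query one disk without spinning it up."""
    # In the post, a sleeping disk makes smartctl exit with status 2, so the
    # exit code is not treated as an error; only the text is inspected.
    result = subprocess.run(["smartctl", "-n", "standby", device],
                            capture_output=True, text=True)
    return parse_power_state(result.stdout)


def watch(device: str, interval: float = POLL_SECONDS) -> None:
    """Print a timestamped line each time the reported power state changes."""
    previous = None
    while True:
        state = power_state(device)
        if state != previous:
            print(f"{datetime.now():%H:%M:%S} {device}: {state}", flush=True)
            previous = state
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 standby_watch.py /dev/ada0
    watch(sys.argv[1])
```

Spin the disk down first (for example with camcontrol standby), then start the script with the device path. A line is printed only when the reported state changes, so the timestamp on the first "active" line shows when the disk woke up.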

Harry Stone · Jul 25, 2020

PMc said:
If the drive lights don't show it, and gstat -p doesn't show it either, I would assume it is internal housekeeping of the disk. I know of a couple of brands that do such things; not exactly as you describe, but there may be others.

I think you're on to something. If I spin down the drives with camcontrol, the noise stops until the drives spin up again. They have stayed spun down as long as I have cared to wait.

Mjölnir · Jul 25, 2020

It seems to be a periodic firmware task. Consider writing a bug report to the manufacturer. Maybe there's a missing or faulty check for cache_dirty (or something like that) in the routine that starts the sync.

Harry Stone · Jul 25, 2020

mjollnir said:
It seems to be a periodic firmware task. Consider writing a bug report to the manufacturer. Maybe there's a missing or faulty check for cache_dirty (or something like that) in the routine that starts the sync.

I think you must be right. Weird, but at least explained.

Thanks everyone!