Processes start to crash after all buffers are synced (during reboot)

I have a problem with one of my machines at home. It runs FreeBSD 14, with a few ZFS pools.

The machine boots from a single drive zroot pool (configured at install time using bsdinstall) and has a few other pools for data and fast scratch storage. When I issue reboot or shutdown -r now it looks fine at first, but after all buffers are synced I start getting a lot of errors:
Code:
Syncing disks, vnodes remaining... 0 0 0 0 0 00 done
All buffers synced.

pid 41562 (sshd), jid 0, uid 0: exited on signal 11 (no core dump - bad address)

I suspect this has something to do with my zfs configuration but I'm not sure. Any clues?
 

Attachments

  • cash.jpg (1.2 MB)
Another strange thing: All those daemons are crashing due to signal 11 = SIGSEGV = segmentation fault. How can shutdown cause all processes to get a segmentation fault?
 
Hm. The numbers for syncing disks are all zeros. I wonder whether that is related.
All-zero vnode counts during the sync are normal for ZFS-only systems. You will only see non-zero vnode counts on a system with a UFS filesystem mounted.

I suspect what is happening is that the filesystems (mountpoints) are unmounted before the daemons and other processes exit, so their attempts to write to buffers that no longer exist result in segmentation violations.

Shutdown scripts should shut down daemons. Failing that, the kernel sends signal 15 (TERM) to all processes, and failing that, it sends signal 9 (KILL).
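
For reference, the timeouts behind that sequence are tunable. A minimal sketch, with the defaults as I remember them (rc.conf(5) and init(8) have the authoritative descriptions):
Code:
# /etc/rc.conf: watchdog that terminates rc.shutdown if it runs longer than this many seconds
rcshutdown_timeout="90"

# sysctl used by init(8) as its overall shutdown timeout before it forces the final sync
sysctl kern.init_shutdown_timeout
kern.init_shutdown_timeout: 120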

Was this 14.0-RELEASE or 14-STABLE? If release, was it built from sources or updated using freebsd-update? If 14-STABLE or 14.0-RELEASE built from sources, clean out your /usr/obj and build again as there is probably some corrupt file in it.
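
In case it helps, a rough sketch of that clean rebuild, assuming the usual /usr/src and /usr/obj locations (the Handbook's full procedure also drops to single-user mode and reboots between installkernel and installworld):
Code:
# clear any immutable flags, then throw away possibly corrupt build artifacts
chflags -R noschg /usr/obj/*
rm -rf /usr/obj/*
cd /usr/src
make -j4 buildworld buildkernel
make installkernel
make installworld
etcupdate
shutdown -r now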

If installed using freebsd-update, open a bugzilla bug. Either the binaries are corrupt or there is a legitimate bug.

Having said all this, I run 15-CURRENT at home and 14.0-RELEASE (updated using freebsd-update) at $JOB. I have not experienced this problem anywhere, suggesting this may be a local corruption problem or some other local problem.
 
Was this 14.0-RELEASE or 14-STABLE? If release, was it built from sources or updated using freebsd-update? If 14-STABLE or 14.0-RELEASE built from sources, clean out your /usr/obj and build again as there is probably some corrupt file in it.

If installed using freebsd-update, open a bugzilla bug. Either the binaries are corrupt or there is a legitimate bug.

This is 14.0-RELEASE, installed using freebsd-update:
Code:
FreeBSD hyper 14.0-RELEASE FreeBSD 14.0-RELEASE #0 releng/14.0-n265380-f9716eee8ab4: Fri Nov 10 05:57:23 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

Yeah, I'm guessing this is some sort of local problem on my end, but I can't figure out what it could be. Later today I will try going to single-user mode first and then rebooting, to see if that changes the behaviour.
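
Roughly what I have in mind for that test, since shutdown(8) without -r or -p drops the system to single-user mode:
Code:
# stop multi-user services and drop to single-user mode
shutdown now
# ...then, from the single-user shell:
reboot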
 
I am also seeing this on a 14-RELEASE box with ZFS root, and on a 14-STABLE box at commit ae8387cc818a0d6a2229ee049b671482e1549519. The -STABLE machine uses ZFS, but not for /. I do not see this on all-UFS systems. I've opened PR 275336.
 
Yeah, I'm guessing this is some sort of local problem on my end, but I can't figure out what it could be. Later today I will try going to single-user mode first and then rebooting, to see if that changes the behaviour.
A similar thing happens to me, with a few buffers not syncing, and I only have ZFS mounted as storage disks and not the boot disk. The boot disk/partition/label is UFS2. I just upgraded, though (after the ZFS bug was discovered), so I'll try it again in a day or two.
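
Once the upgrade settles, it may also be worth double-checking that the kernel, userland, and ZFS bits all agree. A quick sketch:
Code:
# installed kernel, running kernel, and userland versions should all match
freebsd-version -kru
# OpenZFS userland tools vs. the loaded kernel module
zfs version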
 
amigan, in Bugzilla please add a link to this topic: <https://forums.freebsd.org/threads/91143/>.

… single drive zroot pool …

OK

other pools for data and fast scratch …
How, exactly, are those disks connected?

Also:
  • grep -v \# /etc/fstab | sort
  • zpool list -v
  • geom disk list
  • mount | sort
  • zfs get canmount,mountpoint | grep -v \@ | sort

Each result as a separate code block, please.

… ZFS mounted as storage disks and not the boot disk …

The same questions.
 
How, exactly, are those disks connected?
Code:
 # grep -v \# /etc/fstab | sort
/dev/ada0p3        none    swap    sw        0    0
/dev/nvd0p3        none    swap    sw        0    0
Code:
# zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hot          262G   152G   110G        -         -    61%    57%  1.00x    ONLINE  -
  nda0p7     264G   152G   110G        -         -    61%  58.0%      -    ONLINE
store       10.9T  8.60T  2.27T        -         -    41%    79%  1.00x    ONLINE  -
  raidz1-0  10.9T  8.60T  2.27T        -         -    41%  79.1%      -    ONLINE
    da2     3.64T      -      -        -         -      -      -      -    ONLINE
    da1     3.64T      -      -        -         -      -      -      -    ONLINE
    da0     3.64T      -      -        -         -      -      -      -    ONLINE
logs            -      -      -        -         -      -      -      -         -
  mirror-1  49.5G  3.50M  49.5G        -         -     0%  0.00%      -    ONLINE
    nda0p6    50G      -      -        -         -      -      -      -    ONLINE
    ada0p2    50G      -      -        -         -      -      -      -    ONLINE
cache           -      -      -        -         -      -      -      -         -
  nda0p5      50G  41.3G  8.74G        -         -     0%  82.5%      -    ONLINE
  ada0p1      50G  41.5G  8.50G        -         -     0%  83.0%      -    ONLINE
warm         348G  27.7G   320G        -         -    47%     7%  1.00x    ONLINE  -
  ada0p4     350G  27.7G   320G        -         -    47%  7.97%      -    ONLINE
zroot       85.5G  52.4G  33.1G        -         -    26%    61%  1.00x    ONLINE  -
  nda0p4    85.6G  52.4G  33.1G        -         -    26%  61.2%      -    ONLINE
Code:
# geom disk list
Geom name: nda0
Providers:
1. Name: nda0
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r5w5e9
   descr: Samsung SSD 970 EVO 500GB
   lunid: 0025385901443f34
   ident: S5H7NS1N900255T
   rotationrate: 0
   fwsectors: 0
   fwheads: 0

Geom name: ada0
Providers:
1. Name: ada0
   Mediasize: 500107862016 (466G)
   Sectorsize: 512
   Mode: r4w4e7
   descr: Samsung SSD 860 EVO 500GB
   lunid: 5002538e9081380d
   ident: S4XBNF0N819189T
   rotationrate: 0
   fwsectors: 63
   fwheads: 16

Geom name: da2
Providers:
1. Name: da2
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: ST4000VN 008-2DR166
   lunname: ST4000VN008-2DR166      SC60
   lunid: ST4000VN008-2DR166      SC60
   ident: 0000000000000003
   rotationrate: unknown
   fwsectors: 63
   fwheads: 255

Geom name: da3
Providers:
1. Name: da3
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   descr: ST4000VN 008-2DR166
   lunname: ST4000VN008-2DR166      SC60
   lunid: ST4000VN008-2DR166      SC60
   ident: 0000000000000004
   rotationrate: unknown
   fwsectors: 63
   fwheads: 255

Geom name: da0
Providers:
1. Name: da0
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: ST4000NE 001-2MA101
   lunname: ST4000NE001-2MA101      EN01
   lunid: ST4000NE001-2MA101      EN01
   ident: 0000000000000001
   rotationrate: unknown
   fwsectors: 63
   fwheads: 255

Geom name: da1
Providers:
1. Name: da1
   Mediasize: 4000787030016 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: ST4000NE 001-2MA101
   lunname: ST4000NE001-2MA101      EN01
   lunid: ST4000NE001-2MA101      EN01
   ident: 0000000000000002
   rotationrate: unknown
   fwsectors: 63
   fwheads: 255
 
Thanks, I'm particularly interested in the exact nature of connections to the four da(4) devices.

(I don't know code names well enough to guess whether there's USB in the mix, and so on.)

<https://man.freebsd.org/cgi/man.cgi?query=da&sektion=4&manpath=freebsd-release>
da[0-3] are sitting in an external JBOD chassis, connected with a USB-C cable.
Code:
ugen1.5: <VIA Labs,Inc. USB3.1 SATA Bridge> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (2mA)
ugen1.6: <VIA Labs,Inc. USB3.1 SATA Bridge> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (2mA)
ugen1.3: <VIA Labs,Inc. USB3.1 SATA Bridge> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (2mA)
ugen1.4: <VIA Labs,Inc. USB3.1 SATA Bridge> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (2mA)
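
For completeness, the mapping from those USB bridges to the da(4) devices should be visible with something like this (assuming the usual umass/CAM stack):
Code:
# CAM's view of the attached disks and the bus/target each one sits on
camcontrol devlist
# short listing of the USB device tree
usbconfig list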
 
The same questions.
Code:
/dev/acd0       /cdrom      cd9660   ro,noauto   0   0
/dev/ada2s1a    /           ufs      rw          1   1
/dev/ada2s1b    none        swap     sw          0   0
/dev/ada2s1d    /var        ufs      rw          2   2
/dev/ada2s1e    /tmp        ufs      rw          2   2
/dev/ada2s1f    /usr        ufs      rw,acls     2   2
/dev/ada2s1g    /fstore1    ufs      rw          2   2
fdesc           /dev/fd     fdescfs  rw          0   0
proc            /proc       procfs   rw          0   0

Code:
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zstore3     928G   587G   341G        -         -    21%    63%  1.00x    ONLINE  -
  ada3      932G   587G   341G        -         -    21%  63.3%      -    ONLINE
zstore4    1.81T   794G  1.04T        -         -     4%    42%  1.00x    ONLINE  -
  ada0     1.82T   794G  1.04T        -         -     4%  42.8%      -    ONLINE
zstore5     928G   167G   761G        -         -     0%    17%  1.00x    ONLINE  -
  ada1      932G   167G   761G        -         -     0%  17.9%      -    ONLINE
Code:
Geom name: ada3
Providers:
1. Name: ada3
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: WDC WD10EZEX-00RKKA0
   lunid: 50014ee6adff99a6
   ident: WD-WMC1S4306873
   rotationrate: unknown
   fwsectors: 63
   fwheads: 16

Geom name: ada2
Providers:
1. Name: ada2
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r6w6e17
   descr: WDC WD10EALX-009BA0
   lunid: 50014ee2057f8ea5
   ident: WD-WCATR5587936
   rotationrate: unknown
   fwsectors: 63
   fwheads: 16

Geom name: ada1
Providers:
1. Name: ada1
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e1
   descr: Hitachi HDE721010SLA330
   lunid: 5000cca35ef0842b
   ident: STR607MS3ERLKS
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16

Geom name: ada0
Providers:
1. Name: ada0
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: ST2000DM008-2FR102
   lunid: 5000c500d43c8df6
   ident: WFL4EQXA
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16
Code:
/dev/ada2s1a on / (ufs, NFS exported, local, soft-updates, journaled soft-updates)
/dev/ada2s1d on /var (ufs, local, soft-updates, journaled soft-updates)
/dev/ada2s1e on /tmp (ufs, local, soft-updates)
/dev/ada2s1f on /usr (ufs, NFS exported, local, soft-updates, journaled soft-updates, acls)
/dev/ada2s1g on /fstore1 (ufs, NFS exported, local, soft-updates, journaled soft-updates, acls)
/usr/jails/sharedfs on /fstore5/jails/tcloud/sharedfs (nullfs, NFS exported, local, read-only, soft-updates, journaled soft-updates, acls)
/usr/jails/sharedfs on /fstore5/jails/unifi/sharedfs (nullfs, NFS exported, local, read-only, soft-updates, journaled soft-updates, acls)
devfs on /dev (devfs)
devfs on /fstore5/jails/tcloud/dev (devfs)
devfs on /fstore5/jails/unifi/dev (devfs)
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)
smbnetfs on /smbmnt (fusefs)
zstore3 on /zstore3 (zfs, local, noatime, nfsv4acls)
zstore3/fstore3 on /fstore3 (zfs, local, noatime, nfsv4acls)
zstore4 on /zstore4 (zfs, local, noatime, nfsv4acls)
zstore4/fstore4 on /fstore4 (zfs, NFS exported, local, noatime, nfsv4acls)
zstore5 on /zstore5 (zfs, local, nfsv4acls)
zstore5/fstore5 on /fstore5 (zfs, local, nfsv4acls)
Code:
NAME             PROPERTY    VALUE       SOURCE
zstore3          canmount    on          default
zstore3          mountpoint  /zstore3    default
zstore3/fstore3  canmount    on          default
zstore3/fstore3  mountpoint  /fstore3    local
zstore4          canmount    on          default
zstore4          mountpoint  /zstore4    default
zstore4/fstore4  canmount    on          default
zstore4/fstore4  mountpoint  /fstore4    local
zstore5          canmount    on          default
zstore5          mountpoint  /zstore5    default
zstore5/fstore5  canmount    on          default
zstore5/fstore5  mountpoint  /fstore5    local
 
I have a problem with one of my machines at home. It runs FreeBSD 14, with a few ZFS pools.

The machine boots from a single drive zroot pool (configured at install time using bsdinstall) and has a few other pools for data and fast scratch storage. When I issue reboot or shutdown -r now it looks fine at first, but after all buffers are synced I start getting a lot of errors:
Code:
Syncing disks, vnodes remaining... 0 0 0 0 0 00 done
All buffers synced.

pid 41562 (sshd), jid 0, uid 0: exited on signal 11 (no core dump - bad address)

I suspect this has something to do with my zfs configuration but I'm not sure. Any clues?
From my perspective, the main question is why all those processes are still running when shutdown has reached the stage where it syncs and unmounts filesystems.
Userland should have been stopped long before that point.
No wonder the processes are dying when everything is swept out from under their feet.
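
If it helps to see what should already have been stopped by that point, the stop order used by rc.shutdown can be approximated with a sketch like this (rc.shutdown does the equivalent internally, stopping services in reverse boot order):
Code:
# reverse of the rcorder boot order is roughly the order services are stopped at shutdown
rcorder /etc/rc.d/* /usr/local/etc/rc.d/* 2>/dev/null | tail -r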
 
From my perspective, the main question is why all those processes are still running when shutdown has reached the stage where it syncs and unmounts filesystems.
Userland should have been stopped long before that point.
No wonder the processes are dying when everything is swept out from under their feet.
I agree. I tried going to single-user mode first and then rebooting. The behaviour was the same.
 
I'm having this problem in my environment too, but if I run poweroff instead of reboot, the problem does not seem to occur. Is there a different cleanup process between reboot and poweroff?
 