Solved Kernel panic zio.c, line: 270 FreeBSD 10.2 (or 10.3)

Demis

New Member


Messages: 14

#1
Does anyone know about this problem?
Server: SuperMicro, model SYS-6026T-6RF+, MB: X8DTU-6F+, RAM: 24 GB DDR3, two Xeon CPUs
RAM: KVR1333D3E9S/4G - DDR3, 1333MHz, ECC, CL9, X8, 1.5V, Unbuffered, DIMM
Version: uname -a
Code:
FreeBSD teo.some.loc 10.2-RELEASE-p12 FreeBSD 10.2-RELEASE-p12 #0: Sat Feb 13 18:04:04 MSK 2016  demis@teo.some.loc:/usr/obj/usr/src/sys/TEO  amd64
(The problem persists with both the GENERIC kernel and a custom kernel config!)
Code:
zpool status hdd
  pool: hdd
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
  still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
  the pool may no longer be accessible by software that does not support
  the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 14h57m with 0 errors on Thu Feb 11 03:35:43 2016
config:

  NAME           STATE     READ WRITE CKSUM
  hdd            ONLINE       0     0     0
    raidz2-0     ONLINE       0     0     0
      mfid1p1    ONLINE       0     0     0
      mfid2p1    ONLINE       0     0     0
      mfid3p1    ONLINE       0     0     0
      mfid4p1    ONLINE       0     0     0
      mfid5p1    ONLINE       0     0     0

errors: No known data errors
hdd is my ZFS pool.
When I run a command such as:
rm /hdd/usr/some/path/to/file
or
rm /hdd/usr/some/path/to/folder
or
chown root:wheel /hdd/usr/some/path/to/file
or
chown root:wheel /hdd/usr/some/path/to/folder
or
setfacl ... to /hdd/usr/some/path/to/file

I get a kernel panic:
Code:
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: c < (1ULL << 24) >> 9 (0x7fffffffffffff < 0x8000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 270
cpuid = 9
KDB: stack backtrace:
#0 0xffffffff80984ef0 at kdb_backtrace+0x60
#1 0xffffffff80948aa6 at vpanic+0x126
#2 0xffffffff80948973 at panic+0x43
#3 0xffffffff81c0222f at assfail3+0x2f
#4 0xffffffff81aa9d40 at zio_buf_alloc+0x50
#5 0xffffffff81a2b9f8 at arc_get_data_buf+0x358
#6 0xffffffff81a2e20a at arc_read+0x1ea
#7 0xffffffff81a3669c at dbuf_read+0x6ac
#8 0xffffffff81a3d8bf at dmu_spill_hold_existing+0xbf
#9 0xffffffff81a70dd7 at sa_attr_op+0x167
#10 0xffffffff81a72ffb at sa_lookup+0x4b
#11 0xffffffff81abc82a at zfs_rmnode+0x2ba
#12 0xffffffff81ada58e at zfs_freebsd_reclaim+0x4e
#13 0xffffffff80e73537 at VOP_RECLAIM_APV+0xa7
#14 0xffffffff809ec5b4 at vgonel+0x1b4
#15 0xffffffff809eca49 at vrecycle+0x59
#16 0xffffffff81ada52d at zfs_freebsd_inactive+0xd
#17 0xffffffff80e73427 at VOP_INACTIVE_APV+0xa7
Uptime: 9m31s
Dumping 1286 out of 24543 MB:..2%..12%..22%..32%..42%..51%..61%..71% (CTRL-C to abort) ..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
Loaded symbols for /boot/kernel/ipfw.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219  pcpu.h: No such file or directory.
  in pcpu.h
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80948702 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80948ae5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff80948973 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff81c0222f in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>,
  f=<value optimized out>, l=<value optimized out>) at /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#5  0xffffffff81aa9d40 in zio_buf_alloc (size=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:270
#6  0xffffffff81a2b9f8 in arc_get_data_buf (buf=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2898
#7  0xffffffff81a2e20a in arc_read (pio=0xfffff80011791730, spa=0xfffff80011579000, bp=0xfffffe000aee7980, done=0xffffffff81a3a2d0 <dbuf_read_done>,
  private=0xfffff8002244b000, priority=ZIO_PRIORITY_SYNC_READ, zio_flags=-528866606, arc_flags=0xfffffe06727fb3c4, zb=0xffffffff81a3a2d0)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1551
#8  0xffffffff81a3669c in dbuf_read (db=0xfffff8002244b000, zio=0x0, flags=6)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:573
#9  0xffffffff81a3d8bf in dmu_spill_hold_existing (bonus=0xfffff800223bed20, tag=0x0, dbp=0xfffff800919966b8)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:333
#10 0xffffffff81a70dd7 in sa_attr_op (hdl=0xfffff80091996690, bulk=0xfffffe06727fb528, count=1, data_op=SA_LOOKUP, tx=0x0)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:310
#11 0xffffffff81a72ffb in sa_lookup (hdl=0xfffff80091996690, attr=<value optimized out>, buf=<value optimized out>, buflen=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:1441
#12 0xffffffff81abc82a in zfs_rmnode (zp=0xfffff80091993730) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:633
#13 0xffffffff81ada58e in zfs_freebsd_reclaim (ap=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6569
#14 0xffffffff80e73537 in VOP_RECLAIM_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2019
#15 0xffffffff809ec5b4 in vgonel (vp=0xfffff800111733b0) at vnode_if.h:830
#16 0xffffffff809eca49 in vrecycle (vp=0xfffff800111733b0) at /usr/src/sys/kern/vfs_subr.c:2703
#17 0xffffffff81ada52d in zfs_freebsd_inactive (ap=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6540
#18 0xffffffff80e73427 in VOP_INACTIVE_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1953
#19 0xffffffff809eb382 in vinactive (vp=0xfffff800111733b0, td=0xfffff800113d8000) at vnode_if.h:807
#20 0xffffffff809eb772 in vputx (vp=0xfffff800111733b0, func=2) at /usr/src/sys/kern/vfs_subr.c:2306
#21 0xffffffff809f401e in kern_rmdirat (td=<value optimized out>, fd=<value optimized out>, path=<value optimized out>, pathseg=<value optimized out>)
  at /usr/src/sys/kern/vfs_syscalls.c:3842
#22 0xffffffff80d4b3e7 in amd64_syscall (td=0xfffff800113d8000, traced=0) at subr_syscall.c:134
#23 0xffffffff80d30acb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#24 0x00000008008914ea in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
After installing FreeBSD 10.3-BETA (GENERIC or custom kernel config), I get the same panic:
Code:
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: c < (1ULL << 24) >> 9 (0x7fffffffffffff < 0x8000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 273
cpuid = 13
KDB: stack backtrace:
#0 0xffffffff8098f000 at kdb_backtrace+0x60
#1 0xffffffff80951d06 at vpanic+0x126
#2 0xffffffff80951bd3 at panic+0x43
#3 0xffffffff81e0022f at assfail3+0x2f
#4 0xffffffff81cacc70 at zio_buf_alloc+0x50
#5 0xffffffff81c2b8f2 at arc_get_data_buf+0x262
#6 0xffffffff81c2b657 at arc_buf_alloc+0xc7
#7 0xffffffff81c2d601 at arc_read+0x1c1
#8 0xffffffff81c36ce9 at dbuf_read+0x6b9
#9 0xffffffff81c3e415 at dmu_spill_hold_existing+0xc5
#10 0xffffffff81c73707 at sa_attr_op+0x167
#11 0xffffffff81c75972 at sa_lookup+0x52
#12 0xffffffff81cbf8da at zfs_rmnode+0x2ba
#13 0xffffffff81cdd75e at zfs_freebsd_reclaim+0x4e
#14 0xffffffff80e81c27 at VOP_RECLAIM_APV+0xa7
#15 0xffffffff809f9581 at vgonel+0x221
#16 0xffffffff809f9a19 at vrecycle+0x59
#17 0xffffffff81cdd6fd at zfs_freebsd_inactive+0xd
Uptime: 11m11s
Dumping 1368 out of 24542 MB:..2%..11%..22%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/aio.ko.symbols...done.
Loaded symbols for /boot/kernel/aio.ko.symbols
Reading symbols from /boot/kernel/ichsmb.ko.symbols...done.
Loaded symbols for /boot/kernel/ichsmb.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
Loaded symbols for /boot/kernel/ipfw.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219  pcpu.h: No such file or directory.
  in pcpu.h
(kgdb) backtrace
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80951962 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#2  0xffffffff80951d45 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889
#3  0xffffffff80951bd3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818
#4  0xffffffff81e0022f in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>, f=<value optimized out>,
  l=<value optimized out>) at /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#5  0xffffffff81cacc70 in zio_buf_alloc (size=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:273
#6  0xffffffff81c2b8f2 in arc_get_data_buf (buf=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3880
#7  0xffffffff81c2b657 in arc_buf_alloc (spa=<value optimized out>, size=<value optimized out>, tag=0x0, type=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2057
#8  0xffffffff81c2d601 in arc_read (pio=0xfffff8000fad03b0, spa=0xfffff8000f63d000, bp=0xfffffe000e509980, done=0xffffffff81c3aed0 <dbuf_read_done>, private=0xfffff8000fdd6360,
  priority=ZIO_PRIORITY_SYNC_READ, zio_flags=-2117882160, arc_flags=0xfffffe02925483c4, zb=0xfffff8000fdd6360)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4397
#9  0xffffffff81c36ce9 in dbuf_read (db=0xfffff8000fdd6360, zio=0x0, flags=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:682
#10 0xffffffff81c3e415 in dmu_spill_hold_existing (bonus=0xfffff8001f312438, tag=0x0, dbp=0xfffff80062d4e7d0)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:333
#11 0xffffffff81c73707 in sa_attr_op (hdl=0xfffff80062d4e770, bulk=0xfffffe0292548528, count=1, data_op=SA_LOOKUP, tx=0x0)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:305
#12 0xffffffff81c75972 in sa_lookup (hdl=0xfffff80062d4e770, attr=<value optimized out>, buf=<value optimized out>, buflen=<value optimized out>)
  at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:1443
#13 0xffffffff81cbf8da in zfs_rmnode (zp=0xfffff80062d4c8a0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:633
#14 0xffffffff81cdd75e in zfs_freebsd_reclaim (ap=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6619
#15 0xffffffff80e81c27 in VOP_RECLAIM_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2019
#16 0xffffffff809f9581 in vgonel (vp=0xfffff8000f1beb10) at vnode_if.h:830
#17 0xffffffff809f9a19 in vrecycle (vp=0xfffff8000f1beb10) at /usr/src/sys/kern/vfs_subr.c:2951
#18 0xffffffff81cdd6fd in zfs_freebsd_inactive (ap=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6590
#19 0xffffffff80e81b17 in VOP_INACTIVE_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1953
#20 0xffffffff809f8322 in vinactive (vp=0xfffff8000f1beb10, td=0xfffff8000f9f34b0) at vnode_if.h:807
#21 0xffffffff809f8712 in vputx (vp=0xfffff8000f1beb10, func=2) at /usr/src/sys/kern/vfs_subr.c:2547
#22 0xffffffff80a0137e in kern_rmdirat (td=<value optimized out>, fd=<value optimized out>, path=<value optimized out>, pathseg=<value optimized out>)
  at /usr/src/sys/kern/vfs_syscalls.c:3964
#23 0xffffffff80d574bf in amd64_syscall (td=0xfffff8000f9f34b0, traced=0) at subr_syscall.c:141
#24 0xffffffff80d3c72b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#25 0x000000080089458a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
The folders that trigger the crash have strange permissions:
Code:
d---------+ 3 anna  domain users  3 Dec 10 10:32 01-Projcts
d---------+ 2 anna  domain users  2 Feb  8 21:46 02-Text
How can I fix this kernel panic?
 

Terry_Kennedy

Aspiring Daemon

Thanks: 263
Messages: 881

#2
How can I fix this kernel panic?
It looks like the panics are due to an assertion failing after a request to allocate a buffer for the ARC (Adaptive Replacement Cache, ZFS's in-memory cache). Since it is repeatable, it is probably an issue with corrupt on-disk metadata. To rule out memory, you probably want to boot a Memtest86+ ISO and let it run for at least one pass (24GB on that board should take 2-3 hours per pass, I think). Make sure all of the memory is seen and running at the correct speed (I had intermittent flaky problems when a system accidentally got a stick of 16GB 800 memory mixed in with 11 sticks of 8GB 1333 memory).

Once you've ruled out a hardware problem...

How big is the pool? Can you copy the pool's contents to some other medium (eSATA HDD, tape, etc.) and easily clobber and re-create the pool? That may be one option, but don't do that until all other possibilities have been exhausted (the first time you find out your backup didn't work is when you try to restore it).

It may be possible to fix the pool, or at least isolate the problem, with zdb(8). However, both zdb(8) and ZFS are pretty much "No user-serviceable parts inside" black boxes to most users. I would suggest posting to freebsd-fs@, which is where you'll find developers (most members here are users, not developers).
 

Demis

New Member


Messages: 14

#3
Yes, I tested the memory two months ago (Memtest86 v4.40), when the problems began. One pass takes nearly 4 hours (in ECC mode); it ran about 5 passes and found no errors.
Every disk was checked too (physically with MHDD, logically with zpool scrub, and additionally by an external data-recovery company). No errors. Each disk is a 3 TB Hitachi; the RAID type is raidz2. Part of the df -H output:
Code:
Filesystem                     Size    Used   Avail Capacity  Mounted on
hdd/usr/wf                     6.6T    4.1T    2.5T    62%    /hdd/usr/wf
Can you copy the pool's contents to some other medium
Yes, but I wonder why the server reboots instead of handling this exception. Wouldn't developers be interested in understanding that?
 

Terry_Kennedy

Aspiring Daemon

Thanks: 263
Messages: 881

#4
Yes, but I wonder why the server reboots instead of handling this exception. Wouldn't developers be interested in understanding that?
As I mentioned, if you ask over on freebsd-fs@, a developer may be able to provide some advice.

Normally, a panic happens when the system feels it is in a non-recoverable situation (and the hope is, it is transient and won't recur after a reboot). Many of the common cases have specific panic strings giving some info about what went wrong. However, there are lots of places where the kernel can encounter unexpected data and give a more generic description. One of the most common places for this to happen is when reading metadata from filesystems, since that metadata is generally used to tell the kernel where to find some other piece of metadata, and filesystem corruptions can lead to the kernel "going off into the weeds", either with a Trap 12 out-of-range memory reference, or by triggering an assertion (as your panic does).

An assertion is where the kernel knows that a particular thing can only be in a specific range of states, or where A must be larger than B, or so forth. If that assertion is not true, then something has gone horribly wrong before the routine that triggers the assertion. The assertion doesn't know where that happened, just that there is a problem now. Without the assertion, further processing could lead to a Trap 12, additional filesystem corruption, and so forth, so the least-bad thing to do is to panic.

I suggested (as a last resort) copying the pool to some other media, re-creating the pool and restoring the data because the assertion is not the cause; it is a symptom of something that happened earlier.

Unless a particular assertion panic is affecting a number of systems, the developers would probably be more interested in finding out how the data was corrupted in the first place, to prevent the corruption from happening at all. Sometimes the cause is something that can't be prevented (power failure during a series of writes), but sometimes it can be tracked down and fixed. If you post to freebsd-fs@, you might want to mention anything unusual that happened before the first time that the system crashed from this assertion.
 

Demis

New Member


Messages: 14

#5
Thank you for your help, Terry. I'll try contacting them.
 

ab2k

Member

Thanks: 20
Messages: 73

#6
Hi,

It seems your pool was created on neither FreeBSD 10.2 nor 10.3, as both of them have a newer ZFS version; zpool status invites you to upgrade it. Can you post the output of zpool history?

Next, the design of your pool: it's a little strange that you use 5 disks for a raidz2, as its recommended width is 6 disks, not 5. raidz1 is recommended for your disk count (to save space and money).

Next, in the debug output I see aio and samba; from this I bet you have done a ton of tweaking. Post your /etc/sysctl.conf, /boot/loader.conf and, finally, smb.conf files.

Also, was your system perhaps reset by a power failure? I had almost exactly what you are seeing now on one of my servers when its UPS died.

ADDITION: don't UPGRADE the pool.
 

Demis

New Member


Messages: 14

#7
Can you post output of zpool history
Sorry, my history is too long now (16,891,338 bytes), but I can post a filtered excerpt.
/etc/sysctl.conf
sed '/ *#/d; /^ *$/d' /etc/sysctl.conf
Code:
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.max_age=1200
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
net.inet.tcp.rfc1323=1
net.inet.tcp.rfc3390=1
net.inet.tcp.rfc3042=1
net.inet.tcp.sack.enable=1
net.inet.udp.maxdgram=57344
net.inet.raw.maxdgram=53248
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.tcp.msl=15000
net.inet.icmp.icmplim=100
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
net.inet.ip.fw.one_pass=0
net.inet6.ip6.v6only=0
net.inet6.ip6.accept_rtadv=0
net.inet6.ip6.auto_linklocal=0
kern.maxfiles=204800
kern.maxfilesperproc=200000
vfs.vmiodirenable=1
dev.igb.0.rx_processing_limit=4096
dev.igb.1.rx_processing_limit=4096
net.graph.maxdgram=8388608
net.graph.recvspace=8388608
net.route.netisr_maxqlen=4096
kern.ipc.nmbclusters=4194304
kern.ipc.maxsockbuf=83886080
net.inet.ip.dummynet.pipe_slot_limit=1000
net.inet.ip.dummynet.io_fast=1
net.inet.ip.fastforwarding=1
net.inet.ip.intr_queue_maxlen=10240
net.inet.tcp.recvspace=262144
net.inet.tcp.sendspace=262144
net.inet.tcp.mssdflt=1452
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
net.inet.tcp.delayed_ack=0
kern.ipc.somaxconn=8192
net.inet.ip.portrange.randomized=0
net.inet.tcp.nolocaltimewait=1
vfs.zfs.l2arc_noprefetch=0
sed '/ *#/d; /^ *$/d' /boot/loader.conf
Code:
debug.acpi.max_tasks="128"
net.inet6.ip6.rfc6204w3="0"
net.inet6.ip6.v6only="0"
net.inet6.ip6.no_radr="1"
net.inet6.ip6.accept_rtadv="0"
net.inet6.ip6.auto_linklocal="0"
hw.igb.rxd=4096
hw.igb.txd=4096
hw.igb.max_interrupt_rate=32000
net.isr.defaultqlimit=4096
net.link.ifqmaxlen=10240
if_lagg_load="YES"
hw.igb.num_queues=3
aio_load="yes"
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=131072
vfs.ufs.dirhash_maxmem=16777216
ichsmb_load="YES"
ipmi_load="YES"
vfs.zfs.min_auto_ashift=12
vm.kmem_size="7920M"
vm.kmem_size_max="7920M"
vfs.zfs.arc_max="960M"
vfs.zfs.vdev.cache.size="120M"
If I boot into single-user mode (e.g. to run fsck -fy), Samba is not started, but the problem still exists.
maybe your system reset by a power failure
Maybe, maybe... but the UPS works correctly.
ADDITION: don't UPGRADE the pool.
I know...

seems your pool was not created on 10.2
Of course. The pool was created three years ago on FreeBSD 8.2 and has since been upgraded to FreeBSD 9.1, then 9.2, then 10.2.
The last upgrade was forced: a failure in December (the same problem as described above) left the system unable to boot at all. I described it somewhere here on the forum. One solution was to upgrade to 10.2.
 
Last edited by a moderator:

ab2k

Member

Thanks: 20
Messages: 73

#8
Hi, as I supposed... Well, let's try to fix your system, starting by commenting out your tuning settings:

1. You have plenty of memory, so what's the purpose of not letting FreeBSD manage it itself? Comment these out:

/boot/loader.conf
Code:
vm.kmem_size="7920M"
vm.kmem_size_max="7920M"
vfs.zfs.arc_max="960M"
vfs.zfs.vdev.cache.size="120M"
debug.acpi.max_tasks="128" # the default for this parameter is higher than the value you are setting
/etc/sysctl.conf (FreeBSD will set much higher values by itself than you are setting, so comment these out)
Code:
kern.maxfiles=204800
kern.maxfilesperproc=200000
2. You are using ZFS, so what's the purpose of this UFS setting? Comment it out.

/boot/loader.conf
Code:
vfs.ufs.dirhash_maxmem=16777216
3. Sometimes it's better to turn aio off, so comment it out:

/boot/loader.conf
Code:
aio_load="yes"
4. What disks are you using? Advanced Format drives (4096 bytes/sector)? You are setting ashift=12... Was the ZFS pool created with that option?

5. And lastly, you have a very old pool. The recommended age is around 3 years; after that, any pool that is used heavily every day becomes 50+% fragmented and its performance degrades to a crawl. I think you should consider rebuilding it from scratch. Upgrading is not an option for you.

Addition: you said that the zfs history log is too big (16MB). I bet you have taken a ton of snapshots. Did you manage them (delete some), or are there millions of snapshots from the start of the pool? Probably they just don't fit in memory, as you are not letting FreeBSD allocate enough memory for ZFS.
 

Demis

New Member


Messages: 14

#9
Regarding items 1, 2 and 3: I excluded these files completely (simply renamed them and rebooted). The problem still exists, in single-user mode too.
What disks are you using
run: mfiutil show drives
Code:
mfi0 Physical Drives:
8 (  466G) ONLINE <Hitachi HUA72205 A3EA serial=JPW9K0J82LXZEL> SATA E1:S4
9 (  466G) ONLINE <Hitachi HUA72205 A3EA serial=JPW9K0J82KY8XL> SATA E1:S5
10 ( 2795G) ONLINE <Hitachi HUA5C303 A800 serial=MJ0331YNG7UD6A> SATA E1:S2
11 ( 2795G) ONLINE <Hitachi HUA5C303 A800 serial=MJ0331YNG7UUUA> SATA E1:S3
12 ( 2795G) ONLINE <Hitachi HUA5C303 A800 serial=MJ0331YNG7U9MA> SATA E1:S1
13 ( 2795G) ONLINE <Hitachi HUA5C303 A800 serial=MJ0331YNG7UTSA> SATA E1:S0
14 ( 2795G) ONLINE <Hitachi HUA5C303 A800 serial=MJ0331YNG7U1MA> SATA E1:S6
run: mfiutil show volumes
Code:
mfi0 Volumes:
  Id  Size  Level  Stripe  State  Cache  Name
mfid0 (  465G) RAID-1  64K OPTIMAL Enabled
mfid1 ( 2794G) RAID-0  64K OPTIMAL Enabled
mfid2 ( 2794G) RAID-0  64K OPTIMAL Enabled
mfid3 ( 2794G) RAID-0  64K OPTIMAL Enabled
mfid4 ( 2794G) RAID-0  64K OPTIMAL Enabled
mfid5 ( 2794G) RAID-0  64K OPTIMAL Enabled
ZFS pool created with that option ?
Yes.
run: gpart list | grep -e mfi -e offset
Code:
Geom name: mfid1
1. Name: mfid1p1
  Stripeoffset: 20480
  offset: 20480
1. Name: mfid1
Geom name: mfid2
1. Name: mfid2p1
  Stripeoffset: 20480
  offset: 20480
1. Name: mfid2
Geom name: mfid3
1. Name: mfid3p1
  Stripeoffset: 20480
  offset: 20480
1. Name: mfid3
Geom name: mfid4
1. Name: mfid4p1
  Stripeoffset: 20480
  offset: 20480
1. Name: mfid4
Geom name: mfid5
1. Name: mfid5p1
  Stripeoffset: 20480
  offset: 20480
run: gpart show
Code:
=>  34  974608317  mfid0  GPT  (465G)
  34  128  1  freebsd-boot  (64K)
  162  4194176  2  freebsd-ufs  (2.0G)
  4194338  33554432  3  freebsd-swap  (16G)
  37748770  117329920  4  freebsd-ufs  (56G)
  155078690  25165824  5  freebsd-ufs  (12G)
  180244514  792723456  6  freebsd-ufs  (378G)
  972967970  1640381  - free -  (801M)

=>  34  5859372989  mfid1  GPT  (2.7T)
  34  6  - free -  (3.0K)
  40  5859372976  1  freebsd-zfs  (2.7T)
  5859373016  7  - free -  (3.5K)

=>  34  5859372989  mfid2  GPT  (2.7T)
  34  6  - free -  (3.0K)
  40  5859372976  1  freebsd-zfs  (2.7T)
  5859373016  7  - free -  (3.5K)

=>  34  5859372989  mfid3  GPT  (2.7T)
  34  6  - free -  (3.0K)
  40  5859372976  1  freebsd-zfs  (2.7T)
  5859373016  7  - free -  (3.5K)

=>  34  5859372989  mfid4  GPT  (2.7T)
  34  6  - free -  (3.0K)
  40  5859372976  1  freebsd-zfs  (2.7T)
  5859373016  7  - free -  (3.5K)

=>  34  5859372989  mfid5  GPT  (2.7T)
  34  6  - free -  (3.0K)
  40  5859372976  1  freebsd-zfs  (2.7T)
  5859373016  7  - free -  (3.5K)
run: gpart status
Code:
  Name  Status  Components
  mfid0p1  OK  mfid0
  mfid0p2  OK  mfid0
  mfid0p3  OK  mfid0
  mfid0p4  OK  mfid0
  mfid0p5  OK  mfid0
  mfid0p6  OK  mfid0
  mfid1p1  OK  mfid1
  mfid2p1  OK  mfid2
  mfid3p1  OK  mfid3
  mfid4p1  OK  mfid4
  mfid5p1  OK  mfid5
Yes, snapshot cleanup is done on a regular basis, about once every two or three months. There are 190 snapshots now.
run: zfs list -t snapshot | wc -l
Code:
190
I have no complaints about the server otherwise; everything else is fine. The only problem is the panic.
 

kpa

Beastie's Twin

Thanks: 1,695
Messages: 6,103

#10
Make a full backup and recreate your pool from scratch. Irreversible metadata corruption that cannot be fixed with scrub can happen; it has happened to me as well. Take note of the advice above about your loader.conf(5) entries: you're setting way too many settings that you don't seem to fully understand.
 

Demis

New Member


Messages: 14

#11
Of course I can do that, but what worries me is that this error makes the server reboot. That should not happen. And imagine the pool grows 10, 100 or 1000 times larger. What then? No, there should be no panic. Post a yes.
 

ab2k

Member

Thanks: 20
Messages: 73

#12
Sadly, only one thing is left: I agree with kpa, it seems your pool is corrupted somewhere and needs to be rebuilt from scratch. When you move your data, don't use zfs send | zfs receive; just rsync everything from one pool to the other, to be sure you don't carry corrupted metadata over with a snapshot. Also, when creating the new pool, check whether your drives are Advanced Format drives (4096 bytes/sector). I cannot find their specs on Google, don't know why... You can always use smartctl -a /dev/*YOUR DRIVE* from sysutils/smartmontools to get the bytes/sector information.
 

Terry_Kennedy

Aspiring Daemon

Thanks: 263
Messages: 881

#13
Of course I can do that, but what worries me is that this error makes the server reboot. That should not happen. And imagine the pool grows 10, 100 or 1000 times larger. What then? No, there should be no panic. Post a yes.
As I attempted to point out above, your system is panicking because the code thinks that's the best solution to the problem. Let's assume you modify FreeBSD to remove that assertion check. You would still have corrupted metadata in the filesystem. Consider 3 possibilities if the code proceeds:

1) Some operations succeed, some fail. Processes start dying or hanging, potentially without any indication of what is wrong.
2) Some time later the corrupted data is dereferenced by the kernel, leading to a random Trap 12 panic.
3) The corrupted metadata is used during a subsequent write operation to the filesystem, causing yet more metadata corruption.

Remember, RAID is not backup. And neither is ZFS. As mentioned above, sometimes a full backup / re-create / restore cycle is the only thing that will correct the problem. I don't think the issue you're experiencing was caused by a ZFS bug - people have been running ZFS on FreeBSD for a long time and if this was a problem in the code, I think someone else would have encountered it. Since your version is reported as 10.2-RELEASE-p12 it seems you're tracking the 10.2-RELEASE branch, so you shouldn't run into untested code from HEAD or "Oops, I broke it" commits to 10-STABLE.
 

Demis

New Member


Messages: 14

#15
Hi all,

here is how to track down and resolve this problem:
1. To debug ZFS on FreeBSD 10.2 (amd64, Xeon), add the following to your kernel config (custom or GENERIC):
Code:
  options DDB
  options KDB_UNATTENDED
  options OPENSOLARIS_WITNESS
Leave everything else at the defaults.
Then add the following to /etc/make.conf (e.g. with ee /etc/make.conf):
Code:
  CFLAGS-=-O2
  CFLAGS+=-O0
  COPTFLAGS-=-O2
  COPTFLAGS+=-O0
  CXXFLAGS-=-O2
  CXXFLAGS+=-O0
  STRIP=
  CFLAGS+=-fno-omit-frame-pointer
  DEBUG_FLAGS+='-O0'
Thanks to Andrey Lavrentyev (aka lavr).

Then build and install the kernel:
Code:
  cd /usr/src
  make -j1 buildkernel
  make installkernel
2. To fix the same problem, follow the recommendation by ZFS developer Andriy Gapon on the mailing list:
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=101999+0+archive/2016/freebsd-fs/20160410.freebsd-fs
3. Before applying item 2, carefully read https://forums.freebsd.org/threads/51470/ and this thread.

Thanks to All!
DemIS
 