FreeBSD 9 64bit with ZFS crashes every 3 days

Problem solved.
http://forums.freebsd.org/showpost.php?p=181397&postcount=52

-------------------------------------------------------------------------------
Hi.

I'm using FreeBSD 9 64-bit with ZFS as a backup server. Only the NFS service is running, yet it keeps crashing roughly every three days.

The server hardware is an Intel Core i3-2100, 4 GB RAM and 4 x 2 TB Seagate HDDs. The server previously ran fine with ESXi 5.0 for more than three months without a single crash.

I suspect the problem is insufficient memory for the ARC, so I reduced arc_max to 2 GB, leaving the other 2 GB for everything else.
Code:
vfs.zfs.arc_max="2G"
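
(For reference, that line lives in /boot/loader.conf and only takes effect after a reboot; the value actually in use can then be checked with:)
Code:
sysctl vfs.zfs.arc_max                  # the configured cap, reported in bytes
sysctl kstat.zfs.misc.arcstats.size     # the current ARC size, also in bytes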

Below is my memory status, captured once a minute. I notice that the server crashed after free memory fell below 32 MB.
Code:
Thu Feb 23 00:28:01 2012 Mem: 3768K Active, 448K Inact, 3730M Wired, 10M Cache, 92M Free
Thu Feb 23 00:28:01 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:29:03 2012 Mem: 5388K Active, 140K Inact, 3744M Wired, 9284K Cache, 78M Free
Thu Feb 23 00:29:03 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:30:03 2012 Mem: 3432K Active, 584K Inact, 3745M Wired, 10M Cache, 77M Free
Thu Feb 23 00:30:03 2012 Swap: 4096M Total, 27M Used, 4068M Free
Thu Feb 23 00:31:06 2012 Mem: 5060K Active, 708K Inact, 3744M Wired, 9276K Cache, 78M Free
Thu Feb 23 00:31:06 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:32:01 2012 Mem: 6376K Active, 584K Inact, 3753M Wired, 8188K Cache, 68M Free
Thu Feb 23 00:32:01 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:33:01 2012 Mem: 3356K Active, 80K Inact, 3754M Wired, 11M Cache, 68M Free
Thu Feb 23 00:33:01 2012 Swap: 4096M Total, 28M Used, 4068M Free
Thu Feb 23 00:34:00 2012 Mem: 3692K Active, 712K Inact, 3739M Wired, 10M Cache, 83M Free
Thu Feb 23 00:34:00 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:35:01 2012 Mem: 4140K Active, 216K Inact, 3767M Wired, 10M Cache, 55M Free
Thu Feb 23 00:35:01 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:36:01 2012 Mem: 2312K Active, 156K Inact, 3768M Wired, 7180K Cache, 59M Free
Thu Feb 23 00:36:01 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:37:00 2012 Mem: 3096K Active, 72K Inact, 3749M Wired, 7484K Cache, 77M Free
Thu Feb 23 00:37:00 2012 Swap: 4096M Total, 28M Used, 4068M Free
Thu Feb 23 00:38:00 2012 Mem: 2816K Active, 508K Inact, 3762M Wired, 6948K Cache, 64M Free
Thu Feb 23 00:38:00 2012 Swap: 4096M Total, 28M Used, 4068M Free
Thu Feb 23 00:39:01 2012 Mem: 4044K Active, 220K Inact, 3750M Wired, 6380K Cache, 76M Free
Thu Feb 23 00:39:01 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:40:04 2012 Mem: 3416K Active, 224K Inact, 3756M Wired, 7004K Cache, 70M Free
Thu Feb 23 00:40:04 2012 Swap: 4096M Total, 28M Used, 4068M Free
Thu Feb 23 00:41:03 2012 Mem: 5084K Active, 332K Inact, 3758M Wired, 5740K Cache, 67M Free
Thu Feb 23 00:41:03 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:42:01 2012 Mem: 3904K Active, 184K Inact, 3772M Wired, 6724K Cache, 54M Free
Thu Feb 23 00:42:01 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:43:01 2012 Mem: 3696K Active, 140K Inact, 3761M Wired, 6968K Cache, 65M Free
Thu Feb 23 00:43:01 2012 Swap: 4096M Total, 27M Used, 4068M Free
Thu Feb 23 00:44:01 2012 Mem: 4084K Active, 572K Inact, 3777M Wired, 5268K Cache, 49M Free
Thu Feb 23 00:44:01 2012 Swap: 4096M Total, 27M Used, 4068M Free
Thu Feb 23 00:45:05 2012 Mem: 5332K Active, 408K Inact, 3792M Wired, 2368K Cache, 36M Free
Thu Feb 23 00:45:05 2012 Swap: 4096M Total, 27M Used, 4069M Free
Thu Feb 23 00:46:11 2012 Mem: 5108K Active, 692K Inact, 3797M Wired, 1876K Cache, 32M Free
Thu Feb 23 00:46:11 2012 Swap: 4096M Total, 26M Used, 4070M Free

Will increasing kmem_size help avoid the server going down due to memory exhaustion? Below is my current setting.
Code:
vm.kmem_size: 4023611392

I would like to increase it to 6GB and try again.

Any comment?
 
Are you using dedupe?

What's your pool config? 2x mirror vdevs? 1x raid1 vdev? Any cache vdevs?

Don't mess with kmem settings. kmem_size just shows the current size. kmem_size_max shows the full address space, which will be something like 64GB.

What is running your backups? What other software is running on there?

Something is running out of control and wiring down all your RAM, leaving nothing for the rest of the OS and starving the system until it locks up. Most likely your ARC is growing out of control. The usual culprit is using dedup without enough RAM.
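If you're not sure about dedup, something like this will show it (replace zroot with your pool name):
Code:
zpool status zroot                  # vdev layout
zpool list zroot                    # the DEDUP column shows the current dedup ratio
zfs get -r dedup,compression zroot  # per-dataset settings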
 
Nope, I'm using compression instead of dedup; none of the volumes has dedup enabled.

I forgot to attach my pool status.
Code:
       NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0

No other software is running on the server; it is only an NFS server sharing folders to other hosts for rsync backups.

I also forgot to mention: the server first crashed at 1:10 AM, and the second crash was at 00:46. Could some cron job be causing it?
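(I'll check the cron log and the system crontab to see what ran around that time; a rough sketch, using the default FreeBSD log locations:)
Code:
grep 'Feb 23 00:4' /var/log/cron    # jobs cron started just before the crash
cat /etc/crontab                    # system jobs, including the periodic(8) runs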
 
Hi idownes,

It seems I'm having exactly the same issue as you. My server holds more than 3 TB of small files; the periodic 100.chksetuid check was eating all my I/O, so I disabled it.

By the way, below are my ZFS sysctl values. It seems vfs.zfs.arc_meta_used (2944727520) already exceeds vfs.zfs.arc_meta_limit (737467392).

Any idea?


Code:
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 6951424
vfs.zfs.mfu_ghost_metadata_lsize: 2373072896
vfs.zfs.mfu_ghost_size: 2380024320
vfs.zfs.mfu_data_lsize: 4774912
vfs.zfs.mfu_metadata_lsize: 65536
vfs.zfs.mfu_size: 5227520
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mru_ghost_metadata_lsize: 99926016
vfs.zfs.mru_ghost_size: 99926016
vfs.zfs.mru_data_lsize: 395264
vfs.zfs.mru_metadata_lsize: 2845324288
vfs.zfs.mru_size: 2851828736
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 65536
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 737467392
vfs.zfs.arc_meta_used: 2944727520
vfs.zfs.arc_min: 368733696
vfs.zfs.arc_max: 2949869568
vfs.zfs.dedup.prefetch: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.write_limit_override: 0
vfs.zfs.write_limit_inflated: 12544475136
vfs.zfs.write_limit_max: 522686464
vfs.zfs.write_limit_min: 33554432
vfs.zfs.write_limit_shift: 3
vfs.zfs.no_write_throttle: 0
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.mg_alloc_failures: 8
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime_ms: 1000
vfs.zfs.txg.timeout: 5
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 10
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 28
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
 
It may be the periodic(8) scripts running at night, scanning all files looking for various things.

One of the things I've had to modify on my ZFS backup servers is /etc/locate.rc, mainly to remove paths from the database (I don't need all the files being backed up in my locate db):
Code:
PRUNEPATHS="/tmp /usr/tmp /var/tmp /var/db/portsnap /backups /cameras"
PRUNEDIRS=".zfs"

I've also modified /etc/periodic.conf to disable a bunch of checks I don't need, mainly dealing with sendmail, named, rwho, setuid, ipfilter. See /etc/defaults/periodic.conf for what can be added to /etc/periodic.conf.
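For example, my /etc/periodic.conf has entries along these lines (double-check the exact variable names against the defaults file for your release):
Code:
# skip the nightly checks I don't need on a backup-only box
daily_status_security_chksetuid_enable="NO"
daily_status_security_ipfdenied_enable="NO"
daily_status_named_enable="NO"
daily_status_rwho_enable="NO"
daily_clean_rwho_enable="NO"
daily_status_mail_rejects_enable="NO"   # sendmail-related
daily_clean_hoststat_enable="NO"        # sendmail-related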
 
If vfs.zfs.arc_meta_used already exceeds vfs.zfs.arc_meta_limit, what should I do to cap arc_meta_used below the limit?
 
By the way, these are my current vfs.zfs.arc_meta values. Why can arc_meta_used exceed arc_meta_limit like this? Is it due to a bug in ZFS or FreeBSD?

Code:
Thu Feb 23 13:01:20 2012 vfs.zfs.arc_meta_limit: 737467392
Thu Feb 23 13:01:20 2012 vfs.zfs.arc_meta_used: 2606797560
 
It seems arc_meta_limit is a soft limit rather than a hard limit, and I believe there is a bug that lets arc_meta_used exceed the limit while scrubbing.

When I copy small files to the server, arc_meta_used never exceeds the limit by more than 10%:
Code:
vfs.zfs.arc_meta_limit: 737467392
vfs.zfs.arc_meta_used: 736363616

But during a scrub it can easily grow to five times the limit; the soft limit does not seem to take effect for that operation.
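(In case anyone wants to reproduce this, the counters can be watched during a scrub with a trivial loop, something like the following; the log path is arbitrary:)
Code:
#!/bin/sh
# log ARC metadata usage against its limit once a minute
while true; do
    echo "$(date) used=$(sysctl -n vfs.zfs.arc_meta_used) limit=$(sysctl -n vfs.zfs.arc_meta_limit)" >> /var/log/arc_meta.log
    sleep 60
done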
 
DutchDaemon, thanks for the reminder.

Another FreeBSD server crashed again yesterday. It seems it was memory exhaustion again, but I still can't figure out which process keeps consuming my available memory.

My server specification is as follows:
Code:
Intel Xeon 5335 Quadcore 
4GB ECC Ram
4 X 2TB HDD with RAID10
no dedupe is enabled, only compression

Code:
CPU:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
Mem: 6980K Active, 460K Inact, 3906M Wired, 18M Free


Below are the vfs.zfs values I have logged before it crashed.
Code:
Sat Feb 25 19:42:40 2012 vfs.zfs.l2c_only_size: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.mfu_ghost_data_lsize: 2856960
Sat Feb 25 19:42:40 2012 vfs.zfs.mfu_ghost_metadata_lsize: 88188928
Sat Feb 25 19:42:40 2012 vfs.zfs.mfu_ghost_size: 91045888
Sat Feb 25 19:42:40 2012 vfs.zfs.mfu_data_lsize: 399360
Sat Feb 25 19:42:40 2012 vfs.zfs.mfu_metadata_lsize: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.mfu_size: 12640256
Sat Feb 25 19:42:40 2012 vfs.zfs.mru_ghost_data_lsize: 2710016
Sat Feb 25 19:42:40 2012 vfs.zfs.mru_ghost_metadata_lsize: 234015232
Sat Feb 25 19:42:40 2012 vfs.zfs.mru_ghost_size: 236725248
Sat Feb 25 19:42:40 2012 vfs.zfs.mru_data_lsize: 624788480
Sat Feb 25 19:42:40 2012 vfs.zfs.mru_metadata_lsize: 83579392
Sat Feb 25 19:42:40 2012 vfs.zfs.mru_size: 835642880
Sat Feb 25 19:42:40 2012 vfs.zfs.anon_data_lsize: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.anon_metadata_lsize: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.anon_size: 12377088
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_norw: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_feed_again: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_noprefetch: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_feed_min_ms: 200
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_feed_secs: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_headroom: 2
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_write_boost: 8388608
Sat Feb 25 19:42:40 2012 vfs.zfs.l2arc_write_max: 8388608
Sat Feb 25 19:42:40 2012 vfs.zfs.arc_meta_limit: 536870912
Sat Feb 25 19:42:40 2012 vfs.zfs.arc_meta_used: 438924456
Sat Feb 25 19:42:40 2012 vfs.zfs.arc_min: 268435456
Sat Feb 25 19:42:40 2012 vfs.zfs.arc_max: 2147483648
Sat Feb 25 19:42:40 2012 vfs.zfs.dedup.prefetch: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.mdcomp_disable: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.write_limit_override: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.write_limit_inflated: 12818632704
Sat Feb 25 19:42:40 2012 vfs.zfs.write_limit_max: 534109696
Sat Feb 25 19:42:40 2012 vfs.zfs.write_limit_min: 33554432
Sat Feb 25 19:42:40 2012 vfs.zfs.write_limit_shift: 3
Sat Feb 25 19:42:40 2012 vfs.zfs.no_write_throttle: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.zfetch.array_rd_sz: 1048576
Sat Feb 25 19:42:40 2012 vfs.zfs.zfetch.block_cap: 256
Sat Feb 25 19:42:40 2012 vfs.zfs.zfetch.min_sec_reap: 2
Sat Feb 25 19:42:40 2012 vfs.zfs.zfetch.max_streams: 8
Sat Feb 25 19:42:40 2012 vfs.zfs.prefetch_disable: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.mg_alloc_failures: 8
Sat Feb 25 19:42:40 2012 vfs.zfs.check_hostid: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.recover: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.txg.synctime_ms: 1000
Sat Feb 25 19:42:40 2012 vfs.zfs.txg.timeout: 5
Sat Feb 25 19:42:40 2012 vfs.zfs.scrub_limit: 10
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.cache.bshift: 16
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.cache.size: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.cache.max: 16384
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.write_gap_limit: 4096
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.read_gap_limit: 32768
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.aggregation_limit: 131072
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.ramp_rate: 2
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.time_shift: 6
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.min_pending: 4
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.max_pending: 10
Sat Feb 25 19:42:40 2012 vfs.zfs.vdev.bio_flush_disable: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.cache_flush_disable: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.zil_replay_disable: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.zio.use_uma: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.version.zpl: 5
Sat Feb 25 19:42:40 2012 vfs.zfs.version.spa: 28
Sat Feb 25 19:42:40 2012 vfs.zfs.version.acl: 1
Sat Feb 25 19:42:40 2012 vfs.zfs.debug: 0
Sat Feb 25 19:42:40 2012 vfs.zfs.super_owner: 0
 
Two servers crashed again last night after three days. Both servers contain a lot of small files, with about 3 TB of used space per server.

Changing vm.kmem_size and vm.kmem_size_max to 3.5 GB does not help keep my servers stable.

I'm changing arc_max to 512 MB on the first server and keeping the default setting on the other. I will report whether both servers can survive for more than a week.

Could it be caused by the compressed ZFS swap volume? Theoretically, compression consumes some memory during swap in/out, and that could crash a server that is already out of memory. Correct me if I'm wrong.
 
Leave the kmem* tunables alone on amd64 unless you have a REALLY good reason to change them; starting with 8.2-RELEASE there's generally no reason to touch them on amd64. Tuning vfs.zfs.arc_max is recommended, however, and can be very helpful to limit the ARC to a sensible size.
 
kpa said:
Leave the kmem* tunables alone on amd64 unless you have a REALLY good reason to change them; starting with 8.2-RELEASE there's generally no reason to touch them on amd64. Tuning vfs.zfs.arc_max is recommended, however, and can be very helpful to limit the ARC to a sensible size.
That is what I did before: I changed arc_max to 2 GB on a 4 GB server and it did make the server run longer, but it eventually crashed again for the same reason.

I have changed it to 512 MB; will ZFS still use my free memory as a read cache?
 
Are you sure this is not a hardware-related issue? Have you checked the logs for disks dropping out under heavy load?

I don't do any memory tuning for ZFS. It's not necessary anymore.
 
belon_cfy said:
Two servers crashed again last night after three days. Both servers contain a lot of small files, with about 3 TB of used space per server.

Changing vm.kmem_size and vm.kmem_size_max to 3.5 GB does not help keep my servers stable.

I'm changing arc_max to 512 MB on the first server and keeping the default setting on the other. I will report whether both servers can survive for more than a week.

Could it be caused by the compressed ZFS swap volume? Theoretically, compression consumes some memory during swap in/out, and that could crash a server that is already out of memory. Correct me if I'm wrong.

Don't use swap on ZFS. Bad things happen, as you have noticed. You need RAM for the ARC. You need the ARC to track pool usage, which includes your swap volume. When you run out of RAM, you send things to swap... which leads to more ARC usage, which leads to less RAM, which means you need to send more stuff to swap, and the cycle continues until BOOM.
 
olav said:
Are you sure this is not a hardware-related issue? Have you checked the logs for disks dropping out under heavy load?

I don't do any memory tuning for ZFS. It's not necessary anymore.
I don't think it is a hardware issue because I'm seeing exactly the same symptoms on two different servers (a Xeon and a Core i3-2100, both with 4 GB RAM). Both servers crashed at almost the same time.

Both servers previously ran for more than 3-4 months as ESXi hosts with NexentaStor installed.
 
phoenix said:
Don't use swap on ZFS. Bad things happen, as you have noticed. You need RAM for the ARC. You need the ARC to track pool usage, which includes your swap volume. When you run out of RAM, you send things to swap... which leads to more ARC usage, which leads to less RAM, which means you need to send more stuff to swap, and the cycle continues until BOOM.

Hi Phoenix,
Thanks for pointing that out. The swap is on ZFS since the root file system is on ZFS too.

I hope to keep the swap volume on ZFS since there is no space left to create a separate swap partition. I will try disabling primarycache and secondarycache on the swap volume and see how it goes.
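(A sketch of what I mean, assuming the swap zvol is named zroot/swap; adjust to whatever zfs list shows:)
Code:
zfs set primarycache=none zroot/swap
zfs set secondarycache=none zroot/swap
zfs get primarycache,secondarycache zroot/swap   # verify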
 
If you need swap space, then find a USB stick somewhere, plug it in, and configure that as your swap space. You really need to remove the swap-on-ZVol if you want your server to stop crashing.
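Something along these lines works, assuming the stick shows up as da0 (check dmesg before touching it):
Code:
gpart create -s gpt da0
gpart add -t freebsd-swap da0
swapon /dev/da0p1
echo "/dev/da0p1 none swap sw 0 0" >> /etc/fstab   # make it survive a reboot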
 
The problem is happening again: something is consuming all of my memory and causing heavy swap activity.

I have changed arc_max to 1 GB on a 4 GB server.
Any idea?

Code:
48 processes:  1 running, 47 sleeping
CPU:  0.0% user,  0.0% nice,  3.1% system,  0.4% interrupt, 96.5% idle
Mem: 68K Active, 32K Inact, 3751M Wired, 12M Cache, 73M Free
Swap: 8192M Total, 32M Used, 8160M Free, 1012K Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
39788 root       32  20    0 10052K   512K rpcsvc  1   0:05  0.49% nfsd
16722 root        1  20    0 18404K    16K nanslp  1  21:23  0.00% vmstat
 1518 root        1  20    0 35604K    72K nanslp  0   2:12  0.00% zpool
 1444 root        1  20    0 68016K    32K select  1   0:48  0.00% sshd
 1458 root        1  24    0 19964K    16K nanslp  0   0:25  0.00% perl5.12.4
 1436 root        1  20    0 68016K    32K select  1   0:17  0.00% sshd
 1464 root        1  20    0 68016K    48K select  1   0:16  0.00% sshd
 6718 root        1  20    0 68016K    32K select  0   0:14  0.00% sshd
 1414 root        1  20    0 68016K    32K select  2   0:09  0.00% sshd
 1324 root        1  20    0 20384K    32K select  0   0:02  0.00% sendmail
 1334 root        1  20    0 14260K     0K nanslp  1   0:02  0.00% <cron>
 1035 root        1  20    0 14264K    16K select  1   0:01  0.00% rpcbind
 1012 root        1  20    0 12184K    32K select  2   0:01  0.00% syslogd
 1408 root        1  20    0 68016K    32K select  3   0:01  0.00% sshd
 1190 root        1  52    0 14264K    16K rpcsvc  2   0:00  0.00% rpc.lockd
39866 root        1  20    0 16700K   440K CPU0    0   0:00  0.00% top
 1184 root        1  20    0   268M    32K select  3   0:00  0.00% rpc.statd
 1162 root        1  20    0 10052K    16K select  2   0:00  0.00% nfsuserd
 1163 root        1  20    0 10052K    16K select  2   0:00  0.00% nfsuserd
 1161 root        1  20    0 10052K    16K select  1   0:00  0.00% nfsuserd
 1160 root        1  20    0 10052K    16K select  2   0:00  0.00% nfsuserd
 1417 root        1  20    0 17664K    32K ttyin   0   0:00  0.00% csh
 1328 smmsp       1  20    0 20384K     0K pause   0   0:00  0.00% <sendmail>
 1439 root        1  20    0 17664K     0K pause   2   0:00  0.00% <csh>
 1175 root        1  20    0 12180K    16K select  0   0:00  0.00% mountd
39787 root        1  20    0 10052K    16K select  2   0:00  0.00% nfsd
 6732 root        1  20    0 17664K     0K pause   1   0:00  0.00% <csh>
 1454 root        1  20    0 17664K     0K pause   0   0:00  0.00% <csh>
 1474 root        1  20    0 17664K     0K pause   0   0:00  0.00% <csh>
39879 root        1  21    0 19964K     0K nanslp  1   0:00  0.00% <perl5.12.4>
39881 root        1  23    0 19964K     0K nanslp  3   0:00  0.00% <perl5.12.4>
39880 root        1  20    0 19964K     0K nanslp  3   0:00  0.00% <perl5.12.4>
 1411 root        1  24    0 17664K     0K pause   0   0:00  0.00% <csh>
 
Please post output from:

# uname -a
(I thought someone else would have asked already)

and

# zfs-stats -a
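(zfs-stats is not in the base system; if it is missing, it should be available from the sysutils/zfs-stats port:)
Code:
cd /usr/ports/sysutils/zfs-stats && make install clean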


And BTW, I found this to be very untrue on my system:
kpa said:
Leave the kmem* tunables alone on amd64 unless you have a REALLY good reason to change them; starting with 8.2-RELEASE there's generally no reason to touch them on amd64. Tuning vfs.zfs.arc_max is recommended, however, and can be very helpful to limit the ARC to a sensible size.

With 8.2-RELEASE and a STABLE from September 2011, my 48 GB RAM ZFS-on-root system wouldn't use more than a few gigabytes of memory until I set vm.kmem_size. Another reason is that my dual 10 Gbps network card wouldn't work without setting it, because it wanted much more memory. Now it eats up the RAM nicely, leaving just enough spare: "Mem: 8948K Active, 43M Inact, 42G Wired, 23M Cache, 4730M Free".

If you rephrase that to say "starting with 8-STABLE from <fill in a date after September 2011>", then I would need to test it again, but otherwise I am sure that, for my system, the above is not always correct. (Though I wouldn't bet that removing the kmem setting would make it use under 4 GB again; the behaviour is strange and I haven't run that particular experiment yet.)
 
Hi peetaur,
Below are my FreeBSD details:
Code:
FreeBSD storage20 9.0-RELEASE FreeBSD 9.0-RELEASE amd64

My ZFS parameters at that time:
Code:
Fri Mar  9 00:00:41 2012 vfs.zfs.l2c_only_size: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.mfu_ghost_data_lsize: 6148096
Fri Mar  9 00:00:41 2012 vfs.zfs.mfu_ghost_metadata_lsize: 55703552
Fri Mar  9 00:00:41 2012 vfs.zfs.mfu_ghost_size: 61851648
Fri Mar  9 00:00:41 2012 vfs.zfs.mfu_data_lsize: 39424
Fri Mar  9 00:00:41 2012 vfs.zfs.mfu_metadata_lsize: 1097728
Fri Mar  9 00:00:41 2012 vfs.zfs.mfu_size: 10587136
Fri Mar  9 00:00:41 2012 vfs.zfs.mru_ghost_data_lsize: 13315072
Fri Mar  9 00:00:41 2012 vfs.zfs.mru_ghost_metadata_lsize: 75742208
Fri Mar  9 00:00:41 2012 vfs.zfs.mru_ghost_size: 89057280
Fri Mar  9 00:00:41 2012 vfs.zfs.mru_data_lsize: 17732608
Fri Mar  9 00:00:41 2012 vfs.zfs.mru_metadata_lsize: 10712576
Fri Mar  9 00:00:41 2012 vfs.zfs.mru_size: 79090176
Fri Mar  9 00:00:41 2012 vfs.zfs.anon_data_lsize: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.anon_metadata_lsize: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.anon_size: 231936
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_norw: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_feed_again: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_noprefetch: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_feed_min_ms: 200
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_feed_secs: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_headroom: 2
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_write_boost: 8388608
Fri Mar  9 00:00:41 2012 vfs.zfs.l2arc_write_max: 8388608
Fri Mar  9 00:00:41 2012 vfs.zfs.arc_meta_limit: 268435456
Fri Mar  9 00:00:41 2012 vfs.zfs.arc_meta_used: 141534312
Fri Mar  9 00:00:41 2012 vfs.zfs.arc_min: 134217728
Fri Mar  9 00:00:41 2012 vfs.zfs.arc_max: 1073741824
Fri Mar  9 00:00:41 2012 vfs.zfs.dedup.prefetch: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.mdcomp_disable: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.write_limit_override: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.write_limit_inflated: 12544475136
Fri Mar  9 00:00:41 2012 vfs.zfs.write_limit_max: 522686464
Fri Mar  9 00:00:41 2012 vfs.zfs.write_limit_min: 33554432
Fri Mar  9 00:00:41 2012 vfs.zfs.write_limit_shift: 3
Fri Mar  9 00:00:41 2012 vfs.zfs.no_write_throttle: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.zfetch.array_rd_sz: 1048576
Fri Mar  9 00:00:41 2012 vfs.zfs.zfetch.block_cap: 256
Fri Mar  9 00:00:41 2012 vfs.zfs.zfetch.min_sec_reap: 2
Fri Mar  9 00:00:41 2012 vfs.zfs.zfetch.max_streams: 8
Fri Mar  9 00:00:41 2012 vfs.zfs.prefetch_disable: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.mg_alloc_failures: 8
Fri Mar  9 00:00:41 2012 vfs.zfs.check_hostid: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.recover: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.txg.synctime_ms: 1000
Fri Mar  9 00:00:41 2012 vfs.zfs.txg.timeout: 5
Fri Mar  9 00:00:41 2012 vfs.zfs.scrub_limit: 10
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.cache.bshift: 16
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.cache.size: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.cache.max: 16384
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.write_gap_limit: 4096
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.read_gap_limit: 32768
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.aggregation_limit: 131072
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.ramp_rate: 2
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.time_shift: 6
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.min_pending: 4
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.max_pending: 10
Fri Mar  9 00:00:41 2012 vfs.zfs.vdev.bio_flush_disable: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.cache_flush_disable: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.zil_replay_disable: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.zio.use_uma: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.version.zpl: 5
Fri Mar  9 00:00:41 2012 vfs.zfs.version.spa: 28
Fri Mar  9 00:00:41 2012 vfs.zfs.version.acl: 1
Fri Mar  9 00:00:41 2012 vfs.zfs.debug: 0
Fri Mar  9 00:00:41 2012 vfs.zfs.super_owner: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hits: 112836151
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.misses: 42194232
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.demand_data_hits: 8223782
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.demand_data_misses: 2595054
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.demand_metadata_hits: 99086549
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.demand_metadata_misses: 19695660
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.prefetch_data_hits: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.prefetch_data_misses: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 5525820
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 19903518
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.mru_hits: 33185334
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.mru_ghost_hits: 992072
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.mfu_hits: 74389339
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.mfu_ghost_hits: 9275805
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.allocated: 71980380
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.deleted: 47758856
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.stolen: 34276801
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.recycle_miss: 14345121
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.mutex_miss: 31907
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.evict_skip: 203612401
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.evict_l2_cached: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.evict_l2_eligible: 549125139456
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.evict_l2_ineligible: 326250878976
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hash_elements: 16239
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hash_elements_max: 166243
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hash_collisions: 31543544
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hash_chains: 2864
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hash_chain_max: 34
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.p: 85759638
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.c: 159243058
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.c_min: 134217728
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.c_max: 1073741824
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.size: 159377760
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.hdr_size: 4036680
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.data_size: 89853440
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.other_size: 65487640
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_hits: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_misses: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_feeds: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_rw_clash: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_read_bytes: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_bytes: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_writes_sent: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_writes_done: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_writes_error: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_evict_reading: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_free_on_write: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_cksum_bad: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_io_error: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_size: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_hdr_size: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.memory_throttle_count: 3411
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_in_l2: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_not_cacheable: 19944493
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_full: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_pios: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
Fri Mar  9 00:00:41 2012 kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
 