ZFS hangs while removing large file

Hello All!
I have a FreeBSD 10.1-RELEASE server with some ZFS pools and datasets on it.

$ zpool status
Code:
  pool: pool1
state: ONLINE
  scan: resilvered 420K in 0h0m with 0 errors on Fri Mar 27 14:58:21 2015
config:
  NAME  STATE  READ WRITE CKSUM
  pool1  ONLINE  0  0  0
    raidz3-0  ONLINE  0  0  0
      multipath/pd01  ONLINE  0  0  0
      multipath/pd02  ONLINE  0  0  0
      multipath/pd03  ONLINE  0  0  0
      multipath/pd04  ONLINE  0  0  0
      multipath/pd05  ONLINE  0  0  0
      multipath/pd06  ONLINE  0  0  0
      multipath/pd07  ONLINE  0  0  0
      multipath/pd08  ONLINE  0  0  0
      multipath/pd09  ONLINE  0  0  0
      multipath/pd10  ONLINE  0  0  0
      multipath/pd11  ONLINE  0  0  0
      multipath/pd12  ONLINE  0  0  0
      multipath/pd13  ONLINE  0  0  0
  logs
    mirror-1  ONLINE  0  0  0
      diskid/DISK-CVWL435200Y1480QGNp1  ONLINE  0  0  0
      diskid/DISK-CVWL4353000F480QGNp1  ONLINE  0  0  0
  cache
    diskid/DISK-CVWL435200Y1480QGNp4  ONLINE  0  0  0
    diskid/DISK-CVWL4353000F480QGNp4  ONLINE  0  0  0
errors: No known data errors
.
.
.
$ zfs list
Code:
NAME  USED  AVAIL  REFER  MOUNTPOINT
pool1  28,7T  6,18T  45,7K  /pool1
pool1/iscsi  15,1T  6,18T  45,7K  /pool1/iscsi
pool1/iscsi/NID  13,8T  6,18T  48,8K  /pool1/iscsi/NID
pool1/iscsi/NID/blade-nid  314G  6,18T  17,6G  /pool1/iscsi/NID/blade-nid
pool1/iscsi/NID/blade-nid/blade01  76,8G  6,18T  117G  /pool1/iscsi/NID/blade-nid/blade01
pool1/iscsi/NID/blade-nid/blade03  219G  6,18T  117G  /pool1/iscsi/NID/blade-nid/blade03
pool1/iscsi/NID/vol01  3,51T  6,18T  3,51T  /pool1/iscsi/NID/vol01
pool1/iscsi/NID/vol02  3,38T  6,18T  3,38T  /pool1/iscsi/NID/vol02
pool1/iscsi/NID/vol03  3,08T  6,18T  3,08T  /pool1/iscsi/NID/vol03
pool1/iscsi/NID/vol04  3,51T  6,18T  3,51T  /pool1/iscsi/NID/vol04
pool1/iscsi/nid  1,32T  6,18T  45,7K  /pool1/iscsi/nid
pool1/iscsi/nid/blade01  1,19T  6,18T  45,7K  /pool1/iscsi/nid/blade01
pool1/iscsi/nid/blade01/boot  45,7K  6,18T  45,7K  /pool1/iscsi/nid/blade01/boot
pool1/iscsi/nid/blade01/data  1,19T  6,18T  1,19T  /pool1/iscsi/nid/blade01/data
pool1/iscsi/nid/blade02  56,1G  6,18T  45,7K  /pool1/iscsi/nid/blade02
pool1/iscsi/nid/blade02/boot  56,1G  6,18T  56,1G  /pool1/iscsi/nid/blade02/boot
pool1/iscsi/nid/blade03  84,8G  6,18T  45,7K  /pool1/iscsi/nid/blade03
pool1/iscsi/nid/blade03/boot  84,8G  6,18T  84,8G  /pool1/iscsi/nid/blade03/boot
pool1/iscsi/nid/subt  229K  6,18T  45,7K  /pool1/iscsi/nid/subt
pool1/iscsi/nid/subt/vol01  45,7K  6,18T  45,7K  /pool1/iscsi/nid/subt/vol01
pool1/iscsi/nid/subt/vol02  45,7K  6,18T  45,7K  /pool1/iscsi/nid/subt/vol02
pool1/iscsi/nid/subt/vol03  45,7K  6,18T  45,7K  /pool1/iscsi/nid/subt/vol03
pool1/iscsi/nid/subt/vol04  45,7K  6,18T  45,7K  /pool1/iscsi/nid/subt/vol04
pool1/samba  13,6T  6,18T  45,7K  /pool1/samba
pool1/samba/NMO  6,37T  643G  6,37T  /pool1/samba/NMO
pool1/samba/Science  7,20T  823G  45,7K  /pool1/samba/Science
pool1/samba/Science/ASR  2,06T  823G  2,02T  /pool1/samba/Science/ASR
pool1/samba/Science/DB  5,14T  823G  45,7K  /pool1/samba/Science/DB
pool1/samba/Science/DB/ASR  5,14T  823G  5,14T  /pool1/samba/Science/DB/ASR
pool1/samba/Science/DB/SIV  45,7K  823G  45,7K  /pool1/samba/Science/DB/SIV
pool1/samba/Science/SIV  45,7K  823G  45,7K  /pool1/samba/Science/SIV
pool1/samba/testsmb  45,7K  6,18T  45,7K  /pool1/samba/testsmb
.
.
.
$ zfs get all pool1/iscsi/NID/blade-nid/blade01
Code:
NAME  PROPERTY  VALUE  SOURCE
pool1/iscsi/NID/blade-nid/blade01  type  filesystem  -
pool1/iscsi/NID/blade-nid/blade01  creation  ВР ДЕЙ 11 15:57 2014  -
pool1/iscsi/NID/blade-nid/blade01  used  76,8G  -
pool1/iscsi/NID/blade-nid/blade01  available  6,18T  -
pool1/iscsi/NID/blade-nid/blade01  referenced  117G  -
pool1/iscsi/NID/blade-nid/blade01  compressratio  1.00x  -
pool1/iscsi/NID/blade-nid/blade01  mounted  yes  -
pool1/iscsi/NID/blade-nid/blade01  origin  pool1/iscsi/NID/blade-nid/blade03@make-clones  -
pool1/iscsi/NID/blade-nid/blade01  quota  none  default
pool1/iscsi/NID/blade-nid/blade01  reservation  none  default
pool1/iscsi/NID/blade-nid/blade01  recordsize  4K  inherited from pool1/iscsi/NID
pool1/iscsi/NID/blade-nid/blade01  mountpoint  /pool1/iscsi/NID/blade-nid/blade01  default
pool1/iscsi/NID/blade-nid/blade01  sharenfs  off  default
pool1/iscsi/NID/blade-nid/blade01  checksum  on  default
pool1/iscsi/NID/blade-nid/blade01  compression  off  default
pool1/iscsi/NID/blade-nid/blade01  atime  off  inherited from pool1
pool1/iscsi/NID/blade-nid/blade01  devices  on  default
pool1/iscsi/NID/blade-nid/blade01  exec  on  default
pool1/iscsi/NID/blade-nid/blade01  setuid  on  default
pool1/iscsi/NID/blade-nid/blade01  readonly  off  default
pool1/iscsi/NID/blade-nid/blade01  jailed  off  default
pool1/iscsi/NID/blade-nid/blade01  snapdir  hidden  default
pool1/iscsi/NID/blade-nid/blade01  aclmode  discard  default
pool1/iscsi/NID/blade-nid/blade01  aclinherit  restricted  default
pool1/iscsi/NID/blade-nid/blade01  canmount  on  default
pool1/iscsi/NID/blade-nid/blade01  xattr  off  temporary
pool1/iscsi/NID/blade-nid/blade01  copies  1  default
pool1/iscsi/NID/blade-nid/blade01  version  5  -
pool1/iscsi/NID/blade-nid/blade01  utf8only  off  -
pool1/iscsi/NID/blade-nid/blade01  normalization  none  -
pool1/iscsi/NID/blade-nid/blade01  casesensitivity  sensitive  -
pool1/iscsi/NID/blade-nid/blade01  vscan  off  default
pool1/iscsi/NID/blade-nid/blade01  nbmand  off  default
pool1/iscsi/NID/blade-nid/blade01  sharesmb  off  default
pool1/iscsi/NID/blade-nid/blade01  refquota  none  default
pool1/iscsi/NID/blade-nid/blade01  refreservation  none  default
pool1/iscsi/NID/blade-nid/blade01  primarycache  all  default
pool1/iscsi/NID/blade-nid/blade01  secondarycache  all  default
pool1/iscsi/NID/blade-nid/blade01  usedbysnapshots  0  -
pool1/iscsi/NID/blade-nid/blade01  usedbydataset  76,8G  -
pool1/iscsi/NID/blade-nid/blade01  usedbychildren  0  -
pool1/iscsi/NID/blade-nid/blade01  usedbyrefreservation  0  -
pool1/iscsi/NID/blade-nid/blade01  logbias  latency  default
pool1/iscsi/NID/blade-nid/blade01  dedup  off  default
pool1/iscsi/NID/blade-nid/blade01  mlslabel  -
pool1/iscsi/NID/blade-nid/blade01  sync  standard  default
pool1/iscsi/NID/blade-nid/blade01  refcompressratio  1.00x  -
pool1/iscsi/NID/blade-nid/blade01  written  76,8G  -
pool1/iscsi/NID/blade-nid/blade01  logicalused  66,3G  -
pool1/iscsi/NID/blade-nid/blade01  logicalreferenced  101G  -
pool1/iscsi/NID/blade-nid/blade01  volmode  default  default
pool1/iscsi/NID/blade-nid/blade01  filesystem_limit  none  default
pool1/iscsi/NID/blade-nid/blade01  snapshot_limit none  default
pool1/iscsi/NID/blade-nid/blade01  filesystem_count  none  default
pool1/iscsi/NID/blade-nid/blade01  snapshot_count  none  default
pool1/iscsi/NID/blade-nid/blade01  redundant_metadata  all  default

While I was removing a 1.3 TB file in /pool1/iscsi/NID/blade-nid/blade01, the system hung after 20-30 minutes. There were no errors on the console, but I was forced to reset the server. After the reboot I waited ~40 minutes before the system was able to mount the ZFS datasets; during that time the HDDs of pool1 were blinking. After that, the file had disappeared.

A week ago I had the same problem while removing a 100 GB file, and I was forced to reset the server then as well.
Now I am afraid to remove large files on that server.
Thanks for any help!
 
The numbers are a bit awkward, as if they don't add up:

Code:
NAME USED AVAIL REFER MOUNTPOINT
pool1 28,7T 6,18T 45,7K /pool1
Code:
pool1/iscsi/NID/blade-nid/blade01 used 76,8G -
pool1/iscsi/NID/blade-nid/blade01 available 6,18T -

It also looks like you have exceeded 80% usage. Are you using a lot of snapshots?
 
What do you mean? They add up fine. The size of the pool is 34.9 TB, with 6.18 TB free. The pool is ~82% full.
There are a few snapshots:
$ zfs list -r -t snapshot pool1
Code:
NAME  USED  AVAIL  REFER  MOUNTPOINT
pool1/iscsi/NID/blade-nid/blade03@make-clones  102G  -  116G  -
pool1/samba/Science@to-stor  0  -  45,7K -
pool1/samba/Science@-2015-03-27  0  -  45,7K  -
pool1/samba/Science@-2015-03-28  0  -  45,7K  -
pool1/samba/Science/ASR@to-stor  34,3G  -  2,00T  -
pool1/samba/Science/DB@to-stor  27,4K  -  45,7K  -
pool1/samba/Science/DB/ASR@to-stor  363M  -  5,08T  -
pool1/samba/Science/DB/SIV@to-stor  0  -  45,7K  -
pool1/samba/Science/SIV@to-stor  0  -  45,7K  -
I understand that filling the pool further has a negative impact on performance, but it should not lead to a system hang.
 
What do you mean? They add up fine. The size of the pool is 34.9 TB, with 6.18 TB free. The pool is ~82% full.

My mistake, zfs get all does not show relative usage in TB.

There are a few snapshots:

$ zfs list -r -t snapshot pool1
Code:
NAME  USED  AVAIL  REFER  MOUNTPOINT
pool1/iscsi/NID/blade-nid/blade03@make-clones  102G  -  116G  -
pool1/samba/Science@to-stor  0  -  45,7K -
pool1/samba/Science@-2015-03-27  0  -  45,7K  -
pool1/samba/Science@-2015-03-28  0  -  45,7K  -
pool1/samba/Science/ASR@to-stor  34,3G  -  2,00T  -
pool1/samba/Science/DB@to-stor  27,4K  -  45,7K  -
pool1/samba/Science/DB/ASR@to-stor  363M  -  5,08T  -
pool1/samba/Science/DB/SIV@to-stor  0  -  45,7K  -
pool1/samba/Science/SIV@to-stor  0  -  45,7K  -

I understand that filling the pool further has a negative impact on performance, but it should not lead to a system hang.

Unfortunately, exceeding 80% usage on ZFS can lead to unexpected results. Can you also post the output of zpool list? It will show us the fragmentation of the pool.
 
$ zpool list
Code:
NAME  SIZE  ALLOC  FREE  FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
pool1  47,2T  37,7T  9,59T  37%  -  79%  1.00x  ONLINE  -
Sometimes I have had situations where some pools temporarily exceeded 90-95% usage. They worked very slowly, but this is the first time I have faced a system hang.
 
While I was removing a 1.3 TB file in /pool1/iscsi/NID/blade-nid/blade01, the system hung after 20-30 minutes. There were no errors on the console, but I was forced to reset the server.

You did not specify whether you removed it with rm, but in case you did, can you try a
Code:
CTRL+T
to see what the process is doing (i.e. where it is hanging)?
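Alternatively, if you start the removal again, you could watch the rm process from a second session with ps/procstat. This is only a rough sketch, and the file path below is just an example:
Code:
# start the removal in the background (path is only an example)
$ rm /pool1/iscsi/NID/blade-nid/blade01/bigfile &
# a process in disk wait ("D" state) is blocked on I/O rather than truly hung; MWCHAN shows what it waits on
$ ps -l -p $!
# the kernel stack shows which ZFS routine the process is waiting in (run as root)
$ procstat -kk $!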
 
Hi,

I have seen similar behavior on old versions of ZFS pools, for example from FreeBSD 9.1 or thereabouts. While removing large volumes, ZFS would look like it had hung, but it actually had not. We had an istgt service running, and on the old pool version removing >60 G volumes would cause all I/O to stop; the istgt daemon would stop servicing requests and the hosts that used it would report the LUNs as dead. This turned out to be an issue with the pool itself: deleting large amounts of data was not an asynchronous process, and after zfs destroy the prompt was not released until all the blocks had been freed. This changed with later versions of the pool, so if you are running an older pool version and you just updated FreeBSD and imported the pool, I advise you to upgrade the pool. With the newer versions, FreeBSD releases the prompt immediately after zfs destroy and frees the blocks in the background without affecting any service.
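In case it helps, this is roughly how to check (pool name taken from this thread; note that zpool upgrade is one-way, so only run it if you are sure you will never need to import the pool on an older release):
Code:
# is the asynchronous destroy feature available on this pool?
$ zpool get feature@async_destroy pool1
# bytes still being released in the background after a destroy
$ zpool get freeing pool1
# enable all feature flags supported by the running system (irreversible)
# zpool upgrade pool1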
 
I tend to believe that the OP uses the latest zpool version; I assumed that after seeing the zpool status output.

The 80% problem is not limited to ZFS; any file system will start showing dramatic performance loss, ZFS is just a bit more sensitive. It is difficult to troubleshoot such problems over a forum, but if I were experiencing those issues, I would start by carefully monitoring the output of zpool iostat -v 1 and top(1). Then I would slowly remove smaller files and see what the numbers indicate.

Do not power cycle the system unless you are absolutely sure that it has hung. Sometimes ZFS processes might look stuck, but you need to be absolutely certain.
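For example, something along these lines in two terminals (pool name as in this thread):
Code:
# per-vdev I/O statistics, refreshed every second
$ zpool iostat -v pool1 1
# -S also shows system processes, so the ZFS kernel threads are visible
$ top -S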
 
All pools have the latest zpool version. Next I think I will try to first create a snapshot, then rm the file, and finally destroy the snapshot. Maybe that will help.
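Roughly the sequence I have in mind (the file name is only an example):
Code:
# take a snapshot first, so rm only has to drop the live references
$ zfs snapshot pool1/iscsi/NID/blade-nid/blade01@pre-delete
$ rm /pool1/iscsi/NID/blade-nid/blade01/bigfile
# destroying the snapshot later is what actually frees the blocks
# (with the async_destroy feature this happens in the background)
$ zfs destroy pool1/iscsi/NID/blade-nid/blade01@pre-delete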

These are the symptoms of the system hang: broken pipes on ssh, the istgt and nfsd daemons stop answering requests, and no reaction to the keyboard on the server console.
 
All pools have the latest zpool version. Next I think I will try to first create a snapshot, then rm the file, and finally destroy the snapshot. Maybe that will help.

These are the symptoms of the system hang: broken pipes on ssh, the istgt and nfsd daemons stop answering requests, and no reaction to the keyboard on the server console.

That sounds like a panic, and it is definitely not the result of overfilling the FS. Other than that, does the system behave normally under high load? Do you have anything in your logs that could indicate something hardware related, like a signal 11, etc.?
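For example (standard log locations, adjust to your setup):
Code:
# kernel messages since the last boot
$ dmesg | grep -i -E 'error|panic|timeout'
# system log, look at the entries around the time of the hang
$ grep -i -E 'panic|signal 11|fatal' /var/log/messages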
 
That sounds like a panic, and it is definitely not the result of overfilling the FS. Other than that, does the system behave normally under high load? Do you have anything in your logs that could indicate something hardware related, like a signal 11, etc.?
Under high load the system works well; I tested it with bonnie, fio and iometer and there were no problems. There is one record in the logs after the last hang:
Code:
sonewconn: pcb 0xfffff8015fd54310: Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
But this record appeared 30 seconds after nfsd and istgt stopped answering requests. I think that is a consequence, not a cause.

What is the output of zpool get all?
Code:
# zpool get all pool1
NAME  PROPERTY  VALUE  SOURCE
pool1  size  47,2T  -
pool1  capacity  79%  -
pool1  altroot  -  default
pool1  health  ONLINE  -
pool1  guid  10458059495056723837  default
pool1  version  -  default
pool1  bootfs  -  default
pool1  delegation  on  default
pool1  autoreplace  off  default
pool1  cachefile  -  default
pool1  failmode  wait  default
pool1  listsnapshots  off  default
pool1  autoexpand  off  default
pool1  dedupditto  0  default
pool1  dedupratio  1.00x  -
pool1  free  9,58T  -
pool1  allocated  37,7T  -
pool1  readonly  off  -
pool1  comment  -  default
pool1  expandsize  0  -
pool1  freeing  0  default
pool1  fragmentation  37%  -
pool1  leaked  0  default
pool1  feature@async_destroy  enabled  local
pool1  feature@empty_bpobj  active  local
pool1  feature@lz4_compress  active  local
pool1  feature@multi_vdev_crash_dump  enabled  local
pool1  feature@spacemap_histogram  active  local
pool1  feature@enabled_txg  active  local
pool1  feature@hole_birth  active  local
pool1  feature@extensible_dataset  enabled  local
pool1  feature@embedded_data  active  local
pool1  feature@bookmarks  enabled  local
pool1  feature@filesystem_limits  enabled  local
 
Under high load the system works well; I tested it with bonnie, fio and iometer and there were no problems. There is one record in the logs after the last hang:
Code:
sonewconn: pcb 0xfffff8015fd54310: Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
But this record appeared 30 seconds after nfsd and istgt stopped answering requests. I think that is a consequence, not a cause.

Does this look familiar --> https://lists.freebsd.org/pipermail/freebsd-fs/2015-March/021074.html ?
 
I upgraded my second (non-production) server to 10.1-STABLE, but that didn't help. The server still hangs when I remove a large file.
 