ZFS: nfs mountd 100% CPU for very long time

Code:
Version                 : 8.2-RELEASE
Problem short           : Denial of NFS service while mountd is reading /etc/exports?

Problem intro/background:
  • We are using ZFS for serving home directories [NFS and Samba].
  • All users have a dedicated ZFS filesystem with quota, e.g. /tank/homes/johndoe
  • There are more than 750 users
  • Example export entries for each user:
    Code:
    /tank/homes/johndoe -maproot=root -alldirs -network 1.2.3.0 -mask 255.255.255.0
    /tank/homes/johndoe -maproot=root -alldirs -network 1.2.4.0 -mask 255.255.255.0
Starting or reloading mountd (pkill -HUP mountd) results in massive CPU use and takes no less than 11 minutes :\

Duration example for 757 export entries:
Code:
# date ; /usr/bin/time /etc/rc.d/mountd start ; date
Thu Feb 16 19:21:12 CET 2012
Starting mountd.
      683.69 real       125.03 user        41.82 sys
Thu Feb 16 19:32:35 CET 2012

Duration example for only 12 export entries (fifteen seconds):
Code:
# date ; /usr/bin/time /etc/rc.d/mountd start ; date
Thu Feb 16 19:36:15 CET 2012
Starting mountd.
       15.86 real         2.41 user         3.33 sys
Thu Feb 16 19:36:30 CET 2012
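
As a rough back-of-the-envelope check, the two timings above can be reduced to a per-entry cost and a fixed startup cost. This is only a two-point linear fit; it assumes the load time grows linearly with the number of entries, which two measurements cannot prove. The helper name estimate_cost is made up for illustration.

```shell
# Two-point linear fit: seconds = fixed + per_entry * entries.
# Assumption: load time is linear in the number of export entries.
# $1,$2 = entries and "real" seconds of run A; $3,$4 = the same for run B.
estimate_cost() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" 'BEGIN {
        per_entry = (t1 - t2) / (n1 - n2)   # slope between the two runs
        fixed = t2 - per_entry * n2          # what is left at zero entries
        printf "%.2f s per entry, %.2f s fixed\n", per_entry, fixed
    }'
}
```

Calling it as estimate_cost 757 683.69 12 15.86 gives roughly 0.9 seconds per entry plus about 5 seconds of fixed cost, under the linearity assumption stated above.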

We can see the following with truss:
Code:
 9015: nmount(0x800a140c0,0xc,0x10001010,0x800c33ee8,0xfefefefefefefeff,0x1e9008) ERR#2 'No such file or directory'
 9015: __sysctl(0x7fffffffe430,0x2,0x7fffffffe3c0,0x7fffffffe428,0x800855dc5,0xc) = 0 (0x0)
 9015: __sysctl(0x7fffffffe3c0,0x2,0x0,0x7fffffffe490,0x0,0x0) = 0 (0x0)
 9015: __sysctl(0x7fffffffe430,0x2,0x7fffffffe3c0,0x7fffffffe428,0x800855dc5,0xc) = 0 (0x0)
 9015: __sysctl(0x7fffffffe3c0,0x2,0x800a17180,0x7fffffffe490,0x0,0x0) = 0 (0x0)
These calls are repeated more than 150,000 times, even with only 12 exports.
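
To see which calls dominate a trace like the excerpt above, the lines can be tallied by syscall name. A minimal sketch, assuming the trace was saved to a file (for example with truss -f -o) and that every line has the "pid: name(args)" shape shown above; the function name and the file name mountd.truss are hypothetical.

```shell
# Tally syscall names in a saved truss -f trace, most frequent first.
# Assumes lines look like " 9015: nmount(0x800a140c0,...) ERR#2 ...".
count_syscalls() {
    awk '{
        call = $2                 # second field is "name(args..."
        sub(/\(.*/, "", call)     # drop everything from the first "("
        if (call != "") n[call]++
    }
    END { for (c in n) print n[c], c }' "$1" | sort -rn
}
```

Usage would be something like count_syscalls mountd.truss, which prints one "count name" line per syscall.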

We see the following call counts and timings with truss, again with only 12 export entries:

Code:
# truss -f -c /usr/sbin/mountd
14330: SIGNAL 15 (SIGTERM)
syscall                     seconds   calls  errors
fcntl                   0.000278252      17       0
fork                    0.001412772       1       0
geteuid                 0.089589641    6938       0
getpid                  0.521908867   42855       0
getuid                  0.000013409       1       0
readlink                0.000029334       1       1
lseek                   0.000274343      18       1
mmap                    0.001332591      66       0
mprotect                0.000095823       5       0
open                    0.001434560      54       4
close                   0.001251852      73       0
unlink                  0.000107278       1       0
chdir                   0.000021511       1       0
fstat                   0.001180616      52       0
stat                    3.522373703  165623  143696
lstat                   0.000724128      24       0
write                   0.181991931    6919       0
ioctl                   0.000043302       2       2
break                   0.000017041       1       0
access                  0.001628995      64      48
sigaction               0.545044381   29478       0
accept                  0.000047214       1       0
bind                    0.000050288       4       0
connect                 0.000464311      15       1
getpeername             0.130160253    6953      12
getsockname             0.138684142    6943       0
recvfrom                0.000040788       1       0
sendto                  0.569660933   35925       0
select                 97.728708332       4       1
poll                   17.078616281   23544       0
gettimeofday            0.706043647   38205       0
clock_gettime           0.940735278   59466       0
kevent                  0.000398381       1       0
sigprocmask             0.001400199      82       0
socket                  0.000394192      27       0
getrlimit               0.000021790       1       0
ftruncate               0.000078782       2       0
munmap                  0.001332594      31       0
read                    0.939261964   47114       0
                      ------------- ------- -------
                      123.106853699  470513  143766
syscall                     seconds   calls  errors
fcntl                   0.000254227      15       0
fork                    0.001412772       1       0
geteuid                 0.089467279    6926       0
getpid                  0.521776446   42842       0
getuid                  0.000013409       1       0
readlink                0.000029334       1       1
lseek                   0.000263727      17       0
mmap                    0.001332591      66       0
mprotect                0.000095823       5       0
open                    0.001276717      46       4
close                   0.000822186      44       0
fstat                   0.001002380      43       0
stat                    3.522351633  165622  143696
lstat                   0.000724128      24       0
write                   0.181967067    6918       0
ioctl                   0.000043302       2       2
break                   0.000017041       1       0
access                  0.001628995      64      48
sigaction               0.544944087   29473       0
connect                 0.000122085       3       1
getpeername             0.129590062    6917       0
getsockname             0.138234076    6916       0
recvfrom                0.000040788       1       0
sendto                  0.569660933   35925       0
poll                   17.078432735   23532       0
gettimeofday            0.705748074   38189       0
clock_gettime           0.940735278   59466       0
kevent                  0.000398381       1       0
sigprocmask             0.000909067      46       0
socket                  0.000091074       4       0
ftruncate               0.000032128       1       0
munmap                  0.000354522      18       0
read                    0.939028411   47103       0
                      ------------- ------- -------
                       25.372800758  470233  143752
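
The rows of a summary like the ones above can be ranked by time spent to make the large entries stand out. A minimal sketch, assuming the truss -c output was saved to a file; the function name and the default of five rows are arbitrary choices for illustration.

```shell
# Rank rows of a saved "truss -c" summary by seconds, largest first.
# Data rows have four fields: syscall, seconds, calls, errors.
# Header, separator and total lines do not match and are skipped.
# $1 = summary file, $2 = number of rows to show (default 5).
rank_by_seconds() {
    awk 'NF == 4 && $2 ~ /^[0-9]+\.[0-9]+$/ { print $2, $3, $4, $1 }' "$1" |
        sort -rn | head -n "${2:-5}"
}
```

In the first table above, select and poll account for most of the seconds, while stat accounts for most of the calls and nearly all of the errors.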
Any help is greatly appreciated.
 
When mountd starts, it clears the NFS export settings for all local file systems, so regardless of how many file systems are exported, it calls nmount(2) once for each local file system.

When mountd passes export information to the NFS server, it calls nmount(2) for each export specification.

When the exports(5) file contains a user name, group name, UID or GID, functions like getpwnam(3) are called. Each such call requires at least one stat(2) system call to check whether /etc/nsswitch.conf has changed (according to the lib/libc/net/nsdispatch.c source code).
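
One way to check how many of the stat(2) calls in a trace actually target /etc/nsswitch.conf is to tally the paths being passed to stat. A minimal sketch, assuming a saved truss trace in which the path appears as the quoted first argument of stat; the function name and the trace file name mountd.truss are hypothetical.

```shell
# Count stat() calls per path in a saved truss trace, busiest path first.
# Assumes lines look like: 9015: stat("/some/path",...) = 0 (0x0)
# Splitting on double quotes leaves the path in field 2; lstat and fstat
# are excluded by requiring a non-letter (or line start) before "stat(".
stat_paths() {
    awk -F'"' '$1 ~ /(^|[^a-z_])stat\($/ { n[$2]++ }
        END { for (p in n) print n[p], p }' "$1" | sort -rn
}
```

Running stat_paths mountd.truss | head would show whether /etc/nsswitch.conf or some other path accounts for most of the calls.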

I created 762 file systems on ZFS, backed by a vnode md(4) device (without quotas), and put all of them in the exports(5) file. Each file system is exported to its own network address, and all of them have "-maproot root". It takes ~1.5 minutes for mountd to load this configuration (of course this value depends on the current VFS cache content). showmount -e localhost shows that all file systems were exported correctly. The number of nmount(2) system calls matches the expected value (see my first sentence). If exports(5) has only one entry, then it takes ~15 seconds.
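
An exports(5) file for a test like this can be generated rather than written by hand. A minimal sketch that only prints the export lines and does not create any file systems; the mount point /tank/test, the fsN names and the 10.x.y.0/24 networks are invented for illustration and are not the values used in the test above.

```shell
# Print N export lines, each for its own file system and its own /24 network.
# $1 = number of entries, $2 = parent mount point (hypothetical default).
gen_exports() {
    count=$1
    base=${2:-/tank/test}
    i=1
    while [ "$i" -le "$count" ]; do
        # Map the index to 10.<i/256>.<i%256>.0 so every entry gets a distinct network.
        printf '%s/fs%d -maproot=root -network 10.%d.%d.0 -mask 255.255.255.0\n' \
            "$base" "$i" $((i / 256)) $((i % 256))
        i=$((i + 1))
    done
}
```

For example, gen_exports 762 > exports.test writes 762 lines to a scratch file that can be reviewed before anything is copied into /etc/exports.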

How many local file systems do you have (mount | wc -l)?
What is the content of /etc/nsswitch.conf?
Can you show how you use quotas (the commands)?
 