[Solved] Memory leak on FreeBSD 9.3 and FreeBSD 10?

Hey all!

Just chipping in a 'me too' here. The sysctls I posted earlier[*] haven't been a complete cure, but they have slowed the process down a bit, just for reference. I'm still seeing systems become unresponsive after varying periods of time, depending on load, I guess.

I wonder though, if UMA isn't used by ZFS by default:
# sysctl vfs.zfs.zio.use_uma
Code:
vfs.zfs.zio.use_uma: 0

What does this mean, in context?
# vmstat -z
Code:
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               208,      0,     104,      15,     104,   0,   0
UMA Zones:             1408,      0,     104,       0,     104,   0,   0
UMA Slabs:              568,      0,  379499,  147006,985852809,   0,   0
UMA RCntSlabs:          568,      0,   13744,       4,  142075,   0,   0
UMA Hash:               256,      0,       1,      14,       5,   0,   0
...

Because it looks to me like it's used anyway, or am I misinterpreting the output? Doing a # sysctl -a | grep uma didn't reveal anything else related to it, so if ZFS isn't using UMA nothing should, right?
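As far as I understand, the UMA Kegs/Slabs lines are the allocator's own bookkeeping and show up regardless of what ZFS does. If the zio buffers were going through UMA, I would expect dedicated per-size zones for them (named something like zio_buf_<size>, if I understand the naming correctly), which would be a quicker thing to grep for:

Code:
# with vfs.zfs.zio.use_uma=1 there should be zones like zio_buf_<size>
vmstat -z | grep -i zio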

[*]: http://forums.freebsd.org/viewtopic.php?f=48&t=41880&start=50#p259656

/Sebulon
 
I found that I had a good chance of ending up with an unresponsive system when I tried to build games/arx-libertatis. Maybe one of those reporting here could try it as well and check whether the symptoms are close enough to the issue at hand? If they are, being able to reproduce the issue this way might speed up debugging it. My 2c, hope it helps :)
 
Hi all,

Similar problem here with the latest 10.1-p5, binary kernel, binary packages:

  • 16 GB RAM total
  • 3 TB ZFS, exported via NFS and smbd 4.1.16
  • Tuning:
    • vm.max_wired=3145728 [12 GB]
    • vfs.zfs.txg.timeout="5"
    • vfs.zfs.arc_max="4G"
    • vfs.zfs.zio.use_uma="0"
  • How to reproduce:
    • Reboot: ARC is empty, wired pages at around 500 MB
    • Start a full text search (locally, no NFS nor SMB) in one of the ZFS datasets
    • ARC fills at roughly the read speed (250 MB/sec) up to 4 GB
    • wired memory fills too, but slightly faster
    • ARC reaches 4 GB, but wired memory keeps increasing slowly at ~1 MB/sec (the monitoring sketch after this list shows one way to watch both numbers)
  • Within a few days wired memory has eaten all the RAM and I have to reboot
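For reference, a minimal sketch of how to watch those two numbers side by side while reproducing this (vm.stats.vm.v_wire_count is counted in pages, so it is multiplied by the page size here):

Code:
# print wired memory and ARC size every 10 seconds
while true; do
    wired=$(sysctl -n vm.stats.vm.v_wire_count)
    pagesize=$(sysctl -n hw.pagesize)
    arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
    echo "wired: $((wired * pagesize)) bytes   arc: ${arc} bytes"
    sleep 10
done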

ZFS is v5000 without L2ARC, with lz4 compression on some datasets, but not on the one being tested.

Is there any solution or workaround available?

cu
Michael
 
In my case the problem has been gone since I switched to FreeBSD 10.1.
Swap usage is between 25 and 450 MB (at the moment around 300 MB).
But as long as the server is not continuously paging in and out, I see no problem.
Uptime now 70+23:02:39
Before, this server stopped working about every 14 days.

16G RAM
2x 6-disk raidz1, 10.5T allocated, 5.84T free
2x l2arc 80GB SSD
primarycache and secondarycache set to metadata only
vfs.zfs.arc_meta_limit="8G"
 
Hm, strange.

Swap is never used; the first thing that fails is winbindd(8), which can't mlock() memory: domain users can't log in any more.

Do you have any limits set (login.conf(5))? Though I don't see how that could influence ZFS and its memory behavior...?

Wait, an idea:
I just tested a full text search on the root file system (UFS): same problem, slowly increasing wired memory usage!
 
Whether it's never used I don't know, but as long as wired memory hasn't filled everything up, swap shouldn't need to be used - at least that's how I understand wired pages...

Currently (~3h after reboot):

Code:
Device  1K-blocks  Used  Avail Capacity
/dev/ada1p3  5242880  0  5242880  0%

Maybe it's not a problem with ZFS but with the disk subsystem.
 
Hi all,
I think it is not only a ZFS problem.

When I start a file search like

grep -r "test123" ./

on a ZFS disk or a UFS disk, wired memory usage increases very fast.

It doesn't scale with the amount of data searched but with the number of files opened: when grepping through a lot of small files, wired memory usage increases significantly faster than when grepping through a few very large files.
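In case anyone wants to compare numbers, a rough sketch of that kind of test; the /tank/leaktest path is just a placeholder for a scratch directory on the dataset under test:

Code:
# create a lot of small files on the dataset under test
mkdir -p /tank/leaktest && cd /tank/leaktest
i=0
while [ $i -lt 100000 ]; do
    echo "test123" > small.$i
    i=$((i + 1))
done
# wired page count before and after the search (pages, not bytes)
sysctl vm.stats.vm.v_wire_count
grep -r "test123" ./ > /dev/null
sysctl vm.stats.vm.v_wire_count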
 
Hey all!

So I tried upgrading our test server, a virtual machine set up exactly like our physical storage servers, to 10.1-RELEASE, patched according to the instructions from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

But that didn't help; I wrote a script that copied files to/from the VM, and after a while it stopped responding. At that point I filed my own bug report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164

There I was instructed to go up to 10.1-STABLE. I first had to roll back to 9.3-RELEASE from a snapshot, since the machine kept halting while rebuilding world, but after getting it up to STABLE the problems are gone. I was able to run my file-copying script overnight and it was still doing its thing when I came to work this morning. Going to test some more, but this seems very promising for 10.2-RELEASE :)
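For anyone who wants to try the same thing, a simple loop along these lines should do; the paths are placeholders and not the actual ones from my script:

Code:
#!/bin/sh
# copy a large file into a target dataset and remove it again, forever
SRC=/tank/data/testfile        # placeholder: any large file
DST=/tank/scratch              # placeholder: target dataset
n=0
while true; do
    cp "$SRC" "$DST/copy.$n" && rm "$DST/copy.$n"
    n=$((n + 1))
done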

/Sebulon
 
Hi
Has your problem completely gone after switching to 10.1-STABLE?

By the way, is there any way for us to upgrade from 10.1-RELEASE to 10.1-STABLE? It seems we are unable to upgrade directly using freebsd-update.
 

Yep, all gone, so far :)

No, there is no way to use freebsd-update to go to STABLE; you'll need to compile it yourself:

# cd /usr/src
# svn co svn://svn.freebsd.org/base/stable/10 /usr/src

Then just follow this:
https://www.freebsd.org/doc/handbook/makeworld.html
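Condensed, the procedure from that chapter looks roughly like this (check /usr/src/UPDATING and the handbook's notes on mergemaster before running it):

Code:
# cd /usr/src
# make buildworld
# make buildkernel
# make installkernel
# shutdown -r now
(after the reboot)
# cd /usr/src
# mergemaster -p
# make installworld
# mergemaster
# shutdown -r now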

/Sebulon
 
My FreeBSD 10.1-STABLE server has been running for more than 50 days; it seems the memory leak issue is completely gone. :) At least I have never seen the symptoms I saw before.

Will keep it running and see how long it can serve.
 
10.1 RELEASE up for 60+ days, no problems... your argument could be invalid ;)
My FreeBSD 9.3 server has been running for more than 60 days; however, the ARC utilization keeps dropping from 100% to 60% on that I/O-intensive server. The newly installed FreeBSD 10.1-STABLE machine with a higher workload never experiences such an issue; after 50 days the ARC utilization still remains at 100%.
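In case anyone wants to compare their own numbers: by utilization I mean the current ARC size against its maximum, which can be read straight from the arcstats sysctls:

Code:
# current ARC size and its configured maximum, in bytes
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max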
 
10.1 RELEASE up for 60+ days, no problems... your argument could be invalid ;)

At this point the L2ARC SSDs weren't filled with more than 65-67 GB each (of 80 GB), and the L2ARC was healthy.

---
3 days later
up 63 days

After the L2ARC (2x 80 GB) was filled up (primarycache & secondarycache = metadata):

Code:
cache           -      -      -      -      -      -
  ada3      74.7G  16.0E     64      0   275K  23.1K
  ada2      74.5G  16.0E     63      0   274K  23.1K

Code:
L2 ARC Summary: (DEGRADED)
    Passed Headroom:            90.44m
    Tried Lock Failures:            359.75m
    IO In Progress:                4.31m
    Low Memory Aborts:            1.77k
    Free on Write:                171.76k
    Writes While Full:            12.18k
    R/W Clashes:                45.10k
    Bad Checksums:                7.72k
    IO Errors:                4.57k
    SPA Mismatch:                23.75k

---

now
10.1-STABLE #1
up 32 mins
let's see ...
 
Time to mark this thread as resolved; none of my FreeBSD 10.1-STABLE servers is having memory issues so far, and most of them have around 120-160 days of uptime.

L2ARC is enabled and the ARC utilization remains at 100% on the server with 161 days of uptime. ;)

So I will assume this issue will be completely gone in the upcoming FreeBSD 10.2 release.
 

Same error again:

Code:
FreeBSD 10.2-RELEASE-p5 #0 r289696: Wed Oct 21 15:54:53 CEST 2015
# uptime
 2:48PM  up 8 days, 23:26, 2 users, load averages: 2.06, 1.19, 0.92

  NAME        STATE     READ WRITE CKSUM
  mypool      ONLINE       0     0     0
    raidz2-0  ONLINE       0     0     0
        da0     ONLINE       0     0     0
        da1     ONLINE       0     0     0
        da2     ONLINE       0     0     0
        da3     ONLINE       0     0     0
        da4     ONLINE       0     0     0
        da5     ONLINE       0     0     0

    cache
      da6       ONLINE       0     0     0
      da7       ONLINE       0     0     0


L2 ARC Summary: (DEGRADED)
    Passed Headroom:            9.87m
    Tried Lock Failures:            581.63k
    IO In Progress:                70.80k
    Low Memory Aborts:            669
    Free on Write:                1.60m
    Writes While Full:            291.07k
    R/W Clashes:                92.84k
    Bad Checksums:                8.31k
    IO Errors:                3.59k
    SPA Mismatch:                1

L2 ARC Size: (Adaptive)                2.31    TiB
    Header Size:            0.52%    12.34    GiB

L2 ARC Evicts:
    Lock Retries:                57
    Upon Reading:                0

L2 ARC Breakdown:                161.21m
    Hit Ratio:            18.22%    29.38m
    Miss Ratio:            81.78%    131.83m
    Feeds:                    1.01m

L2 ARC Buffer:
    Bytes Scanned:                17.17    TiB
    Buffer Iterations:            1.01m
    List Iterations:            62.68m
    NULL List Iterations:            40.26m

L2 ARC Writes:
    Writes Sent:            100.00%    448.06k

Another server with different hardware:

Code:
FreeBSD 10.2-STABLE #3 r287155: Wed Aug 26 13:30:07 CEST 2015

# uptime
 2:55PM  up 65 days,  1:59, 1 user, load averages: 0.46, 0.35, 0.33

L2 ARC Summary: (DEGRADED)
    Passed Headroom:            141.90m
    Tried Lock Failures:            7.16m
    IO In Progress:                3.64m
    Low Memory Aborts:            1.15k
    Free on Write:                144.31k
    Writes While Full:            10.06k
    R/W Clashes:                30.22k
    Bad Checksums:                90.04m
    IO Errors:                26.97m
    SPA Mismatch:                149

L2 ARC Size: (Adaptive)                277.88    GiB
    Header Size:            1.50%    4.17    GiB

L2 ARC Evicts:
    Lock Retries:                1.12k
    Upon Reading:                9

L2 ARC Breakdown:                4.46b
    Hit Ratio:            17.29%    771.86m
    Miss Ratio:            82.71%    3.69b
    Feeds:                    5.54m

L2 ARC Buffer:
    Bytes Scanned:                204.87    TiB
    Buffer Iterations:            5.54m
    List Iterations:            354.25m
    NULL List Iterations:            159.61m

L2 ARC Writes:
    Writes Sent:            100.00%    2.34m
 
I initially had this error on FreeBSD 10.1-STABLE when using gnop(8) to force the L2ARC SSDs to show up as 4k-optimized drives.
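For reference, that was done roughly like this (a sketch; da6/da7 and mypool as in the zpool status above, and note that the .nop providers do not survive a reboot):

Code:
# gnop create -S 4096 /dev/da6
# gnop create -S 4096 /dev/da7
# zpool add mypool cache /dev/da6.nop /dev/da7.nop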
 