Mysterious full / on db server

Hey all,

I'm stumped by an issue we had the other day. A poorly written query was run multiple times simultaneously on a dedicated MySQL database VM under ESXi, running FreeBSD 12.2-RELEASE with 16G RAM and an 8G swap partition. The root fs is 12G, of which about 50% is typically available. /var/db/mysql is mounted on a separate drive.

When I was first notified of the problem, the console was reporting that it was out of swap, and I couldn't stay logged in for more than a few seconds without getting kicked out (console or ssh). My shell kept getting killed before I could shut down mysql-server, but after multiple attempts I finally managed to squeeze in a 'reboot'.

After the system came back online, the shell was no longer kicking me out, but a few moments later I was getting swap errors as well as disk-full errors on the root fs. Shutting down mysql would clear the problem, but it came right back every time I fired the service up again.

The saturated RAM and swap make sense, given that the query was hammering the database itself. I'm assuming there are some limits I need to put in place on mysql to ensure it doesn't exhaust available memory, but that's not the mystery.

The mystery is why the root fs was filling up. 'df -h' would show / filling up, but 'du -s /*' wouldn't show anything getting larger. /var would grow by a few hundred megs, but not 5G. Whatever it was, it was dynamic: it would fill the filesystem one second, and the next there'd be a few hundred megs available.

I was under pressure to get services back up and running, so I may have missed something, but looking through my scrollback doesn't reveal where the disk usage was coming from. The only thing I can see that I missed was looking at /'s dotfiles with 'du', but the only file other than '.profile', '.cshrc', and '.snap' is '/.sujournal'. Could that be the culprit?

I've been working with FreeBSD since it came on floppies, and this one has me flummoxed. Anyone have a clue-stick they can beat me with?
 
When I was first notified of the problem, the console was reporting that it was out of swap, and I couldn't stay logged in for more than a few seconds without getting kicked out (console or ssh). My shell kept getting killed before I could shut down mysql-server, but after multiple attempts I finally managed to squeeze in a 'reboot'.
That's usually the OOM (out-of-memory) killer. It will kill all sorts of processes in order to free up some memory.
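If you want to confirm it after the fact, the kernel logs each kill. A quick way to check (the exact wording varies by release, so treat the sample line as illustrative):

```
# look for kernel OOM kills in the system log
grep -i "was killed" /var/log/messages
# on FreeBSD 12.x the entries look roughly like:
#   kernel: pid 1234 (mysqld), uid 88, was killed: out of swap space
```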

The saturated RAM and swap make sense, given that the query was hammering the database itself. I'm assuming there are some limits I need to put in place on mysql to ensure it doesn't exhaust available memory, but that's not the mystery.
Check your settings in my.cnf; it's probably configured to use much more memory than the system actually has. People often misconfigure the various buffers and pools, causing MySQL to use an exorbitant amount of memory. databases/mysqltuner is quite useful for verifying those settings. NEVER configure MySQL to use more memory than the system actually has (count RAM only, not swap).
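To give an idea of what to cap, something like the following in my.cnf; the numbers are made-up illustrations for a 16G box, not recommendations (mysqltuner will suggest values for your actual workload):

```
[mysqld]
# the buffer pool is usually the biggest single consumer; leave headroom for the OS
innodb_buffer_pool_size = 8G
# in-memory temp tables larger than this spill to disk (tmpdir)
tmp_table_size          = 64M
max_heap_table_size     = 64M
# per-connection buffers get multiplied by max_connections in the worst case
max_connections         = 100
sort_buffer_size        = 2M
join_buffer_size        = 2M
```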

The mystery is why the root fs was filling up. 'df -h' would show / filling up, but 'du -s /*' wouldn't show anything getting larger. /var would grow by a few hundred megs, but not 5G. Whatever it was, it was dynamic: it would fill the filesystem one second, and the next there'd be a few hundred megs available.
/tmp perhaps? Is /tmp on the root filesystem, or is it a separate filesystem? Perhaps you used tmpfs(5) for it (careful with that, as it will consume memory and swap)? MySQL will create temporary tables in memory unless they get too big, in which case they're created in /tmp. That could have filled it up, then been cleaned out again when MySQL was shut down.
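You can ask the server where it spills to, and how often, with standard MySQL commands:

```
# where mysqld writes its on-disk temp tables (often /tmp by default)
mysql -e "SHOW GLOBAL VARIABLES LIKE 'tmpdir';"
# how many temp tables were created, and how many of those went to disk
mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';"
```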
 
I found FreeBSD 12 and MySQL exhausted swap space, whereas previous versions (and 13.x) did not.

One option is to try tcmalloc.
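If you want to try it, one approach (an assumption on my part, not something I've verified on 12.2: tcmalloc comes from the devel/google-perftools port, and rc.subr(8) honors ${name}_env):

```
# /etc/rc.conf -- preload tcmalloc into mysqld; the library path is the
# port's usual install location, check yours with: pkg info -l google-perftools
mysql_env="LD_PRELOAD=/usr/local/lib/libtcmalloc.so"
```

Then restart with 'service mysql-server restart'.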


I never saw the issue in normal running of MySQL - just when importing tens of millions of small rows.

Edit: doesn’t help with the disk space issue, sorry.
 
/tmp perhaps? Is /tmp on the root filesystem, or is it a separate filesystem? Perhaps you used tmpfs(5) for it (careful with that, as it will consume memory and swap)? MySQL will create temporary tables in memory unless they get too big, in which case they're created in /tmp. That could have filled it up, then been cleaned out again when MySQL was shut down.

/tmp is indeed on the root fs, but was about 50K every time I checked.

I did just notice that one of my 10 or so 'du -s /*' runs was showing 2.4G in /home. All the others were 0B, but that's the likely location.

That raises the question: why is mysqld (or some other process) filling up /home?


Scratch that. I see now that I was looking at both /home and /usr/home (the former is a symlink to the latter on FreeBSD), and it was /usr/home showing 2.4G.

The mystery remains...
 
The mystery is why the root fs was filling up. 'df -h' would show / filling up, but 'du -s /*' wouldn't show anything getting larger. /var would grow by a few hundred megs, but not 5G. Whatever it was, it was dynamic: it would fill the filesystem one second, and the next there'd be a few hundred megs available.

This can be caused by temporary files that a process unlinks right after creating them (the data continues to exist until the last file descriptor is closed or the process dies).

Those things then take up disk space but you can't find them with `du`.

They would disappear after reboot even if not in /tmp.
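It's easy to demonstrate on a scratch machine (paths and sizes here are just for illustration):

```
dd if=/dev/zero of=/tmp/blob bs=1m count=512   # allocate 512 MB
tail -f /tmp/blob > /dev/null &                # hold a descriptor open on it
rm /tmp/blob                                   # unlink the file
df -h /tmp                                     # the space still shows as used
du -sh /tmp                                    # du can no longer see it
kill %1                                        # close the descriptor; space is freed
```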
 
This can be caused by temporary files that a process unlinks right after creating them (the data continues to exist until the last file descriptor is closed or the process dies).

Those things then take up disk space but you can't find them with `du`.

They would disappear after reboot even if not in /tmp.
Would said temp files show up in `df`? If so, why `df` and not `du`?

/tmp would be the likely location of such files, IMO, no? I'm not specifically aware of when mysqld would use /tmp when it has its own datadir, although it's obviously not out of the question.
 
Would said temp files show up in `df`? If so, why `df` and not `du`?
When a file is removed but a process still has a file descriptor open on it, the file no longer has a name in the filesystem, so it won't show up in du(1) (which walks directories looking at files). But because the descriptor is still open, the file isn't completely removed yet (the process can still read and write it), so the filesystem hasn't released its blocks, and allocated blocks are what df(1) counts.
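Next time it happens, you can try to catch the culprit live; lsof (sysutils/lsof from ports) has a flag for exactly this, and base fstat(1) gets you most of the way:

```
# list open files on / whose link count is zero, i.e. unlinked but still open
lsof +L1 /
# base-system alternative: every open file on the root filesystem, with sizes
fstat -f /
```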
 