ZFS Pool Scrub hangs on filename with leading space?

I have a FreeBSD server (FreeBSD 9.2 x64) that has been unable to scrub a zpool for more than a year; the scrub always hangs at a certain point. When this happens, the system remains responsive (root is not on this zpool), but the pool and ZFS are frozen. The only recovery is a restart plus a script that stops the scrub immediately after boot-up (otherwise, you get a lock-up loop). No entries can be found in any log file after such a crash.

Frankly, I assumed that this was due to aging hardware and too little RAM (4 GB RAM with an 8 TB RAID-Z2 pool). Also, the forum did not really show any comparable cases. However, when I set up a brand new server from scratch (FreeBSD 10.0 x64, 32 GB RAM, 24 TB RAID-Z2 pool), the issue started again after the data was copied over. Long story short: after searching for days, I found that a few filenames on the zpool had a blank space (ASCII 0x20) as the first character of the filename. I am not sure if a filename starting with character 0x20 is technically illegal on FreeBSD, but renaming all affected files did solve the issue - zpool scrub now works. The bad filenames were likely introduced by Linux machines through NFS.
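For anyone who wants to check their own pool for such names, here is a minimal sketch in Python. It is not the exact procedure used above; the function name `find_leading_space_names` is made up for illustration, and it only lists candidates without renaming anything.

```python
import os

def find_leading_space_names(root):
    """Yield paths below `root` whose last component starts with a space (0x20).

    Both directory names and file names are checked. Nothing is modified.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.startswith(" "):
                yield os.path.join(dirpath, name)
```

Calling `list(find_leading_space_names("/path/to/dataset"))` (the path is a placeholder) returns the matching paths; printing them with `repr()` makes the leading blank visible.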

Now my question: has anybody experienced something similar, or does anybody have some input here? I am considering reporting this as a bug - IMHO, either such files should not be creatable or they should not cause a freeze of the pool.

Any feedback is appreciated!
 
Wow! I'm completely amazed that a low-level operation (such as scrub) is affected by a high-level concept (such as the file name). If you look at how most file systems are implemented (warning: I've never looked at the ZFS source code), the file name is used only at a very top level, when directories are read and the file name is returned, and when files are opened by name. All operations underneath are done on inodes and open file objects.

You suggested "such files should not be creatable". That violates one of the fundamental rules of Unix file names, enshrined in the POSIX standards: file names are arrays of 8-bit characters, with a given maximum length. The only two characters that are illegal in file names are NUL (because it ends the parsing/scanning of the file name when it is supplied by a user program, for example in the creat call), and slash (because it separates path components: it indicates that the file name ends here, and that the name refers to a directory). So file names starting with character 0x20 are definitely legal. As a matter of fact, a file name consisting only of invisible control characters in the range 0x01 to 0x1F would also be perfectly legal (although a bit hard to display, but that's just a problem for the dumb humans who don't know how to interpret binary). You can put whatever crazy stuff you want in file names, as long as it doesn't contain a slash or NUL.
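To make the rule concrete, here is a small sketch in Python, assuming a POSIX system and an ordinary local file system. The helper name `try_create` is made up for illustration. Names are passed as bytes so that control characters reach the kernel unchanged.

```python
import os

def try_create(directory, name):
    """Try to create an empty file called `name` (bytes) inside `directory`.

    Returns True if the file was created, False if the attempt was rejected.
    """
    path = os.path.join(os.fsencode(directory), name)
    try:
        with open(path, "wb"):
            pass
        return True
    except (OSError, ValueError):
        # ValueError: Python refuses a path with an embedded NUL byte.
        # OSError: e.g. a slash was parsed as a separator and the
        # intermediate directory does not exist.
        return False
```

In a scratch directory, names such as `b" leading"`, `b" "` and `b"\x01\x02\x1f"` should be accepted, while `b"a\x00b"` is rejected and `b"a/b"` is not treated as a single name at all, because the slash is parsed as a path separator.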

I don't have a throwaway system with a ZFS file system on it right now, but when I do, I'll try creating files with space as the first character (or the only character?), and see what this does to scrub.
 
I just tested this and could not confirm the OP's allegation/claim. I created several files with a blank first character and even one file with a name consisting of one blank character (as suggested by @ralphbsz) in an existing zpool. The zpool scrub completed successfully on the zpool in the normal amount of time.

EDIT: Just "touched" files to create them. I'll retest with non-zero-byte files.
 
A retest confirmed what I reported in the previous post: zpool scrub was not impacted by the presence of files with a leading blank character. The test files included both zero-byte and non-zero-byte files.
 
recluce said:
I am not sure if a filename starting with character 0x20 is technically illegal on FreeBSD [...]

I am considering reporting this as a bug - IMHO, either such files should not be creatable or they should not cause a freeze of the pool.

Filenames are allowed to contain any character except / and NUL (with . and .. being reserved). So the observed behavior is likely a ZFS bug and definitely worth reporting.
 
The observed behaviour is not proof that the issue was caused solely by the filenames with leading spaces; there could be something else at play here. What @trh411 reported above suggests that this may be something more complex than just the filenames.
 
If I had no doubt, I would just have filed a bug report - so it is interesting to hear that @trh411 could not recreate the issue. What led me to suspect the leading blanks is that the two affected servers had nothing in common beyond hosting the same data (completely different hardware, different versions of FreeBSD, different zpool versions), and the only modification made between the hanging scrubs and the working ones was eliminating the leading blanks.

I am not sure it will help, but the files hosted are mostly large, typically between 4 and 25 gigabytes. ZFS runs without performance tweaks; compression, encryption and deduplication are not in use. Server 1 (FreeBSD 9.2) is using a pool with two RAID-Z2 vdevs of eight drives each; server 2 (FreeBSD 10.0) has one RAID-Z2 vdev with eight drives.

Any ideas what else I could test or what information to provide here?
 
Have you been able to recreate the problem by re-introducing a file whose name begins with a blank character?

The largest files I tested were 10-20 MB, so your scale is much larger. I can run some additional tests with more and larger files. As an FYI, I created multiple files whose name began with one blank character, two blank characters, and three blank characters.

Since my original posts, I have also tested compressed versus uncompressed. In addition, I tested single-disk ZFS and a ZFS mirror (2x), but I do not have any RAIDZ{123} on which to test. The results were the same in all cases.
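For anyone who wants to repeat this kind of test on a scratch pool, here is a rough sketch in Python. It is not the exact procedure used for the tests above; the function name `make_test_files`, the file name pattern and the default sizes are all assumptions chosen for illustration.

```python
import os

def make_test_files(directory, max_blanks=3, sizes=(0, 1024 * 1024)):
    """Create files whose names start with 1..max_blanks blanks.

    For each blank count, one file per entry in `sizes` (bytes) is written
    with random content. A file named with a single blank is added too.
    Returns the list of created paths.
    """
    created = []
    for blanks in range(1, max_blanks + 1):
        for size in sizes:
            name = " " * blanks + "scrubtest_%d_bytes" % size
            path = os.path.join(directory, name)
            with open(path, "wb") as f:
                f.write(os.urandom(size))
            created.append(path)
    # A name consisting of exactly one blank character.
    path = os.path.join(directory, " ")
    with open(path, "wb"):
        pass
    created.append(path)
    return created
```

After running it against a directory on the pool, start a scrub with `zpool scrub` and watch progress with `zpool status`. Raising `sizes` toward the multi-gigabyte range would bring the test closer to the scale described by the OP.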
 