In theory, the only rules the operating system enforces are these: the nul character (the character whose code is zero) may not appear in a file name, and "/" is the directory separator and therefore can't be part of a name; all other 254 character codes may be used. There is also a maximum length, which on today's systems is so long that it rarely matters. In practice, one needs sensible rules to prevent going insane.
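To see just how permissive the kernel actually is, here is a small sketch (in a throwaway temporary directory; the file name is made up for the demo) that puts a newline into a file name:

```shell
# Sketch: the kernel accepts any byte except NUL and "/" in a name,
# even a newline. Done in a fresh temporary directory to be safe.
cd "$(mktemp -d)"
touch "$(printf 'line1\nline2')"   # one file, with a newline in its name

# Tools that assume "one name per line" now miscount:
ls | wc -l                         # counts 2 lines for a single file
```

This is exactly why the rules below are needed: the kernel will happily store names that break naive tooling.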
Here is a set of rules I consider reasonable. No spaces in file names, as already said; they make using the command line harder and wreak havoc on badly written scripts. Well-written scripts can handle anything, but writing scripts correctly is surprisingly hard.
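A minimal sketch of the word-splitting problem (the file name and variable are invented for the demo):

```shell
# Demo: unquoted variable expansion splits a name containing a space.
cd "$(mktemp -d)"
touch "my report.txt"

f="my report.txt"
# Unquoted: the shell splits $f into the two words "my" and "report.txt",
# so ls looks for two files that don't exist.
ls $f 2>/dev/null || echo "unquoted: looked for 'my' and 'report.txt'"
# Quoted: the name stays one word and the file is found.
ls "$f"
```

The fix is always the same, and always forgotten somewhere: quote every expansion.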
Much more important: never use non-printing characters (like newline), meaning anything with a code below 32, plus 127 (DEL): everything below space and just above tilde; we'll get to 128 and above later. And be super careful with special characters like "-*&%?#!<>". To begin with, never begin a file name with "-", since it will be mistaken for an option. Famous example: have two files named "-rf" and "*" in your directory, and "rm *" will delete everything, unless you are super careful. Also avoid file names that look like Windows devices (don't call a file "prn:"). If you want to have fun, create a file whose name is "~user" for a valid user name, and see what shell autocompletion does. (More fun in a footnote below.)
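The "-rf" trap can be re-enacted safely by substituting echo for rm, so we only print what rm would have been handed (file names invented for the demo):

```shell
# Safe re-enactment: what rm would actually see after glob expansion.
cd "$(mktemp -d)"
touch -- '-rf' file1 file2

echo rm *        # the glob expands to include -rf, which rm would parse as options
echo rm -- *     # "--" ends option parsing, so -rf is treated as a file name
echo rm ./*      # the "./" prefix keeps any expanded name from starting with "-"
```

Both defenses are worth memorizing: "--" works for most well-behaved tools, and "./*" works everywhere.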
Personally, I like using extensions that clearly indicate the file type: straight text files should be called ".txt", PDF documents ".pdf", and source code ".C". While that is not necessary, it makes life easier, since one knows right away how to use a particular file. And one isn't restricted to a few well-known extensions; one can also make up new ones. For example, I have several files called ".todo", which are my to-do lists, even though they are actually straight text files.
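On Unix the extension really is pure convention; content-sniffing tools like file(1) ignore the name entirely. A quick sketch with the ".todo" idea from above (file name and contents invented for the demo):

```shell
# Extensions are convention: file(1) classifies by content, not name.
cd "$(mktemp -d)"
printf 'buy milk\nmow the lawn\n' > chores.todo

file chores.todo   # reports plain text, despite the made-up extension
```

The extension helps the human (and some desktop software), not the kernel.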
And now a painful topic: which character set to use. If you stick to 7-bit ASCII, life will be easy. Anything else is problematic and will cause trouble unless you follow strict rules.

The problem is that the file system (in the kernel) doesn't store strings with a known encoding (character set and locale); it stores an array of bytes. The underlying problem is that the kernel does not know what locale and encoding a user process is using, and therefore cannot convert the strings to the correct encoding when returning them. If one user creates a file (puts the file name as a string into the kernel) in utf-8 encoding, and another user looks at the directory content (gets the file name as a string from the kernel) but is running in iso8859-1, then the second user will see nonsense.

So here are my recommended rules. Either disallow any file names that contain non-ASCII characters (no European or CJKV = Asian characters). Or make sure absolutely everyone who uses that file system (including people who use it via NFS and CIFS = Samba) uses *only* utf-8 encoding. I understand that this rule is not friendly to people outside English-speaking countries, but it really does prevent chaos and confusion. A bad alternative: make sure all processes use the same locale rendering (for example iso8859-1 in western Europe), but that doesn't work well when some processes happen to be set to utf-8, for example when logging in from a terminal emulator on a Windows or Mac machine that uses the local rendering.

Examples of the chaos: one process creates a file (say, named with an "a" with an acute accent), and another user sees gibberish, perhaps multiple characters, perhaps something undisplayable. Even better: a process creates two files whose names are distinguishable to him (for example two "a"s with different accents).
Another process later sees two files whose names, in his locale's rendering, look exactly the same, and he has no idea how to tell them apart. He may not even be able to enter their names from the keyboard, so he ends up with a file he can't even delete without resorting to the dangerous "rm *".
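A sketch of why the kernel can't help here: the name is stored as raw bytes. "é" is the two bytes 0xC3 0xA9 in utf-8 but the single byte 0xE9 in iso8859-1, so an iso8859-1 terminal renders a utf-8 name as two junk characters (the file name is invented for the demo):

```shell
# The kernel stores names as byte arrays; encoding is a userland fiction.
cd "$(mktemp -d)"
touch "$(printf 'caf\303\251')"   # "café" with a utf-8 encoded é

# Dump the stored name byte by byte: 303 251 (octal) = 0xC3 0xA9.
# An iso8859-1 terminal would render those two bytes as "Ã©".
ls | od -An -c
```

The same two bytes come back to every reader; what they look like depends entirely on the reader's locale.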
Footnote about really brutal fun: if you are a file system implementor, you can actually allow file and directory names that contain the "/" character in the kernel. I did that once by mistake: while implementing the Windows-to-Unix character set conversion, we accidentally created "/" in file names. It's surprising how much stuff actually works correctly: in the output of ls, you see a single entry whose name is "a/b". If that entry is a directory, you can create "a/b/c", which is file "c" in directory "a/b". It is also surprising how spectacularly things break. Obviously, the shells are toast when it comes to globbing and autocompletion. What surprised me is how badly they blow up; core dumps from the shell mean that some programmer was sloppy. What is less obvious: the standard C library also blows up; it turns out functions like "open" like to parse file names, and the library is also written very sloppily.
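You can't reach that broken state from ordinary userland, which is why so much code assumes it is impossible. A quick sketch showing that "/" always reaches the kernel as a separator, no matter how you quote it (names invented for the demo):

```shell
# No amount of shell quoting puts "/" into a name: the quoted string is
# handed intact to the open()/creat() system call, and the kernel's
# path walker treats every "/" in it as a separator.
cd "$(mktemp -d)"
touch 'a/b' 2>/dev/null || echo "failed: there is no directory named a"
mkdir a
touch 'a/b'       # now succeeds, creating file "b" inside directory "a"
ls a              # shows: b
```

Quoting only protects the string from the shell; the kernel interprets "/" after the shell is done.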