Other 'Illegal byte sequence' when copying files

I recently found some old CDs with copies of MP3s on them that I wanted to keep on disk. The initial copy to an internal UFS partition went OK, even though some of the titles include Polish characters which don't display properly (I haven't worked out which locale I need to use). But when I tried to copy the files to an external NTFS disk, I got complaints about 'Illegal byte sequence'. I do not get this error when copying to another UFS location, so this seems to be a problem related to NTFS...

Anyone come across anything like this or know how to get round it?
 
"Illegal byte sequence" usually means that a sequence of bytes is trying to be interpreted as a Unicode string, but is not in a valid unicode encoding. The cause is probably a file name. I would start by turning off unicode as the encoding for the process that is doing the copy, and changing to a simpler encoding. You'll probably have to fix the file names later.
 
"Illegal byte sequence" usually means that a sequence of bytes is trying to be interpreted as a Unicode string, but is not in a valid unicode encoding. The cause is probably a file name. I would start by turning off unicode as the encoding for the process that is doing the copy, and changing to a simpler encoding. You'll probably have to fix the file names later.

Not really sure what you are suggesting... How do I turn off Unicode? The problem does arise when copying files whose names include Polish characters, but only when I copy them from UFS to NTFS; there is no error when copying from UFS to UFS.
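One way to find out which names are at fault is to check whether each name decodes cleanly in the encoding the copy expects. The sketch below assumes that encoding is UTF-8 and that the names on the CDs were written in ISO-8859-2 (a common encoding for Polish); both are guesses, and `invalid_utf8_names` is only an illustrative helper, not part of any tool mentioned in this thread.

```python
def invalid_utf8_names(names):
    """Return the byte-string names that do not decode as UTF-8."""
    bad = []
    for name in names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Hypothetical sample names: the same Polish title in two encodings,
# plus a plain ASCII name.
names = [
    "żółw.mp3".encode("iso-8859-2"),  # assumed legacy encoding, not valid UTF-8
    "żółw.mp3".encode("utf-8"),
    b"track01.mp3",
]

# Only the ISO-8859-2 name is reported.
print(invalid_utf8_names(names))
```

To run this against a real directory, pass `os.listdir(b".")` as `names`; giving `os.listdir` a bytes path makes it return the raw bytes of each name instead of decoding them.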
 
All Windows-based file systems on Unix have a way to translate file names into and out of the Windows encoding. Typically that is done with mount options. I happen to remember the one for mount_msdosfs, which is the -L (capital ell) option. You need to match the locale specified with -L to the one that was (or will be) used on the Windows machine when creating (or reading) the data. I'm sure the NTFS mount command has a similar option; read the man page.

And in doing that, you need to be aware that not all possible file names can be written to Windows file systems. For example, the file names "LPT1" and "A>B" are illegal on Windows. Furthermore, there may be characters in file names that cannot be encoded in the locale you have selected, in which case you need to explicitly rename the files.
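A quick pre-flight check along these lines can flag names that Windows would reject before the copy starts. This is a minimal sketch: `windows_name_problems` is a made-up helper, and the rules it encodes (reserved characters, reserved device names, trailing space or dot) are the commonly documented Windows naming restrictions, not something read back from the NTFS driver.

```python
# Characters Windows does not allow in a file name component.
RESERVED_CHARS = set('<>:"/\\|?*')

# Device names Windows reserves, with or without an extension.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_name_problems(name):
    """Return a list of reasons why `name` is not a legal Windows file name."""
    problems = []
    if any(c in RESERVED_CHARS or ord(c) < 32 for c in name):
        problems.append("reserved character")
    # "LPT1" and "LPT1.txt" are both treated as the device name.
    if name.split(".")[0].upper() in RESERVED_NAMES:
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    return problems


for candidate in ["LPT1", "A>B", "song.mp3"]:
    print(candidate, windows_name_problems(candidate))
```

Names that come back with an empty list can still fail for encoding reasons, so this complements the locale setting rather than replacing it.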

In general, using anything other than 7-bit ASCII in file names is asking for trouble, and should be avoided if you are interested in interoperability, in particular if the media is shared by more than one system. If you really want to use such names, you have to be super careful about configuring everything correctly. The errors which can occur are very amusing. My favorite examples include directories that contain two files that "have the same name", and files that exist and whose names you can display, but on which you cannot perform any operations because their names can only be displayed, not selected programmatically.
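One way two files can appear to "have the same name" (an assumption on my part; there are other causes, such as the same title stored in two different encodings) is Unicode normalization: an accented letter can be stored as a single code point or as a base letter plus a combining accent. The two forms display identically but compare as different strings.

```python
import unicodedata

composed = "\u00f3.mp3"     # "ó" stored as one code point
decomposed = "o\u0301.mp3"  # "o" followed by a combining acute accent

# The names look the same on screen but are different strings.
print(composed == decomposed)  # False

# Normalizing both to the same form (NFC here) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Comparing names after normalizing them to one form is a simple way to spot such look-alike pairs in a directory listing.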
 