
Rsync filename encoding issue

charlesg

New Member


Messages: 3

#1
Hi guys,

I'm having some problems setting up a basic backup over ssh with rsync. It looks like a filename encoding issue. My setup is as follows:

rsync version 3.1.2 protocol version 31

1 FreeBSD 10.3-RELEASE-p24 server which mounts a Windows Server 2012 share like so:

//user@Server/Folder /mnt/folder smbfs rw,-N,-I[serverIp],-Wdomain,-Eutf8:cp1252 0 0

1 FreeBSD 11.1-RELEASE-p4 server with ZFS all around.


If I run rsync like so:

rsync -rtdvz --delete --rsh='ssh -p1234' ssh-ip:/zfs/src/ /windowsShare/dest/

I end up with a stuck rsync process which hits 100% CPU. If I run truss on this process, it appears to be stuck in an infinite lstat loop on a file with accents in its name:

lstat("file-with-accents.pdf",{ mode=-rwxrwx--- ,inode=1994285598,size=2904786,blksize=16644 }) = 190 (0xbe)

If I add the "--iconv=." option to the rsync command, the process finishes but it skips the accented files with the error "cannot convert filename".

I tried many encodings with the --iconv option but I can't get it to work.
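
For readers hitting the same message, here is a minimal Python sketch of the two ways a charset conversion of a file name can fail. It is only an illustration: rsync does its conversion through the system iconv library, not through Python, and the helper `try_convert` and the sample names are hypothetical. Reading "cannot convert filename" as one of these two cases is an assumption, not something confirmed in this thread.

```python
from typing import Optional


def try_convert(raw: bytes, src: str, dst: str) -> Optional[bytes]:
    """Re-encode a raw file name from charset `src` to `dst`.

    Returns None when the bytes are not valid in `src`, or when a
    character has no representation in `dst`.
    """
    try:
        return raw.decode(src).encode(dst)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


# Hypothetical names, written with escapes so the bytes are unambiguous.
utf8_name = "caf\u00e9.pdf".encode("utf-8")    # b'caf\xc3\xa9.pdf'
cp1252_name = b"caf\xe9.pdf"                   # same name stored as cp1252
polish_name = "\u0142.pdf".encode("utf-8")     # U+0142 is not in cp1252

ok = try_convert(utf8_name, "utf-8", "cp1252")           # converts cleanly
bad_source = try_convert(cp1252_name, "utf-8", "cp1252")  # bytes are not UTF-8
bad_target = try_convert(polish_name, "utf-8", "cp1252")  # no cp1252 equivalent

print(ok, bad_source, bad_target)
```

If the first case converts and the second does not, the names on disk are probably not in the charset that was assumed for the source side.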

Any help please?
 

ralphbsz

Daemon

Thanks: 657
Messages: 1,123

#2
I'm not going to be helpful, but obnoxious:

If you have observed rsync in an infinite loop, that's a bug in rsync. You need to contact the author or maintainer. I happen to know the original author, and to my knowledge he no longer works on stuff like this, so go find the person(s) who maintains it now, and file a bug report. This is most likely not a FreeBSD specific problem, but a generic rsync bug.

And now the seriously obnoxious part: Anyone who uses non-7bit-ASCII characters in file names takes their life into their own hands. The best solution to this problem is to get rid of accented characters in file names. The reason for that is a fundamental deficiency of the Unix design: File names are passed between userspace and kernel as "strings", meaning nul-terminated arrays of bytes. The only restriction Unix imposes on the characters in the file name is that none of them can be nul; all other characters are permissible (yes, it is legal to create files whose names are ">" or "-rf", but those are only useful to annoy people). The problem is that the kernel (which has to execute operations such as create file, find file, iterate over directories) doesn't know what the locale or encoding of the user process is: It gets passed an array of binary gibberish, and has no idea whether it is in utf8, iso8859, or anything else. And this is the source of all these problems with special characters in file names. Having to do i18n conversions on file names in user space (or worse in kernel space, when mounting Windows file systems) is just full of problems.
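
To make the "array of bytes" point concrete, here is a small Python sketch (not part of the original post; the file name is a made-up example). It shows that one visible name turns into different byte strings under UTF-8 and cp1252, and that bytes read under the wrong assumption come out as mojibake rather than as an error.

```python
# Hypothetical file name "resume.pdf" with two accented e's (U+00E9),
# written with escapes so the code points are unambiguous.
name = "r\u00e9sum\u00e9.pdf"

utf8_bytes = name.encode("utf-8")      # bytes a UTF-8 system would store
cp1252_bytes = name.encode("cp1252")   # bytes a cp1252 system would store

print(utf8_bytes)    # b'r\xc3\xa9sum\xc3\xa9.pdf'
print(cp1252_bytes)  # b'r\xe9sum\xe9.pdf'

# A byte-wise comparison treats these as two different names.
names_differ = utf8_bytes != cp1252_bytes

# Interpreting the UTF-8 bytes as cp1252 succeeds but yields the wrong text:
# each accented letter becomes two characters (U+00C3 followed by U+00A9).
mojibake = utf8_bytes.decode("cp1252")
print(len(name), len(mojibake))  # 10 12
```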

Yes, this problem can be solved, by serious discipline, and configuring all your systems and user processes correctly. My favorite solution: Configure everything to only use utf8, configure all your file systems to not be case-blind, and do file sharing (or cluster file systems) only within Unix machines, without Windows. But the moment one user breaks this discipline, things will get out of hand. I understand that this is not user-friendly, and particularly difficult for CJKV and European areas, but it is the easiest solution from a computer point of view.
 

charlesg

New Member


Messages: 3

#3
I know that you are right and that I should report this issue, but it feels like it will be easier to just find a workaround. Also, I do have an rsync backup going in the reverse direction between the same two servers (Windows share to ZFS over ssh), with the same kind of accented file names, and it works just fine.

To make it look even more like a FreeBSD issue: as a temporary solution I run the same problematic backup from a Linux system that mounts the same Windows share, and it works just fine... no "serious discipline" involved :)

I understand what you're saying but it seems like I'm only missing the right rsync conversion option, or that my encoding is set improperly. Can you please tell me how I can verify the expected filename encoding on both systems?
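
One possible way to check this, sketched in Python under the assumption that Python 3 is available on both machines (nothing in this thread says it is). The script walks a directory using byte paths, so no decoding happens behind the scenes, and sorts every non-ASCII name into "valid UTF-8" or "something else". The function names `classify`, `scan`, and `report` are hypothetical helpers for this sketch.

```python
import locale
import os
import sys


def classify(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' (not valid UTF-8) for a raw name."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def scan(top: str) -> dict:
    """Walk `top` with byte paths and collect the non-ASCII names by class."""
    found = {"utf-8": [], "other": []}
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for raw in dirnames + filenames:
            kind = classify(raw)
            if kind != "ascii":
                found[kind].append(os.path.join(dirpath, raw))
    return found


def report(top: str) -> None:
    """Print the process's charset settings and the classified names."""
    print("locale charset:", locale.getpreferredencoding(False))
    print("Python filesystem encoding:", sys.getfilesystemencoding())
    for kind, paths in scan(top).items():
        for path in paths:
            print(kind, repr(path))
```

Calling `report("/zfs/src")` on the ZFS machine and `report("/windowsShare/dest")` on the machine with the SMB mount (the paths from the rsync command above) would show whether both sides present the accented names as UTF-8. Names landing in "other" on one side suggest that side is using a single-byte charset such as cp1252 or ISO 8859-1, though the script cannot tell which one.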
 

ralphbsz

Daemon

Thanks: 657
Messages: 1,123

#4
Honestly, I don't know. I haven't set encodings on mounts in ages, in particular not on FreeBSD (I have only done it on Linux and commercial Unixes, as part of file system development work, about 15 years ago). Actually, it's strange that your problem goes away on Linux. That points to the root cause being some interaction between the rsync code (which must be faulty, since an infinite loop is always a bug) and some FreeBSD behavior that triggers it. I fear your best answer will be trial and error, until the problem magically goes away. This may be frustrating, but that's life.
 