Solved UTF-8 filename troubles - I simply don't get it

  • Thread starter: Deleted member 43773

Deleted member 43773

Guest
Hi, it's me again,

I found some similar threads, which gave me the idea of using convmv and several other things, such as looking at a file's name with cat or hexdump to see what is really stored. But I didn't find an answer that helped me understand my actual problem.
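Looking at the raw bytes of a name is the most reliable way to see which encoding it really uses. Below is a minimal sketch of that check in sh. It swaps in the POSIX `od` for `hexdump`, and the function name `show_name_bytes` is hypothetical. In UTF-8, `ä` appears as the two bytes `c3 a4`. In ISO 8859-1 it would be the single byte `e4`.

```sh
#!/bin/sh
# show_name_bytes DIR: print each (non-hidden) entry name in DIR followed by
# its raw bytes in hex, e.g. "ä" stored as UTF-8 shows up as "c3 a4".
show_name_bytes() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue           # skip the unmatched glob pattern
        name=${path##*/}                      # strip the directory part
        printf '%s:' "$name"
        # od prints the bytes in hex; tr joins od's output lines into one line
        printf '%s' "$name" | od -An -tx1 | tr -s ' \n' ' '
        echo
    done
}
```

Usage, with a placeholder path: `show_name_bytes /path/to/dir`.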

I tar my /home/... every day.
This directory contains 200 files that originally came from a Win7 system (NTFS, filenames in UTF-16).
So I wrote a tiny, trivial script to do it automatically (not worth printing here). The core line is just the tar command, nothing special:
tar -czf /destination/tarfile.tar.gz ~

When cron runs it, I receive a long mail every time, with 200 entries about those files:
: Can't translate pathname 'home/....äöü (1) ß 101_p9764...' to UTF-8
(Of course, the error messages in the mail refer to the file and directory names that originally came from Windows.)

But if I run the same tar command by hand, no message appears except
tar: Removing leading '/' from member names
even if I add -v (verbose) to the tar command.
No message appears either when I run the same script myself.
It doesn't matter whether I tar an affected file directly (full path) or its directory.

So, once I understood that Win7 uses UTF-16 filenames and Unix/BSD/Linux etc. use UTF-8, I wanted to rename the files.
Of course I don't want to rename all 200 files by hand.
So I tried:
convmv -f UTF-16 -t UTF-8 --notest on_a_file_to_test
That had no effect on cron's error messages; "Can't translate pathname..." is still there.
To be sure, I also tried
convmv -f UTF-8 -t UTF-8 --nfc --notest on_the-same-testfile
which gives me: "Ready! I converted 0 files in 0 seconds." So nothing was done. The file name is already correctly encoded in UTF-8, right?
The cron error mails keep coming.
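Since convmv found nothing to convert, the names are very likely valid UTF-8 already. One way to double-check a whole tree is to pass every path through `iconv -f UTF-8 -t UTF-8`, which fails on byte sequences that are not valid UTF-8. This is a minimal sketch, assuming an `iconv(1)` command is available. The function names are hypothetical, and paths containing newlines are not handled.

```sh
#!/bin/sh
# is_valid_utf8 STRING: exit status 0 if STRING is valid UTF-8, non-zero otherwise.
# iconv refuses to "convert" UTF-8 to UTF-8 when it meets an illegal byte sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# list_non_utf8 DIR: print every path below DIR whose name is not valid UTF-8.
# Assumes no path contains a newline character.
list_non_utf8() {
    find "$1" -print | while IFS= read -r p; do
        is_valid_utf8 "$p" || printf '%s\n' "$p"
    done
}
```

If `list_non_utf8` prints nothing for the affected directory, the names themselves are fine, and the problem lies in how tar interprets them in the cron environment.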

It doesn't matter whether I put the destination on my local machine or directly on my little NAS (both FreeBSD with ZFS, utf8only off), or whether I change the shebang (which is admittedly pointless anyway; I only mention it so you can see I tried several things before bothering you with this question).

So far, the only thing I've found that decides whether I get the error messages or not is whether the job runs from cron. But that can't be it, can it?
As far as I understand the system, cron just starts the commands/scripts, exactly as I would in the shell, doesn't it?
Or does it receive more messages from a task than I do when I run the same thing in the shell? (You see, once again I can't see the forest for the trees.)

What do I not see/understand?

Thanks in advance for your time and answers!

yours

Profighost
 
What about this? Did you try it?


Basically, cron does not use the same locale settings as a regular user does.
 
Check this:

 
Basically, cron does not use the same locale settings as a regular user does.
That's very important information for me. I didn't know this. *lightbulb*
Thank you!!
So cron is not just a "timer" that starts a process for the user exactly as the user would start it, but runs the job on the user's behalf.

So first I need to figure out how to get those settings... Ah! (just RTFM: https://www.freebsd.org/doc/handbook/configtuning-cron.html)
"Tip:
Before using a custom script, [...] test it with the limited set of environment variables set by cron.
"
Yeah, I can strongly recommend this! :)))
(It's so f... simple and obvious once you see it... [that's what I meant by trees and forest ;-])
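Following that tip, one way to approximate cron's sparse environment from an interactive session is `env -i`, which starts a command with an empty environment, plus a few variables added back by hand. This is only an approximation. The variable list below (SHELL, PATH, HOME, LOGNAME) is an assumption modelled on crontab(5), and the exact set, especially PATH, differs between cron implementations. The function name `run_like_cron` is hypothetical.

```sh
#!/bin/sh
# run_like_cron ARGS...: run /bin/sh ARGS... with an almost empty environment,
# roughly like cron does. No LANG or LC_* variables are passed on, so the
# child falls back to the default "C"/"POSIX" locale.
run_like_cron() {
    env -i \
        SHELL=/bin/sh \
        PATH=/usr/bin:/bin \
        HOME="$HOME" \
        LOGNAME="${LOGNAME:-$(id -un)}" \
        /bin/sh "$@"
}
```

`run_like_cron -c locale` shows the locale a cron job would see, and `run_like_cron /path/to/backup.sh` (placeholder path) should reproduce cron's warnings without waiting for the next scheduled run.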

So, as far as I understand, the source of my confusion is:
Everything I did and tested was in my standard tcsh, but cron executes it in sh.
Even if the shebang of my script says sh, it's the same as opening sh from my tcsh: I am in a subshell with the same settings as the shell 'above', right?
That explains why my script runs without messages when I start it by hand, and why locale shows the same settings in both shells. But cron starts the job in sh directly, not as a subshell.
So the solution is either to change cron's SHELL to tcsh (this at least solved my problem) or, maybe better, to change the locale settings of sh to match those of tcsh.
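Both options can also be written into the crontab itself, because cron accepts `name=value` environment lines there (see crontab(5)). Below is a sketch of a user crontab, assuming a hypothetical script path `/home/user/bin/backup.sh` and a daily 03:00 schedule. Only one of the two options is needed. Option 1 presumably works because tcsh reads `~/.tcshrc` (or `~/.cshrc`) even when started non-interactively, so a locale set there takes effect, whereas a non-interactive sh reads no startup file.

```
# Hypothetical user crontab (edit with: crontab -e)

# Option 1: run jobs with tcsh, which reads ~/.tcshrc or ~/.cshrc at startup
#SHELL=/bin/tcsh

# Option 2: keep /bin/sh, but give every job a UTF-8 locale
LC_ALL=de_DE.UTF-8

# minute hour day-of-month month day-of-week command
0 3 * * * /home/user/bin/backup.sh
```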

(I think the different locale settings of different shells, together with not yet really understanding the concept of subshells, are the cause of some of the grey hair I've grown in the last months...)

I think I got that now.
Thank you all guys!


Would it be a good idea to give all shells on the system the same locale settings (at least the ones in use), or could this cause other problems, e.g. other daemons might no longer behave correctly?
 
So, as far as I understand, the source of my confusion is:
Everything I did and tested was in my standard tcsh, but cron executes it in sh.
Even if the shebang of my script says sh, it's the same as opening sh from my tcsh: I am in a subshell with the same settings as the shell 'above', right?
That explains why my script runs without messages when I start it by hand, and why locale shows the same settings in both shells. But cron starts the job in sh directly, not as a subshell.

It's not only about the shell used, but about the environment variables.

Would it be a good idea to give all shells on the system the same locale settings (at least the ones in use), or could this cause other problems, e.g. other daemons might no longer behave correctly?
You don't need to set the locale for every shell on the system; it may even be better not to touch these settings at all. Just set the locale in your backup/mailer script and you're done.
 
Got it.

I agree. General settings should be handled with great care (hence my question).

So I added

#!/bin/sh

export LC_CTYPE="de_DE.UTF-8"
export LC_COLLATE="de_DE.UTF-8"
export LC_TIME="de_DE.UTF-8"
export LC_NUMERIC="de_DE.UTF-8"
export LC_MONETARY="de_DE.UTF-8"
export LC_MESSAGES="de_DE.UTF-8"
export LC_ALL=de_DE.UTF-8

to my script (for #!/bin/tcsh it would be setenv instead of export, right?), and it works too. It's indeed more elegant than changing cron's environment.
Did I understand the concept of shell scripting correctly: this only sets the locale variables for the script, but does not change them system-wide until a reboot resets them?
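The scope of `export` is narrower still. It only puts a variable into the environment of the current process and of the programs that process starts afterwards. Nothing is written anywhere system-wide, so the setting disappears as soon as the script exits, and the calling shell (or cron) never sees it. In tcsh, `setenv LC_ALL de_DE.UTF-8` is the equivalent of `export`. Below is a small demonstration in which a second `sh -c` stands in for the backup script:

```sh
#!/bin/sh
# Start from a known state: no LC_ALL in this (parent) shell.
unset LC_ALL

# The child process sets and exports LC_ALL, then prints what it sees.
child=$(sh -c 'LC_ALL=de_DE.UTF-8; export LC_ALL; printf %s "$LC_ALL"')

printf 'child saw:  %s\n' "$child"
# Back in the parent, the child's export has left no trace.
printf 'parent has: %s\n' "${LC_ALL-<unset>}"
```

On stdout this prints `child saw:  de_DE.UTF-8` followed by `parent has: <unset>`.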
 
export LC_CTYPE="de_DE.UTF-8"
export LC_COLLATE="de_DE.UTF-8"
export LC_TIME="de_DE.UTF-8"
export LC_NUMERIC="de_DE.UTF-8"
export LC_MONETARY="de_DE.UTF-8"
export LC_MESSAGES="de_DE.UTF-8"
You don't need to set all of those; export LC_ALL=de_DE.UTF-8 alone would have been enough.

From:

Code:
LC_ALL
        This variable shall determine the values for all locale categories. The value of the LC_ALL environment variable
        has precedence over any of the other environment variables starting with LC_ (LC_COLLATE, LC_CTYPE,
        LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) and the LANG environment variable.
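The resolution order quoted above can be written out as a small sh function. This only illustrates the rule; the real lookup happens inside the C library's setlocale(). The function name `effective_locale` is hypothetical, and the final fallback is shown as POSIX, although the standard leaves that default implementation-defined.

```sh
#!/bin/sh
# effective_locale CATEGORY: print the locale a program would use for
# CATEGORY (e.g. LC_CTYPE), following the precedence
# LC_ALL > LC_<category> > LANG > implementation default.
effective_locale() {
    # Whitelist the category so the eval below only ever sees known names.
    case $1 in
        LC_COLLATE|LC_CTYPE|LC_MESSAGES|LC_MONETARY|LC_NUMERIC|LC_TIME) ;;
        *) echo "effective_locale: unknown category '$1'" >&2; return 2 ;;
    esac
    eval "cat_value=\${$1-}"    # value of the variable named by $1, if any
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$cat_value" ]; then
        printf '%s\n' "$cat_value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' POSIX     # stands in for the implementation default
    fi
}
```

For example, `( LC_ALL=de_DE.UTF-8; LC_CTYPE=C; effective_locale LC_CTYPE )` prints `de_DE.UTF-8`, which is why the single LC_ALL line is enough. On a real system, `locale` reports the values actually in effect.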
 
Ah, okay. Thank you. Also for the link.
 