date display error in Spanish locale

Hi,

does anyone have any idea why date outputs corrupt words when I have my locale set to UTF-8 Spanish? I've set in /etc/profile:

Code:
LANG=es_ES.UTF-8
MM_CHARSET=UTF-8
LC_COLLATE=C
LC_TIME=es_ES.UTF-8
export LANG MM_CHARSET LC_COLLATE LC_TIME

And the output is wrong when any word contains a letter with an accent, ie:

Sábado displays as "sÃ¡bado" and miércoles displays as "miÃ©rcoles".

These corrupt words are included in mail sent from a system with this condition and this may be causing certain mail clients to display the mail date incorrectly,

thanks for any ideas, Andy.
 
The settings you have tell the date program to output the date as a string of 8-bit bytes that should turn into legible Spanish words when rendered as UTF-8.

But is the terminal you are using this in really rendering UTF-8? For example, I could be stupid and create an xterm, and set its locale to be iso-8859-5 (for rendering text in Belarus or Ukraine). If I then run date which outputs UTF-8, then all non-ASCII characters will turn into nonsense.

Proposal for debugging this: run date | hexdump -C. This will show you the exact binary output from the command. Then grab a copy of the Unicode standard (wikipedia is great for that), and quickly read the UTF-8 encoding rules (they're pretty simple), and verify that date is really outputting UTF-8, and that the Spanish words it's outputting are correctly accented.
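
The same check can be scripted instead of read by eye. This is only a minimal sketch, assuming Python 3 is available; the helper name check_utf8 is made up for illustration, and the sample bytes are a stand-in for whatever date actually prints on your system:

```python
def check_utf8(raw: bytes) -> str:
    """Return the decoded text if raw is valid UTF-8, else raise UnicodeDecodeError."""
    return raw.decode("utf-8", errors="strict")


# Stand-in for captured `date` output: "s", U+00E1 encoded as c3 a1, "bado".
sample = b"s\xc3\xa1bado"

# Hex view of the bytes, similar to the middle column of `hexdump -C`.
print(" ".join(f"{b:02x}" for b in sample))

text = check_utf8(sample)
print("valid UTF-8,", len(text), "characters from", len(sample), "bytes")
```

If the bytes were really ISO-8859-1 (a lone e1 for the accented a), the strict decode would raise UnicodeDecodeError instead of returning text.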

Second question: By changing these encodings in /etc/profile, you are potentially affecting a lot of software, for example mail clients. Are any other daemons affected? Are you sure that the particular mail clients are actually 8-bit clean and encoding clean? I have no idea. It might be a better idea to only use these settings for shell logins, not for daemons or servers (by putting them into each user's .profile instead of a system-wide file). You may want to experiment with that instead.
 
Hi Ralph,

thanks for your reply.

I'm using Putty for testing the shell, and the same corrupt date output can be seen in Sendmail emails. I'm really not sure whether Sendmail etc. should work with 8-bit encoding, so your suggestion about changing users' .profile is probably a very good one.

I have checked the output via hexdump, it all looks good. I can see that "á" is outputting to "c3 a1". However as described previously, in an interactive shell and in Sendmail mail output this instead displays as "Ã¡",

thanks, Andy.
 
AndyUKG said:
I have checked the output via hexdump, it all looks good. I can see that "á" is outputting to "c3 a1".
That's good. If you look up the UTF-8 encoding (wikipedia for UTF-8), you will see that code points up to 11 bits are encoded as 110xxxxx 10xxxxxx. Which exactly matches the bit pattern of c3 a1, which is 11000011 10100001. So the resulting character is (putting together all the x's from above) 00E1. Which is good: If you look at the Unicode table, character 00E1 is a lowercase a with the accent going to the right (exactly the character you have in your message, which I can't reproduce on my US keyboard). So we have verified that date (and therefore sendmail) are really outputting UTF-8.
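
The bit arithmetic above can be reproduced in a few lines. A minimal sketch, assuming Python 3; decode_two_byte is a hypothetical helper written for this post, it only handles the two-byte form and does not reject overlong encodings:

```python
import unicodedata


def decode_two_byte(b1: int, b2: int) -> int:
    """Decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point."""
    if b1 & 0b11100000 != 0b11000000:
        raise ValueError("first byte is not a two-byte lead (110xxxxx)")
    if b2 & 0b11000000 != 0b10000000:
        raise ValueError("second byte is not a continuation byte (10xxxxxx)")
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return ((b1 & 0b00011111) << 6) | (b2 & 0b00111111)


cp = decode_two_byte(0xC3, 0xA1)
print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
# prints: U+00E1 LATIN SMALL LETTER A WITH ACUTE
```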

However as described previously, in an interactive shell and in Sendmail mail output this instead displays as "Ã¡",
Also makes sense, if you assume that the putty is running in iso-8859-1 (or some close relative, like -15). If you look at the code table (again, Wikipedia is your friend), the character 0xc3 is an uppercase A with a twiddle on top, and 0xa1 is an upside-down exclamation mark.
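
That misreading is easy to reproduce. A minimal sketch, assuming Python 3; the function name misread_as_latin1 is invented for illustration, and it only models a display that treats each byte as one ISO-8859-1 character:

```python
import unicodedata


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# U+00E1 (a with acute) becomes two separate characters on a Latin-1 display.
for ch in misread_as_latin1("\u00e1"):
    print(f"0x{ord(ch):02x}", unicodedata.name(ch))
# prints:
# 0xc3 LATIN CAPITAL LETTER A WITH TILDE
# 0xa1 INVERTED EXCLAMATION MARK
```

Every accented letter in the date output turns into two characters this way, which is why the words come out longer as well as wrong.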

So the problem is understood: You are running your system in UTF-8, but then your display program (namely putty) is configured (should I say mis-configured?) to run in iso-8859-1 (or something similar).

Easiest thing to do: try to configure your putty to be running in UTF-8. Unfortunately, I don't have a windows machine with putty around any more, so I can't tell you how to configure its display character set. I just tried it on my macintosh though (where the terminal emulation program of choice is called "iTerm", and the character set is in Preferences -> Profiles -> Terminal): If I configure it to display UTF-8, and execute the following python line: print chr(0xc3) + chr(0xa1), it prints a lowercase a with accent. If I then configure iTerm for "Western ISO Latin 1", then I get the uppercase A with twiddle plus upside-down exclamation mark.

So, step 1: Fix your terminal emulator to match your system.

Step 2: Think about whether putting non-ASCII characters into sendmail is a smart move. You are RELYING on all of your audience having their terminals configured for UTF-8. And I mean ALL of your audience. While this is a beautiful vision for the future (I would love for all encodings other than UTF-8 to vanish from the face of the earth, and I'd be happy to donate a few boxes of ammo and my shooting skills to further that goal), the world isn't ready for that yet.
 
Hi Ralph,

yes thanks, just by changing the settings to UTF-8 in Putty the date will display correctly.

The original issue I had was end users reporting that mails received from the system showed odd dates, i.e. like 1970. I'm not even 100% sure it's due to this issue, but as I didn't understand it I thought I'd get this sorted first and see if that helps. Having had a bit of a search for info, I've seen for example that Thunderbird does not use UTF-8 by default, so I suspect the issue may have been the result of a mail client receiving a mail with the date in the header in UTF-8 format.
I've completely removed the locale from the system now, so it is running the default locale for all users. I'll confirm if this has resolved the root issue,
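
For comparison, the Date header format used in mail has fixed English day and month abbreviations, so it is plain ASCII whatever the locale. A minimal sketch, assuming Python 3 and its standard email.utils module (this is only an illustration of the format, not what Sendmail itself runs); the timestamp 0 is an arbitrary fixed value so the output is reproducible:

```python
from email.utils import formatdate

# Format a fixed timestamp the way a mail Date header is normally written.
header_date = formatdate(0, usegmt=True)
print(header_date)
# prints: Thu, 01 Jan 1970 00:00:00 GMT

# The result contains only ASCII characters.
print(all(ord(c) < 128 for c in header_date))
```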

thanks, Andy.
 
Hi, I'd just like to confirm that the end user has confirmed that mails are no longer displayed with an incorrect date after removing the locale config completely from the system (i.e. the system is now using the default locale).
 