locale issues

SingerMan · Apr 30, 2020

I'm trying to extract from a tarball. (I don’t need help with that.)

But I get this nonsense:

Code:

Pathname can’t be converted from UTF-8 to current locale

When I enter the command locale the LANG= line is empty.
Hoo boy

Yup! Everything that's easy in any version of Linux is like being in the monkey house with FreeBSD.

D-FENS · May 1, 2020

This means that no locale is set for your shell, so some characters in file names may break when extracting the tarball (they cannot be represented in the shell's current locale).
You could set the LANG variable to a UTF-8 locale, for example "de_DE.UTF-8" or whichever language you use. This would let tar convert all filenames properly. Putting it in your ~/.bashrc or another login script sets the variable automatically whenever you open a shell.
A list of all available locales can be retrieved with locale -a.
And this section of the handbook has more details about locales: https://www.freebsd.org/doc/handbook/using-localization.html
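
The steps above (list the locales, pick a UTF-8 one, set LANG) can be sketched roughly as follows. This is only an illustration for a POSIX shell: `pick_utf8_locale` is a hypothetical helper name, and `en_US.UTF-8` is just an example preference.

```shell
#!/bin/sh
# Hypothetical helper: read locale names on stdin (one per line, as printed
# by `locale -a`) and print one UTF-8 locale. The name given in $1 wins if it
# is present; otherwise the first UTF-8 entry in the list is printed.
pick_utf8_locale() {
    preferred=$1
    awk -v want="$preferred" '
        tolower($0) ~ /utf-?8$/ {
            if ($0 == want) { found = $0; exit }
            if (first == "") first = $0
        }
        END {
            if (found != "") print found
            else if (first != "") print first
        }
    '
}

# Usage: choose a locale and export it for the current shell session only.
chosen=$(locale -a 2>/dev/null | pick_utf8_locale en_US.UTF-8)
if [ -n "$chosen" ]; then
    LANG=$chosen
    export LANG
    echo "LANG set to $LANG"
else
    echo "no UTF-8 locale found in 'locale -a' output" >&2
fi
```

This only affects the running shell; after it, re-running the extraction in the same shell should no longer hit the conversion error, assuming the file names are valid UTF-8.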

bsdcode · May 1, 2020

So the OP configures his Linux systems with the proper locale and omits this step when configuring his FreeBSD systems and then complains that locales do not work in FreeBSD and FreeBSD is like being in a monkey house... Come on.

D-FENS · May 1, 2020

Many GNU distros have quite heavily preconfigured shells, which is probably the case here.
But that's one of the strengths of non-bloated systems like FreeBSD. See here:

Bash:

# ps -ef | wc -l
      14   # FreeBSD
      924  # Gentoo
      281  # Arch

I can write my own (sane) init scripts, thank you very much.

SingerMan · May 1, 2020

ok "steps" you're a cool guy never mind

@ roccobaroccoS what is the command, or where would I enter LANG=en_US and charset=UTF-8? Thanks.

D-FENS · May 1, 2020

Put this in your shell's startup script (if you use bash, it's ~/.bashrc):
export LANG=en_US.UTF-8
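
A small sketch of adding that line without duplicating it on repeated runs. `ensure_line` is a hypothetical helper name, and the sketch assumes bash reads ~/.bashrc as described above.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless that exact line is
# already present, so running it twice does not duplicate the entry.
ensure_line() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage:
#   ensure_line 'export LANG=en_US.UTF-8' "$HOME/.bashrc"
#   . "$HOME/.bashrc"    # or simply open a new shell
#   locale               # LANG should now be non-empty
```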

memreflect · May 1, 2020

SingerMan said:
where would I enter LANG=en_US and charset=UTF-8

~/.login_conf for only your user or /etc/login.conf if you want it to apply system-wide. Either way, don't forget to use cap_mkdb(1) to ensure the changes are recognized on the next login. For example:

Code:

cap_mkdb ~/.login_conf

You can also set the environment variables LANG and MM_CHARSET in your shell profile (~/.profile for sh, ~/.login for csh) or other configuration file (e.g. ~/.cshrc), but if you ever change your login shell, you'll need to set them again for that shell. With ~/.login_conf, it applies to any login shell for your user.

See the man page for more info on login.conf(5).
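
As a rough sketch of the per-user route, the following writes a `me` entry and leaves the cap_mkdb step as a usage comment. `write_login_conf` is a hypothetical helper name; the entry layout mirrors the ~/.login_conf quoted later in this thread, and the locale is only an example.

```shell
#!/bin/sh
# Hypothetical helper: write a login.conf(5)-style "me" entry to the file
# named in $1, using the locale given in $2. Overwrites the target file.
write_login_conf() {
    target=$1
    lang=$2
    # Output is three lines: "me:\", then two tab-indented capabilities,
    # the first of which ends in a continuation backslash.
    printf 'me:\\\n\t:charset=UTF-8:\\\n\t:lang=%s:\n' "$lang" > "$target"
}

# Usage on FreeBSD (back up any existing ~/.login_conf first):
#   write_login_conf "$HOME/.login_conf" en_US.UTF-8
#   cap_mkdb "$HOME/.login_conf"    # rebuild the capability database
#   # log out and back in, then check the result with: locale
```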

ralphbsz · May 1, 2020

Where does a Linux install get that information from? And how does it deal with the fact that locale is a user-specific setting, and multiple users on the same computer may very well have different wishes? Sometimes, a single (human) user can have multiple wishes?

While the FreeBSD way of leaving this to each user is a bit more work and a bit more pain, it is also more correct.

Many years ago, a good friend of mine (who is German, lives in Germany, and sets their preferred character set to iso8859-1 for German characters) was working for a Ukrainian bank as a consultant (using some other iso8859-x setting, or perhaps KOI-something), but he was dating a Russian person (so e-mails to the partner were in a window with a different encoding). They typically had three different locales in different windows on the screen at the same time. At least unicode has gotten rid of having to install multiple fonts.

SingerMan · May 1, 2020

Thanks to roccobaroccoS, and to memreflect for your even fuller reply. That indeed worked!

I have actually been making use of a FreeBSD backup server that I put together about three years ago. I suppose I got lucky when I installed it. (I surely have not spent much time “learning” FreeBSD; I have been more than busy with graduate studies.) Linux has been my daily driver.

Yesterday I added a new 6TB drive, and ‘installed’ the new 12.1 UEFI version. Clearly, I rushed through it all and somehow incorrectly answered or missed the locale question. (In between doing errands with a mask and gloves -- and getting a bit cranky.) But, excelsior! Again, thank you both.

It actually pleases me to no end knowing I’m using the descendant of an OS that was written while Mr. Gates was still in short pants! (I intend to spend more time learning FreeBSD henceforth.)

memreflect · May 1, 2020

ralphbsz said:
Many years ago, a good friend of mine (who is German, lives in Germany, and sets their preferred character set to iso8859-1 for German characters) was working for a Ukrainian bank as a consultant (using some other iso8859-x setting, or perhaps KOI-something), but he was dating a Russian person (so e-mails to the partner were in a window with a different encoding). They typically had three different locales in different windows on the screen at the same time. At least unicode has gotten rid of having to install multiple fonts.

The state of affairs has improved: with Unicode, a document is no longer limited to a single "legacy" encoding, and you no longer need to embed control characters in what is typically thought of as plain text just to switch between encodings, some of which might be missing on a given system. That's the part that Unicode encodings such as UTF-8 solve.

However, you still need fonts that support the necessary glyphs. Just because a Unicode character encoding is used, that does not mean you can display the glyphs making up the characters (and some Greek characters require combining diacritics, effectively being made up of multiple Unicode code points). The Windows "Command Prompt" application is Unicode-enabled already, but there are still plenty of questions about how to enable Unicode for it in C or C++ because the glyphs don't render properly. Linux has the same issue despite being aware of UTF-8.

To someone used to the Latin alphabet almost exclusively, here in 2020, the idea that we haven't already found a solution to these problems may seem crazy. In the end, we actually have, but it requires a graphical session to take full advantage of Unicode, assuming you have the necessary fonts installed. Well, that and you can't use the Windows Command Prompt if you're using Windows...

memreflect · May 1, 2020

SingerMan said:
(I intend to spend more time learning FreeBSD henceforth.)

Good to hear. FreeBSD has its pros and cons when you compare it with Linux, but the pros outweigh the cons for my uses; perhaps your viewpoint will differ as you have different requirements of a "daily driver" than I do. Some concepts may be shared due to common software in use or simply their shared heritage, but they are still two very different operating systems.

bsdcode · May 1, 2020

SingerMan said:
ok "steps" you're a cool guy never mind

I want to apologize for my snarky reply. The FreeBSD community is very helpful and very kind and I want to be part of it. I failed miserably at that with my first reply to you. I promise I will become better.

D-FENS · May 1, 2020

SingerMan said:
Thanks to roccobaroccoS, and to memreflect for your even fuller reply. That indeed worked!

I have actually been making use of a FreeBSD backup server that I put together about three years ago. I suppose I got lucky when I installed it. (I surely have not spent much time “learning” FreeBSD; I have been more than busy with graduate studies.) Linux has been my daily driver.

Yesterday I added a new 6TB drive, and ‘installed’ the new 12.1 UEFI version. Clearly, I rushed through it all and somehow incorrectly answered or missed the locale question. (In between doing errands with a mask and gloves -- and getting a bit cranky.) But, excelsior! Again, thank you both.

It actually pleases me to no end knowing I’m using the descendant of an OS that was written while Mr. Gates was still in short pants! (I intend to spend more time learning FreeBSD henceforth.)

If you only use the system occasionally, it's very important to write down everything you learn and every problem you run into. The next time you groom the server, have your notes at hand.

BTW. I think Bill still uses shorts from time to time

haha just joking.

Mjölnir · Jul 2, 2020

Hi all!
I have a related issue with locale(1), so I'm picking up this thread. I have in my ~/.login_conf:

Code:

me:\
        :charset=UTF-8:\
        :lang=de_DE.UTF-8:

and in KDE systemsettings I have set LC_NUMERIC to get the modern format with a space as the thousands separator: "1 234 567, 89".

Code:

$ env|egrep '(LANG|CHAR|LC)'
LANG=de_DE.UTF-8
LANGUAGE=de:en_US
LC_NUMERIC=ksh_DE.UTF-8
MM_CHARSET=UTF-8
$ locale
LANG=de_DE.UTF-8
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
$ locale charmap
US-ASCII

Consequently, some CLI programs do not use UTF-8 charset, but US-ASCII instead. What's going wrong?

Mjölnir · Jul 2, 2020

OK, just for the record: I did not try on the console, only in a terminal window (KDE's x11/konsole). This seems to be a bug in my KDE GUI. On the real console, the above gives de_DE.UTF-8 for all LC_* and

Code:

$ locale charmap
UTF-8

So I'll close my bug report upstream on security/gpg and open one on KDE.
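
For anyone comparing a terminal emulator against the console like this, one possible check is whether every locale named in the environment is actually installed. The sketch below is only a diagnostic aid and does not establish the cause in this case. `locale_installed` and `check_locale_env` are hypothetical helper names, and the exact-match comparison assumes `locale -a` prints names in the same spelling used in the variables.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the locale name in $1 appears (as an exact,
# whole-line match) in the list of installed locales read from stdin.
locale_installed() {
    grep -qxF -- "$1"
}

# Hypothetical helper: print every LANG / LC_* variable in the environment
# and whether its value is listed by `locale -a`. Empty values are skipped.
check_locale_env() {
    installed=$(locale -a 2>/dev/null)
    env | grep -E '^(LANG|LC_[A-Z_]+)=' | while IFS='=' read -r name value; do
        [ -n "$value" ] || continue
        if printf '%s\n' "$installed" | locale_installed "$value"; then
            echo "ok       $name=$value"
        else
            echo "missing  $name=$value"
        fi
    done
}

# Usage: run in both the terminal window and the real console, then compare.
#   check_locale_env
#   locale charmap
```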