Character sets in the console and in Samba

Dear All

I have read through the Handbook chapter on localization. My native language is Danish, but in general I have no problem using English on my machines. The exception is one of my servers, which is used for file sharing through Samba and web serving through Apache, so there I need Danish national characters.

My current installation is based on FreeBSD 8.1-PRERELEASE, samba34 and apache22.

A general question: should I use ISO-8859-1 or UTF-8 for the basic system?

Without any special settings, I can use my samba shares with Danish file names. But when I list the contents of the actual directories in the console, I either get question marks or strange characters - depending on my settings.

Since I am the only one having shell access, I usually use the root account.

The content of /etc/login.conf is (for the moment):
Code:
default:\
        :passwd_format=md5:\
        :copyright=/etc/COPYRIGHT:\
        :welcome=/etc/motd:\
        :setenv=MAIL=/var/mail/$,BLOCKSIZE=K,FTP_PASSIVE_MODE=YES:\
        :path=/sbin /bin /usr/sbin /usr/bin /usr/games /usr/local/sbin /usr/local/bin ~/bin:\
        :nologin=/var/run/nologin:\
        :cputime=unlimited:\
        :datasize=unlimited:\
        :stacksize=unlimited:\
        :memorylocked=unlimited:\
        :memoryuse=unlimited:\
        :filesize=unlimited:\
        :coredumpsize=unlimited:\
        :openfiles=unlimited:\
        :maxproc=unlimited:\
        :sbsize=unlimited:\
        :vmemoryuse=unlimited:\
        :swapuse=unlimited:\
        :pseudoterminals=unlimited:\
        :priority=0:\
        :ignoretime@:\
        :umask=022:

root:\
        :ignorenologin:\
        :tc=default:\
        :charset=ISO-8859-1:\
        :lang=da_DK.ISO8859-1:\
        :setenv=LC_TIME=da_DK.ISO8859-1:

www:\
        :charset=ISO-8859-1:\
        :lang=da_DK.ISO8859-1:\
        :setenv=LC_TIME=da_DK.ISO8859-1:
After editing the file, I remember to run
# cap_mkdb /etc/login.conf
With these settings, I can read and write Danish characters in the console.

The samba set-up is working without any special attention to language settings. My /usr/local/etc/smb.conf looks like:
Code:
[global]
        workgroup = MYDOMAIN
        server string = server2
        map to guest = Bad Password
        passwd program = /usr/bin/passwd %u
        passwd chat = "Changing local password for*\nNew Password*" %n\n
        passwd chat debug = Yes
        unix password sync = Yes
        log level = 3
        log file = /var/log/samba34/log.%m
        max log size = 1000
        min receivefile size = 16384
        time server = Yes
        socket options = SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY
        add user script = /usr/sbin/pw useradd %u -g machines -s /sbin/nologin -h -d /tmp
        add machine script = /usr/sbin/pw useradd %u -d /var/empty -g machines -s /usr/sbin/nologin
        logon script = netlogon.bat
        logon path = ""
        logon drive = H:
        domain logons = Yes
        os level = 65
        domain master = Yes
        wins support = Yes
        idmap uid = 10000-20000
        idmap gid = 10000-20000
        winbind use default domain = Yes
        aio read size = 16384
        aio write size = 16384
[homes]
        comment = Home Directories
        read only = No
        browseable = No
        browsable = No
etc.

With these settings, users can save files with Danish names and see them the same way.

But when I list the contents of user shares in the console, Danish characters show up (depending on the settings; I have tried several) either as "strange" characters or simply as question marks. Most of the time it doesn't matter much, since I can guess the names. But it can make it impossible to, e.g., copy files to another directory.
If I copy files from a samba share to a web directory, the file names are also wrong in directory listings.

With the settings shown, I can create a text file from the console, but its file name shows up as nothing but a straight line in Windows Explorer.

I have tried to adjust the values of
Code:
display charset = 
unix charset = 
dos charset =
without any success.

Any help would be very much appreciated.

Regards,
Jon
 
Did you configure the syscons driver to use a font that is able to display those characters correctly?

You may do that by adding something like this to your /etc/rc.conf:
Code:
font8x8="iso-8x8.fnt"
font8x14="iso-8x14.fnt"
font8x16="iso-8x16.fnt"

This assumes ISO-8859-1 contains the characters you need. Otherwise see /usr/share/syscons/fonts/ for other fonts, and syscons(4) of course.
 
Oh, sorry
I forgot to mention that part.
The relevant part of my /etc/rc.conf reads:
Code:
font8x8="iso-8x8"
font8x14="iso-8x14"
font8x16="iso-8x16"
keymap="danish.iso"
Does that look correct for the purpose?
As I mentioned, I am a bit confused about ISO-8859-1 vs. UTF-8.

Regards,
Jon
 
Then I guess you have to experiment with the settings of display charset and unix charset in your smb.conf again.
Try setting both to iso-8859-1:
Code:
unix charset = iso-8859-1
display charset = iso-8859-1
Then try to create a new file with Danish characters in its name from an SMB client, and see whether it is displayed correctly on your console. I would not, however, expect this to fix files that were created earlier under a different character set.
 
You are absolutely right. For a short moment I got quite uncomfortable, since all the Danish characters were wrong on the Windows client, until I remembered to set
Code:
dos charset = CP865
It works for new files, but not for old ones.
I wonder whether this would work as a workaround:

1) Use the old settings:
Code:
dos charset = CP865
unix charset = 
display charset =
2) Move all the files to a Windows drive
3) Use the new settings:
Code:
dos charset = CP865
unix charset = iso-8859-1
display charset = iso-8859-1
4) Move the files back again to the share

Not that I really understand the mechanics behind the encoding in FreeBSD/Samba.

Regards,
Jon
 
Sounds like this method could indeed work.
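
If moving 17,000+ files back and forth is inconvenient, renaming them in place might be an alternative. Below is a minimal sketch of that idea, not something tested on your server. It assumes the old names were stored as UTF-8 (as far as I know, that is what Samba 3 uses when unix charset is left unset) and that the target is ISO-8859-1. The function names and the /path/to/share path are made up for the example. It only prints what it would do.

```python
import os


def convert_name(raw, src="utf-8", dst="iso-8859-1"):
    """Return the raw file name re-encoded from src to dst, or None if impossible."""
    try:
        return raw.decode(src).encode(dst)
    except UnicodeError:
        # Either the name is not valid in src, or it uses a character dst lacks.
        return None


def plan_renames(top, src="utf-8", dst="iso-8859-1"):
    """Yield (old_path, new_path) byte-string pairs below top; nothing is renamed."""
    # A bytes path makes os.walk return raw names. Bottom-up order means the
    # entries inside a directory are handled before the directory itself.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top), topdown=False):
        for raw in filenames + dirnames:
            new = convert_name(raw, src, dst)
            if new is None:
                print("skipped:", os.path.join(dirpath, raw))
            elif new != raw:
                yield os.path.join(dirpath, raw), os.path.join(dirpath, new)


if __name__ == "__main__":
    # Hypothetical share path; replace it with a real one.
    for old, new in plan_renames("/path/to/share"):
        # Dry run: only report. Calling os.rename(old, new) here would apply it.
        print(old, "->", new)
```

Names that cannot be converted are reported as skipped instead of being guessed at, so they can be fixed by hand. Pure ASCII names come out unchanged and are left alone.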

I presume the mechanics behind all this come down to making each piece of software aware of the charset that is actually used to store file names on the host system. Otherwise misbehaviour is probably inevitable.

Samba should be able to convert the charset to whatever its clients are using, but it still needs to know what charset to use locally, as it cannot guess that. Or guesses it wrong :)
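
To make the mismatch visible, here is a small illustration of what happens when a name is written in one charset and read in another. The file name is just an example, and UTF-8 as the "old" charset is my assumption about what Samba used before unix charset was set.

```python
# Illustration only: one Danish file name, stored and read with different charsets.
name = "blåbærgrød.txt"

stored_utf8 = name.encode("utf-8")         # assumed charset of the old files
stored_latin1 = name.encode("iso-8859-1")  # what an ISO-8859-1 console expects

# An ISO-8859-1 console shows every UTF-8 byte as its own character,
# so each Danish letter turns into two "strange" characters.
as_seen_on_console = stored_utf8.decode("iso-8859-1")
print(as_seen_on_console)

# A UTF-8 reader cannot decode the ISO-8859-1 bytes at all and falls back
# to a replacement mark for each Danish letter.
as_seen_by_utf8_reader = stored_latin1.decode("utf-8", errors="replace")
print(as_seen_by_utf8_reader)

# Only when writer and reader agree does the name come back intact.
round_trip = stored_latin1.decode("iso-8859-1")
print(round_trip)
```

The first case would explain the "strange" characters and the second the question marks. Which one you get depends on which side has the wrong idea about the charset.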
 
It actually worked fine, except for a few (<10 out of 17,000+) files where Windows (or rather samba) complained about illegal file names.
So with your help and my own suggested approach, I got around this issue.
Thank you very much.

Regards,
Jon
 