Solved Standard alphabetical order - how?

  • Thread starter Deleted member 9563
I would like to see files sorted in standard alphabetical order. As it stands, I have a non-standard order in which all upper-case letters come before lower-case letters. The result is "Desk" coming before "arc". The problem appears to be system-wide. How can I fix this?
 
Firstly, being pedantic, the default behaviour with caps before lowercase is the standard behaviour. For better or worse, the default Unix behaviour is to sort based on the numeric character values, and it is correct for FreeBSD to continue to use that default. Ignoring case in the ordering is non-standard.
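A minimal sketch of that ordering, using the two names from the question. Setting LC_ALL=C here is only an assumption made to force comparison by character value regardless of how the machine is configured:

```shell
# LC_ALL=C makes sort compare raw character values.
# In US-ASCII 'D' is 68 and 'a' is 97, so "Desk" sorts ahead of "arc".
printf '%s\n' arc Desk | LC_ALL=C sort
# prints:
#   Desk
#   arc
```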

The standard way of setting up locales and charsets is now to use login.conf(5) either in /etc/login.conf or ~/.login_conf. The default /etc/login.conf ships with an example for setting up a Russian locale, and /usr/share/skel/dot.login_conf has an example for German. The relevant capability names are charset and lang.

Also, note that en_US is technically not correct; it should include the charset as well, so use one of the following (the correct choice is the one which matches the configuration of your terminal window / device):
  • en_US.ISO8859-1
  • en_US.ISO8859-15
  • en_US.US-ASCII
  • en_US.UTF-8
Use locale -a to get a list of valid language settings, and locale -m to get a list of valid charsets.

Lastly, if you prefer to use an environment variable rather than login.conf for some reason, LANG is the better choice and the standard way of setting your preference. LC_ALL is an override and should not be used for this. See environ(7).
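To make the "override" point concrete, here is a rough sketch of the lookup order that applies to collation: LC_ALL first, then LC_COLLATE, then LANG. The function name effective_collate is made up for illustration, and the final fallback to C is an assumption (the real default when nothing is set is implementation-defined):

```shell
# Hypothetical helper: report which setting decides the collation locale.
# An empty value is treated the same as an unset one.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"        # LC_ALL overrides everything else
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"    # category-specific setting
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"          # general preference
    else
        printf '%s\n' C                # assumed default, see note above
    fi
}

effective_collate
```

With LANG=en_US.UTF-8 and LC_ALL=C both set, this reports C, which is why a stray LC_ALL can hide whatever was configured through LANG.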
 
Thank you for your quick response gpatrick and Murph.

I've got
Code:
LC_ALL=en_US.UTF-8
in my /etc/login.conf. Even after running
Code:
cap_mkdb /etc/login.conf
I'm not seeing any difference.

Firstly, being pedantic, the default behaviour with caps before lowercase is the standard behaviour. For better or worse, the default Unix behaviour is to sort based on the numeric character values, and it is correct for FreeBSD to continue to use that default. Ignoring case in the ordering is non-standard.

Point taken. :) I should have clarified that by "alphabetical" order I was referring to the English standard and not the UNIX standard. I recognize that I'm using *NIX, but in this particular case I'm using it as a tool to manipulate English where, as you will see in any authoritative reference such as the OED, "geometry" comes before "Georgian".

My concern is that when I have a long listing, grouping upper-case and lower-case separately causes unneeded difficulty in locating what I want. Can you imagine going to a book library and having to go to different stacks for upper and lower case!
 

Well, I've done a little digging, as this issue piqued my curiosity. On both FreeBSD and OS X, LC_COLLATE is more or less the same for all variants of en_GB and en_US, and essentially follows the traditional Unix and POSIX ordering (which is based on the ordering of US-ASCII character codes). The high-bit characters (128–255) vary depending on the encoding, but the low range is the same for all.

I can, however, offer you a couple of commands which may help, and which could be configured as shell aliases for convenience: ls -f | sort -f | column for a basic listing, and ls -lf | sort -f -k9 for a long listing.
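As a sketch, the two pipelines could be wrapped in shell functions like the ones below. The names lsi and lli are made up, and passing "$@" through to ls is an addition for convenience. Note that ls -f also shows dotfiles, and that this assumes FreeBSD's ls (on some other systems -f may switch the long format off):

```shell
# Hypothetical helper names; rename to taste.
# Basic listing: ls -f emits the names unsorted, sort -f orders them
# while ignoring case, and column arranges the result in columns.
lsi() { ls -f "$@" | sort -f | column; }

# Long listing: sort on the ninth field, where ls -l puts the file name.
lli() { ls -lf "$@" | sort -f -k9; }

# The ordering step on its own, fed with sample names:
printf '%s\n' Desk arc Zoo bin | LC_ALL=C sort -f
# prints:
#   arc
#   bin
#   Desk
#   Zoo
```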

You could also create a custom LC_COLLATE using colldef(1), but that is left as a moderately complex task for the reader. Beware of possibly odd behaviour from some things with a custom locale (I'm not thinking of anything in particular, just that you are in relatively unexplored territory), and I strongly recommend against having such a thing configured for root or system daemons. If doing such a thing, it is possibly best not to replace anything in the standard locales, but to clone one of them under a new name and add the custom collation there; on the other hand, most system stuff should be using C or POSIX anyway.
 
Murph I think you have given me as good an answer as I'm going to get, and indeed your offered commands will work for my purposes. I'll put them in my alias list.

I too have done a lot of searching about this and I see no one offering a solution involving locale. Interestingly, I also run a machine with 16 bit DOS and note that the sort algorithm used there does actually use the English order (A a, B b, etc.). It would seem that a little basic code, though far beyond my skills, could be useful for those using FreeBSD to write and organize English text. In any case, I'll consider this solved for my purposes.

 
Sorting with LC_COLLATE=en_US.UTF-8 is broken not just with ls, but even with gnuls under FreeBSD, but it works fine in Linux. Somebody please fix this!
 

Well, from what I have been able to find, The Open Group only defines the sort order for the C and POSIX locales, which is to strictly follow the ASCII ordering, and that is what we currently have for en_*. Linux may actually be the broken case here, unless you can point at a UNIX standard which FreeBSD is not following. In the absence of an official specification for it, it makes good sense to me that Latin locales use the POSIX locale as a baseline.

Linux not exhibiting the same behaviour, on its own, is most certainly not sufficient evidence that FreeBSD is doing something wrong, as FreeBSD has never set out to reproduce the behaviour of Linux (quite the opposite, really; Linux often (but not always) tries to reproduce the behaviour of UNIX).
 
I'm sure that many people only use FreeBSD for C and related work; however, there must be a fair proportion who use it for English-related tasks such as file management. In such a case the ASCII ordering referenced here is contrary to any historical usage of characters by humans for cataloging. ASCII is indeed an old and important standard, but it is designed for machine use.

. . . Linux not exhibiting the same behaviour, on its own, is most certainly not sufficient evidence that FreeBSD is doing something wrong, as FreeBSD has never set out to reproduce the behaviour of Linux (quite the opposite, really; Linux often (but not always) tries to reproduce the behaviour of UNIX).

Good point. I note too that the early designers of MS-DOS were also close to the people developing XENIX and had the good sense to copy some of the brilliant ideas like piping. They did, however, manage to make the sort order match the needs of those using the system for writing and file management.
 

Your point about The Open Group only specifying behavior for the C and POSIX locales is well taken. If locales are going to be supported at all, then surely some standard governing them needs to apply. I believe that standard could only be ISO 14651: "Information technology - International string ordering and comparison - Method for comparing character strings and description of the common template tailorable ordering".

I made the following test program s.c:
Code:
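#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
  int r;
  const char *s1 = "BCD";
  const char *s2 = "abc";
  /* If L is unset, getenv() returns NULL and setlocale() only queries
     the current locale, so the program stays in the default "C" locale. */
  setlocale(LC_COLLATE, getenv("L"));
  /* Negative: s1 collates before s2; positive: s1 collates after s2. */
  r = strcoll(s1, s2);
  printf("strcoll('%s', '%s') returns %d\n", s1, s2, r);
  return 0;
}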

I ran it on both FreeBSD and Linux, invoking it as both "./s" and "L=en_US.UTF-8 ./s" in turn on each system.

FreeBSD prints "-31" for both invocations. Linux prints "-31" for the first invocation, and "1" for the second. My conclusion is that there is a problem with either the locale definition, or with libc, on one of the systems. Surely, the locale "en_US.UTF-8" should not behave differently on different systems.
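The same comparison can be roughly reproduced from the shell without compiling anything, on the assumption that sort(1) orders its input with the collation of the locale it runs under. The helper name collate_first is made up. The result for en_US.UTF-8 depends on the system's locale data, and if that locale is not installed the command may quietly behave as in C, so it is worth checking locale -a first:

```shell
# Hypothetical helper: print whichever of two strings collates first
# under the given locale. LC_ALL is used so that no other locale
# variable in the environment can interfere.
collate_first() {
    printf '%s\n' "$2" "$3" | LC_ALL="$1" sort | head -n 1
}

for loc in C en_US.UTF-8; do
    printf '%s: %s\n' "$loc" "$(collate_first "$loc" BCD abc)"
done
```

In the C locale this reports BCD, matching the negative strcoll() result above.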

"man 3 strcoll" on FreeBSD says "... lexicographically compares ... according to the current locale collation".

The question is, where is the fault?
 