C: low-level string to uppercase?

The cases that you mentioned cannot happen with the wide character API as defined by ISO C.
So, what's going to happen with ß when used through that API?
We have the same problem in Eastern Armenian: և becomes ԵՎ when upper-cased.
 
So, what's going to happen with ß when used through that API?
We have the same problem in Eastern Armenian: և becomes ԵՎ when upper-cased.
Good question. I haven't actually tried it, but the manual page says (important part highlighted): “If the argument is a lower-case letter, the towupper(3) function returns the corresponding upper-case letter if there is one; otherwise the argument is returned unchanged.”
So, if the system supports the newest Unicode version that has the upper-case “ẞ”, that one will be returned. Otherwise, the lower-case “ß” is returned unchanged. There is no way it can return two characters (“SS”). I guess you'll have to use a third-party library if you need to perform conversions that can change the number of characters.
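
To make the one-in, one-out limitation concrete, here is a minimal sketch of per-character upper-casing with towupper(3). The helper name upper_simple is made up for illustration. Whether ß comes back unchanged or as ẞ depends on the platform's tables and the active locale, but either way the string keeps its length.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Upper-case a wide string in place, one character at a time.
 * towupper() maps one wint_t to one wint_t, so the string can
 * never grow: "ß" cannot turn into "SS" here. */
void upper_simple(wchar_t *s)
{
    assert(s != NULL);
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towupper((wint_t)*s);
}
```

On many systems you would call setlocale(LC_ALL, "") before this, otherwise only ASCII letters are likely to be converted.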
 
So, what's going to happen with ß when used through that API?
We have the same problem in Eastern Armenian: և becomes ԵՎ when upper-cased.

Looks like it's the following entry in UnicodeData.txt (which we will hopefully use as a source for our utf-8 ctype maps soon):
Code:
0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;
So there's no *simple* upper/lower mapping, which is used by towupper/towlower.

Same for ß:
Code:
00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;
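
As a rough sketch of how to read those entries: assuming the usual layout of 15 semicolon-separated fields, with the simple uppercase mapping in field 12 (counting from 0), a small parser could look like this. The function name simple_upper_from_line is hypothetical. Note that strtok() would not work here because it skips empty fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the simple uppercase mapping (field 12, counting from 0) of one
 * UnicodeData.txt line, or -1 if that field is empty or the line is short. */
long simple_upper_from_line(const char *line)
{
    assert(line != NULL);
    const char *p = line;
    /* Skip 12 semicolons to reach the start of field 12. */
    for (int field = 0; field < 12; field++) {
        p = strchr(p, ';');
        if (p == NULL)
            return -1;
        p++;
    }
    /* An empty field means there is no simple uppercase mapping. */
    if (*p == ';' || *p == '\0' || *p == '\n')
        return -1;
    return strtol(p, NULL, 16);   /* code points are written in hex */
}
```

Fed the two lines above, this returns -1 for both, which is why towupper has nothing to map them to.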
 
There is a separate file for special cases, SpecialCasing.txt!
Code:
The data in this file, combined with the simple case mappings in UnicodeData.txt, defines the full case mappings
Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc).
 
There is a separate file for special cases, SpecialCasing.txt!
Code:
The data in this file, combined with the simple case mappings in UnicodeData.txt, defines the full case mappings
Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc).

Indeed, but as olli@ mentioned, towupper/towlower can only return a single character (see the POSIX description), so we have to use the simple mappings there. I guess there are libraries (ICU?) that already provide proper case conversion, but I didn't really look into it.
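
For illustration only, here is a sketch of what a full mapping has to do that towupper cannot: write more characters than it reads. The two table entries are hand-copied from what I believe the special-casing data says (ß to SS, and և to U+0535 U+0552, which is the default Unicode mapping rather than the Eastern Armenian ԵՎ mentioned earlier), so check them against the real file. The names full_upper, special and upper_full are invented for this example, and a real library such as ICU handles far more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* One "full" uppercase mapping: a single character that expands. */
struct full_upper { wchar_t from; wchar_t to[4]; };

/* Tiny illustrative table; NOT a complete copy of the Unicode data. */
static const struct full_upper special[] = {
    { 0x00DF, { L'S', L'S', 0 } },       /* ß -> SS */
    { 0x0587, { 0x0535, 0x0552, 0 } },   /* և -> ԵՒ (default mapping) */
};

/* Upper-case src into dst (capacity cap, in wchar_t units, including
 * the terminator). Returns the output length, or (size_t)-1 if dst
 * is too small. */
size_t upper_full(const wchar_t *src, wchar_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;
    size_t n = 0;
    for (; *src != L'\0'; src++) {
        const wchar_t *exp = NULL;
        for (size_t i = 0; i < sizeof special / sizeof special[0]; i++) {
            if (special[i].from == *src) {
                exp = special[i].to;
                break;
            }
        }
        if (exp != NULL) {
            /* Special case: copy the multi-character expansion. */
            for (; *exp != L'\0'; exp++) {
                if (n + 1 >= cap)
                    return (size_t)-1;
                dst[n++] = *exp;
            }
        } else {
            /* Everything else: fall back to the simple mapping. */
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = (wchar_t)towupper((wint_t)*src);
        }
    }
    dst[n] = L'\0';
    return n;
}
```

Because the output can be longer than the input, this cannot work in place; the caller has to supply a separate, larger buffer.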
 
Let me attempt to summarize this discussion: Uppercasing a string is not always the same as uppercasing a single character. To uppercase a string, you have to do more than just uppercase every character in the string.

From this I conclude that I never ever want to work on a project that requires i18n; and if I have to, I'll have to buy lots of alcohol.
 
Use the ASCII table.
Then loop over the string: for each character that is within the bounds of A-Z or a-z, add or subtract the offset between the two ranges to switch its case.
How you parse the input is up to you. You could read it into an array first and convert it afterwards, or do the conversion inside the parsing loop itself.
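
A minimal sketch of that approach, assuming plain ASCII input in a writable char array (the function name ascii_upper is made up here). Bytes outside a-z are left alone, so this does nothing useful for ß, և or any other non-ASCII letter.

```c
#include <assert.h>
#include <stddef.h>

/* Upper-case an ASCII string in place. In ASCII, 'a'..'z' sit exactly
 * 'a' - 'A' (32) positions above 'A'..'Z', so subtracting that offset
 * converts a lower-case letter to its upper-case counterpart. */
void ascii_upper(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z')
            *s = (char)(*s - ('a' - 'A'));
    }
}
```

For example, calling ascii_upper on a buffer holding "Hello, world" leaves "HELLO, WORLD" in it.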
 