C/C++ Low level str to uppercase?

yuripv

Member

Thanks: 26
Messages: 84

#51
So, what's going to happen with ß when used through that API?
We have the same problem in Eastern Armenian: և becomes ԵՎ when upper-cased.
Looks like it's the following entry in UnicodeData.txt (which we will hopefully use as a source for our utf-8 ctype maps soon):
Code:
0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;
So there's no *simple* upper/lower mapping, and the simple mappings are the ones towupper/towlower use.

Same for ß:
Code:
00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;
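
To make the "no simple mapping" point concrete, here is a minimal sketch that pulls one field out of a UnicodeData.txt record. It relies on the simple uppercase, lowercase and titlecase mappings being fields 12, 13 and 14 (0-based) of the semicolon-separated record. The helper name `ud_field` and the enum names are made up for illustration; this is not how any libc actually builds its ctype maps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Field indices in a UnicodeData.txt record (0-based, ';'-separated). */
enum { UD_SIMPLE_UPPER = 12, UD_SIMPLE_LOWER = 13, UD_SIMPLE_TITLE = 14 };

/*
 * Hypothetical helper: copy field number `index` of `record` into `out`
 * (capacity `cap`). Returns the field length, or -1 if the field is
 * missing or does not fit. strtok() is avoided on purpose, because it
 * would silently skip the empty fields we care about.
 */
static int ud_field(const char *record, int index, char *out, size_t cap)
{
    assert(record != NULL && out != NULL && cap > 0);
    const char *start = record;
    for (int i = 0; i < index; i++) {
        start = strchr(start, ';');
        if (start == NULL)
            return -1;          /* fewer fields than requested */
        start++;                /* step past the separator */
    }
    size_t len = strcspn(start, ";");
    if (len >= cap)
        return -1;              /* no room for the terminating NUL */
    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```

Calling `ud_field(record, UD_SIMPLE_UPPER, buf, sizeof buf)` on either of the two records above returns 0, i.e. an empty field. For comparison, the record for U+0061 (a) carries 0041 in that field.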
 

aragats

Aspiring Daemon

Thanks: 361
Messages: 920

#52
There is a separate file for special cases, SpecialCasing.txt!
Code:
The data in this file, combined with the simple case mappings in UnicodeData.txt, defines the full case mappings
Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc).
 

yuripv

Member

Thanks: 26
Messages: 84

#53
Indeed, but as olli@ mentioned, towupper/towlower can only return a single character (per the POSIX description), so we have to use the simple mappings there. I guess there are libraries (ICU?) that already provide the means for proper case conversion, but I haven't really looked into it.
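
A rough sketch of why a one-character-in, one-character-out interface cannot do this: a full uppercase mapping may write more code points than it reads. Everything here is illustrative. `full_upper`, `simple_upper` and the two-row table are invented names, the rows are hand-copied from the unconditional mappings in SpecialCasing.txt, and the fallback only handles ASCII so that it does not depend on the locale. Note that SpecialCasing.txt maps և to ԵՒ (0535 0552); the Eastern Armenian ԵՎ mentioned earlier would need a language-specific rule on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: two rows transcribed from SpecialCasing.txt. */
struct special_upper {
    uint32_t from;      /* source code point */
    uint32_t to[3];     /* full uppercase mapping, zero-padded */
    size_t   len;       /* number of code points used in `to` */
};

static const struct special_upper special[] = {
    { 0x00DF, { 0x0053, 0x0053 }, 2 },  /* U+00DF sharp s   -> "SS" */
    { 0x0587, { 0x0535, 0x0552 }, 2 },  /* U+0587 ech yiwn  -> U+0535 U+0552 */
};

/* Stand-in for a simple one-to-one mapping; ASCII a-z only (assumption). */
static uint32_t simple_upper(uint32_t cp)
{
    return (cp >= 0x61 && cp <= 0x7A) ? cp - 0x20 : cp;
}

/*
 * Uppercase `n` code points from `in` into `out` (capacity `cap`).
 * Returns the number of code points written, or 0 if `out` is too small.
 */
static size_t full_upper(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        const struct special_upper *hit = NULL;
        for (size_t k = 0; k < sizeof special / sizeof special[0]; k++)
            if (special[k].from == in[i])
                hit = &special[k];
        size_t need = hit ? hit->len : 1;
        if (w + need > cap)
            return 0;                       /* output buffer too small */
        if (hit)
            for (size_t j = 0; j < hit->len; j++)
                out[w++] = hit->to[j];      /* one-to-many expansion */
        else
            out[w++] = simple_upper(in[i]); /* ordinary one-to-one case */
    }
    return w;
}
```

The caller has to size `out` for the worst case rather than for the input length, which is exactly what the towupper/towlower signature has no room for.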
 

ralphbsz

Daemon

Thanks: 736
Messages: 1,257

#54
Let me attempt to summarize this discussion: Uppercasing a string is not always the same as uppercasing a single character. To uppercase a string, you have to do more than just uppercase every character in the string.

From this I conclude that I never ever want to work on a project that requires i18n; and if I have to, I'll have to buy lots of alcohol.
 

ikbendeman

Well-Known Member

Thanks: 17
Messages: 355

#55
Use the ASCII table.
Then loop over the string: while a character is within the bounds of a-z (or A-Z), subtract (or add) 32 to switch its case.
It's up to you how you'd want to parse it: you could use an array and do the usual, or do the operation during the parsing loop itself.
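
For what it's worth, the ASCII-table approach could look something like this in C. It is only a sketch (the function name `ascii_upper` is made up) and it deliberately touches nothing outside a-z, so it does none of the ß/և handling discussed earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Uppercase an ASCII string in place; bytes outside a-z are left untouched. */
static void ascii_upper(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= 'a' && *s <= 'z')
            *s = (char)(*s - ('a' - 'A'));  /* 'a' - 'A' is 32 in ASCII */
    }
}
```

Run over UTF-8 text, this uppercases the ASCII letters and passes every multi-byte sequence through unchanged, which is safe but not a real uppercase conversion for non-English text.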
 