Behavior of iswprint and wcswidth with locale fr_FR.ISO8859-1

Hi,

I have isolated some code from newsbeuter that behaves unexpectedly when the locale is fr_FR.ISO8859-1. Everything works fine when the locale is fr_FR.UTF-8. The iswprint() and wcswidth() functions behave differently between the two locales for the character '`' and for L'\uFFFD', respectively.
Who should I report this to? Should I post my code here?

Thanks in advance for any response.
Best regards.
 
I cannot upload a file, so I am pasting the code directly:
============8<-----------------------------
Code:
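#include <langinfo.h>
#include <locale.h>
#include <stdio.h>    // getchar
#include <stfl.h>
#include <wchar.h>    // wcswidth
#include <wctype.h>   // iswprint
#include <algorithm>  // std::min
#include <string>
#include <iostream>
#include <fstream>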

// From: utils::clean_nonprintable_characters
std::wstring clean_nonprintable_characters(std::wstring text) {
  for (size_t idx=0; idx<text.size(); ++idx) {
    if (!iswprint(text[idx]))
      text[idx] = L'\uFFFD';
  }
  return text;
}

// From: utils::str2wstr
std::wstring str2wstr(const std::string& str) {
  const char * codeset = nl_langinfo(CODESET);
  struct stfl_ipool * ipool = stfl_ipool_create(codeset);
  std::wstring result = stfl_ipool_towc(ipool, str.c_str());
  stfl_ipool_destroy(ipool);
  std::wcout << "str2wstr(): " << result << std::endl;
  return result;
}

// From: utils::strwidth_stfl
size_t wcswidth_stfl(const std::wstring& str, size_t size) {
  size_t reduce_count = 0;
  size_t len = std::min(str.length(), size);
  if (len > 1) {
    for (size_t idx=0; idx<len-1; ++idx) {
      if (str[idx] == L'<' && str[idx+1] != L'>') {
        reduce_count += 3;
        idx += 3;
      }
    }
  }

  //LOG(LOG_DEBUG, "utils::wcswidth_stfl: size=%d, len=%d, reduce_count=%d", size, len, reduce_count);

  int width = wcswidth(str.c_str(), size);
  if (width < 0) {
    //LOG(LOG_ERROR, "oh, oh, wcswidth just failed"); // : %ls", str.c_str());
    std::cout << "oh, oh, wcswidth just failed" << std::endl;
    return str.length() - reduce_count;
  }

  return width - reduce_count;
}


int main()
{
   std::string text;

   setlocale(LC_ALL, "fr_FR.ISO8859-1");
   //setlocale(LC_ALL, "fr_FR.UTF-8");

   std::cout << "nl_langinfo(CODESET): " << nl_langinfo(CODESET) << std::endl;

   std::fstream ifs("title.txt");
   getline( ifs, text);
   std::cout << text << std::endl;

   // ---------------------------------------------
   // From: listformatter::add_line()
  std::wstring mytext = clean_nonprintable_characters(str2wstr(text));
   std::wcout << mytext << std::endl;
   unsigned int width = text.length();

   std::cout << "width=" << width << ", mytext.length()=" << mytext.length() << std::endl;

  while (mytext.length() > 0) {
    size_t size = mytext.length();
    size_t w = wcswidth_stfl(mytext, size);
    std::cout << "w=" << w << std::endl;
    //if (w > width) {
    //  while (size && (w = wcswidth_stfl(mytext, size)) > width) {
    //    size--;
    //  }
    //}
    //lines.push_back(line_id_pair(utils::wstr2str(mytext.substr(0, size)), id));
    mytext.erase(0, size);
  }
   std::wcout << mytext << L" (apres traitement)" << std::endl;

   getchar();
   return 0;
}
============8<-------- title.txt contains ---------------------
Titre: "Aujourd´hui, nos concurrents ne sont plus chinois ou brésiliens, ils sont Européens !", rappelle le patron du syndicat de l´industrie agroalimentaire
 
Do you have a shorter snippet?

Code:
~/test % clang++ a.cc
a.cc:3:10: fatal error: 'stfl.h' file not found
#include <stfl.h>
         ^~~~~~~~
1 error generated.
~/test %
 
Code:
% pkg which /usr/local/include/stfl.h
/usr/local/include/stfl.h was installed by package stfl-0.24

I use a Makefile which contains:
Code:
LDFLAGS=$(LIBS) -pthread -lstfl
 