Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I am using Visual Studio C++ 2008 (Express). When I run the below code, the wostream (both std::wcout, and std::wfstream) stops outputting at the first non-ASCII character (in this case Chinese) encountered. Plain ASCII characters print fine. However, in the debugger, I can see that the wstrings are in fact properly populated with Chinese characters, and the output << ... is in fact getting executed.

The project settings in the Visual Studio solution are set to "Use Unicode Character Set". Why is std::wostream failing to output Unicode characters outside of the ASCII range?

void PrintTable(const std::vector<std::vector<std::wstring>> &table, std::wostream& output) {
    for (unsigned int i=0; i < table.size(); ++i) {
        for (unsigned int j=0; j < table[i].size(); ++j) {
            output << table[i][j] << L"";
        }
        //output << std::endl;
    }
}


void TestUnicodeSingleTableChinesePronouns() {
    FileProcessor p("SingleTableChinesePronouns.docx");
    FileProcessor::iterator fileIterator;
    std::wofstream myFile("data.bin", std::ios::out | std::ios::binary);
    for(fileIterator = p.begin(); fileIterator != p.end(); ++fileIterator) {
        PrintTable(*fileIterator, myFile);
        PrintTable(*fileIterator, std::wcout);
        std::cout<<std::endl<<"---------------------------------------"<<std::endl;
    }
    myFile.flush();
    myFile.close();
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
506 views
Welcome To Ask or Share your Answers For Others

1 Answer

By default the locale that std::wcout and std::wofstream use for certain operations is the "C" locale, which is not required to support non-ascii characters (or any character outside C++'s basic character set). Change the locale to one that supports the characters you want to use.

The simplest thing to do on Windows is unfortunately to use legacy codepages, however you really should avoid that. Legacy codepages are bad news. Instead you should use Unicode, whether UTF-8, UTF-16, or whatever. Also you'll have to work around Windows' unfortunate console model that makes writing to the console very different from writing to other kinds of output streams. You might need to find or write your own output buffer that specifically handles the console (or maybe file a bug asking Microsoft to fix it).

Here's an example of console output:

#include <Windows.h>

#include <streambuf>
#include <iostream>

class Console_streambuf
    : public std::basic_streambuf<wchar_t>
{
    HANDLE m_out;
public:
    Console_streambuf(HANDLE out) : m_out(out) {}

    virtual int_type overflow(int_type c = traits_type::eof())
    {
        wchar_t wc = c;
        DWORD numberOfCharsWritten;
        BOOL res = WriteConsoleW(m_out, &wc, 1, &numberOfCharsWritten, NULL);
        (void)res;
        return 1;
    }
};

int main() {
    Console_streambuf out(GetStdHandle(STD_OUTPUT_HANDLE));
    auto old_buf = std::wcout.rdbuf(&out);
    std::wcout << L"привет, 猫咪!
";
    std::wcout.rdbuf(old_buf); // replace old buffer so that destruction can happen correctly. FIXME: use RAII to do this in an exception safe manner.
}

You can do UTF-8 output to a file like this (although I'm not sure VS2008 supports codecvt_utf8_utf16):

#include <codecvt>
#include <fstream>

int main() {
    std::wofstream myFile("data.bin", std::ios::out | std::ios::binary);
    myFile.imbue(std::locale(myFile.getloc(),new std::codecvt_utf8_utf16<wchar_t>));

    myFile << L"привет, 猫咪!";
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...