Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

Is it good/safe/possible to use the tiny utfcpp library for converting everything I get back from the wide Windows API (FindFirstFileW and such) to a valid UTF8 representation using utf16to8?

I would like to use UTF8 internally, but am having trouble getting the correct output (via wcout after another conversion or plain cout). Normal ASCII characters work of course, but ?? gets messed up.

Or is there an easier alternative?

Thanks!

UPDATE: Thanks to Hans (below), I now have an easy UTF8<->UTF16 conversion through the Windows API. Two way conversion works, but the UTF8 from UTF16 string has some extra characters that might cause me some trouble later on...). I'll share it here out of pure friendliness :) ):

// UTF16 -> UTF8 conversion
std::string toUTF8( const std::wstring &input )
{
    // get length
    int length = WideCharToMultiByte( CP_UTF8, NULL,
                                      input.c_str(), input.size(),
                                      NULL, 0,
                                      NULL, NULL );
    if( !(length > 0) )
        return std::string();
    else
    {
        std::string result;
        result.resize( length );

        if( WideCharToMultiByte( CP_UTF8, NULL,
                                 input.c_str(), input.size(),
                                 &result[0], result.size(),
                                 NULL, NULL ) > 0 )
            return result;
        else
            throw std::runtime_error( "Failure to execute toUTF8: conversion failed." );
    }
}
// UTF8 -> UTF16 conversion
std::wstring toUTF16( const std::string &input )
{
    // get length
    int length = MultiByteToWideChar( CP_UTF8, NULL,
                                      input.c_str(), input.size(),
                                      NULL, 0 );
    if( !(length > 0) )
        return std::wstring();
    else
    {
        std::wstring result;
        result.resize( length );

        if( MultiByteToWideChar(CP_UTF8, NULL,
                                input.c_str(), input.size(),
                                &result[0], result.size()) > 0 )
            return result;
        else
            throw std::runtime_error( "Failure to execute toUTF16: conversion failed." );
    }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
146 views
Welcome To Ask or Share your Answers For Others

1 Answer

The Win32 API already has a function to do this, WideCharToMultiByte() with CodePage = CP_UTF8. Saves you from having to rely on another library.

You cannot normally use the result with wcout. Its output goes to the console, it uses an 8-bit OEM encoding for legacy reasons. You can change the code page with SetConsoleCP(), 65001 is the code page for UTF-8 (CP_UTF8).

Your next stumbling block would be the font that's used for the console. You'll have to change it but finding a font that's fixed-pitch and has a full set of glyphs to cover Unicode is going to be difficult. You'll see you have a font problem when you get square rectangles in the output. Question marks are encoding problems.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...