Because you mention WideCharToMultiByte I will assume you are dealing with Windows.
"read the content from an unicode file ... find a way to convert data to ASCII"
This might be a problem. If you convert Unicode to ASCII (or another legacy code page) you run the risk of corrupting or losing data.
Since you are "working on a unicode release build" you will want to read Unicode and stay Unicode.
So your final buffer will have to be wchar_t (or WCHAR, or CStringW, same thing).
So your file might be UTF-16 or UTF-8 (UTF-32 is quite rare).
For UTF-16 the endianness might also matter; if there is a BOM, that will help a lot.
Quick steps (a short sketch follows the list):
- open the file with _wopen or _wfopen as binary
- read the first bytes to identify the encoding using the BOM
- if the encoding is UTF-8, read into a byte array and convert to wchar_t with MultiByteToWideChar and CP_UTF8
- if the encoding is UTF-16BE (big endian), read into a wchar_t array and _swab it
- if the encoding is UTF-16LE (little endian), read into a wchar_t array and you are done
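
Putting those steps together, here is a minimal sketch (Windows-only; it assumes the whole file fits in memory and treats a file without a BOM as UTF-8; the function name ReadTextFile is just for illustration):

```cpp
// Minimal sketch of the steps above. Assumes the whole file fits in memory
// and that a file without a BOM is UTF-8. ReadTextFile is an illustrative
// name, not an existing API.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <vector>

std::wstring ReadTextFile(const wchar_t* path)
{
    FILE* f = _wfopen(path, L"rb");                 // open as binary
    if (!f) return std::wstring();

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size <= 0) { fclose(f); return std::wstring(); }

    std::vector<unsigned char> bytes(size);
    fread(bytes.data(), 1, size, f);
    fclose(f);

    // UTF-16LE BOM (FF FE): the bytes are already little-endian wchar_t
    if (size >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        return std::wstring(reinterpret_cast<const wchar_t*>(bytes.data() + 2),
                            (size - 2) / sizeof(wchar_t));
    }

    // UTF-16BE BOM (FE FF): swap adjacent bytes, then treat as UTF-16LE
    if (size >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        _swab(reinterpret_cast<char*>(bytes.data() + 2),
              reinterpret_cast<char*>(bytes.data() + 2),
              static_cast<int>(size - 2));
        return std::wstring(reinterpret_cast<const wchar_t*>(bytes.data() + 2),
                            (size - 2) / sizeof(wchar_t));
    }

    // Otherwise assume UTF-8 (skip the EF BB BF BOM if present) and convert
    int skip = (size >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) ? 3 : 0;
    const char* utf8 = reinterpret_cast<const char*>(bytes.data()) + skip;
    int utf8Len = static_cast<int>(size) - skip;

    int wideLen = MultiByteToWideChar(CP_UTF8, 0, utf8, utf8Len, NULL, 0);
    std::wstring result(wideLen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8, utf8Len, &result[0], wideLen);
    return result;
}
```

Note that the two UTF-16 branches rely on wchar_t being 2 bytes, which holds on Windows but not everywhere (see the warning below).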
Also, if you use a newer Visual Studio, you might take advantage of a Microsoft extension to _wfopen: it can take an encoding as part of the mode (something like _wfopen(L"newfile.txt", L"rt, ccs=<encoding>"); with the encoding being UTF-8 or UTF-16LE), and it can also detect the encoding based on the BOM.
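
A short sketch of that variant, assuming the Visual Studio CRT (the file name and the choice of UTF-8 are only examples; the documented ccs values are UTF-8, UTF-16LE and UNICODE):

```cpp
// Sketch of _wfopen with the Microsoft-specific ccs flag: the CRT converts
// from the file's encoding to UTF-16 wchar_t as you read.
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    // "rt" = read, text mode; ccs picks (or, with a BOM, confirms) the encoding
    FILE* f = _wfopen(L"newfile.txt", L"rt, ccs=UTF-8");
    if (!f) return 1;

    wchar_t line[512];
    while (fgetws(line, 512, f))   // lines come back as wide characters
        fputws(line, stdout);

    fclose(f);
    return 0;
}
```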
Warning: being cross-platform is problematic; wchar_t can be 2 or 4 bytes, and the conversion routines are not portable...