I am writing a web crawler to fetch some Chinese web files. The fetched files are encoded in UTF-8, and I need to read those files to do some parsing, such as extracting the UR
In general, use the w variants (wstring, wfstream, wcout), set your locales to match the requirements, and hang an L on the front of string literals. locale::global(locale("")) sets things up to match the environment default; then, on each stream that isn't running according to that default, imbue the locale it needs, e.g. wcout.imbue(locale("Chinese_China.936")), where "Chinese_China.936" might be Microsoft's name for your terminal's locale settings. This has always been enough to do what I want; I hope it works as well for you.
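#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
    // Use the environment's default locale for wide-character conversion.
    locale::global(locale(""));
    wstring word;
    // Echo each whitespace-separated word read from standard input.
    while (wcin >> word)
        wcout << word << endl;
}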
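
Since the question is about reading fetched files rather than the console, here is a rough sketch of the same per-stream idea applied to a wifstream. The helper name read_wide_words is made up for illustration and is not part of any library. Which locale to pass is an assumption you have to settle for your own platform: the file's bytes are decoded by whatever locale the stream is imbued with, so it has to match the file's encoding.

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): read whitespace-separated words
// from the file at `path`, decoding its bytes with the locale `loc`.
std::vector<std::wstring> read_wide_words(const std::string& path,
                                          const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);   // imbue before opening, so nothing is decoded with the old locale
    in.open(path);

    std::vector<std::wstring> words;
    std::wstring word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;    // empty if the file could not be opened
}
```

Calling read_wide_words(path, std::locale("")) decodes with the environment's locale, which should handle UTF-8 files on a system whose environment locale is UTF-8. Otherwise you would need to pass a named locale, and those names are platform-specific, so check what your system actually provides.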