Cross-platform iteration of Unicode string (counting Graphemes using ICU)

Submitted by 余生长醉 on 2019-11-27 12:09:11

You should be able to use the ICU BreakIterator for this (the character instance, obtained from BreakIterator::createCharacterInstance, assuming it is feature-equivalent to the Java version).

The Glib::ustring class (from glibmm) gives you UTF-8 strings, if using UTF-8 is OK for you. It is designed to be similar to std::string. Since UTF-8 is native on Linux, your task is quite easy:

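#include <glibmm/ustring.h>
#include <iostream>

int main()
{
    // Narrow literal, not L"...": the source file must be saved as UTF-8.
    Glib::ustring s = "नमस्ते";
    // size() counts Unicode code points, not bytes or grapheme clusters.
    std::cout << s.size() << '\n';
}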

You can also iterate over the string's characters as usual with Glib::ustring::iterator. Each element is a gunichar, that is, a single code point rather than a grapheme cluster.

ICU's interface is quite dated. Boost.Locale, which relies on ICU for boundary analysis, is much nicer to use:

#include <iostream>
#include <string_view>

#include <boost/locale.hpp>

using namespace std::string_view_literals;

int main()
{
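    boost::locale::generator gen;
    auto string = "noël 😸😾"sv;
    // csegment_index iterates over const char*, so pass raw pointers;
    // string_view iterators are not pointers on every standard library.
    boost::locale::boundary::csegment_index map{
        boost::locale::boundary::character, string.data(),
        string.data() + string.size(), gen("")};
    // Each segment is one grapheme cluster (user-perceived character).
    for (const auto& i : map)
    {
        std::cout << i << '\n';
    }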
}

