How to remove accents / diacritic marks from a string in Qt?

可紊 提交于 2019-12-01 18:00:35

问题


How to remove diacritic marks from a string in Qt. For example, this:

QString test = QString::fromUtf8("éçàÖœ");
qDebug() << StringUtil::removeAccents(test);

should output:

ecaOoe

回答1:


There is not straighforward, built-in solution in Qt. A simple solution, which should work in most cases, is to loop through the string and replace each character by their equivalent:

QString StringUtil::diacriticLetters_;
QStringList StringUtil::noDiacriticLetters_;

QString StringUtil::removeAccents(QString s) {
    if (diacriticLetters_.isEmpty()) {
        diacriticLetters_ = QString::fromUtf8("ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ");
        noDiacriticLetters_ << "S"<<"OE"<<"Z"<<"s"<<"oe"<<"z"<<"Y"<<"Y"<<"u"<<"A"<<"A"<<"A"<<"A"<<"A"<<"A"<<"AE"<<"C"<<"E"<<"E"<<"E"<<"E"<<"I"<<"I"<<"I"<<"I"<<"D"<<"N"<<"O"<<"O"<<"O"<<"O"<<"O"<<"O"<<"U"<<"U"<<"U"<<"U"<<"Y"<<"s"<<"a"<<"a"<<"a"<<"a"<<"a"<<"a"<<"ae"<<"c"<<"e"<<"e"<<"e"<<"e"<<"i"<<"i"<<"i"<<"i"<<"o"<<"n"<<"o"<<"o"<<"o"<<"o"<<"o"<<"o"<<"u"<<"u"<<"u"<<"u"<<"y"<<"y";
    }

    QString output = "";
    for (int i = 0; i < s.length(); i++) {
        QChar c = s[i];
        int dIndex = diacriticLetters_.indexOf(c);
        if (dIndex < 0) {
            output.append(c);
        } else {
            QString replacement = noDiacriticLetters_[dIndex];
            output.append(replacement);
        }
    }

    return output;
}

Note that noDiacriticLetters_ needs to be a QStringList since some characters with diacritic marks can match to two single characters. For example œ => oe




回答2:


Your question is a bit misleading. You seem to want to do more than merely remove diacritical marks (œ is a ligature letter without diacritics). I guess you want to turn any Unicode string into a roughly corresponding ASCII string?

For diacritics, you could perform a decomposing Unicode normalization (NFD or NFKD, depending on your specific needs) and then remove all characters of the "Mark" categories (QChar::Mark_NonSpacing, QChar::Mark_SpacingCombining and QChar::Mark_Enclosing).

For everything else (e.g. œ), I don't know of a generic solution. Create a look-up table with all your desired replacements and then search and replace (see Laurent's answer).




回答3:


A partial solution is to use QString::normalized, than remove the special characters.

QString test = QString::fromUtf8("éçàÖœ");
QString stringNormalized = test.normalized (QString::NormalizationForm_KD);
stringNormalized.remove(QRegExp("[^a-zA-Z\\s]"));

This is however a partial solution because it will not convert "œ" into "oe".




回答4:


There is crude way to partial solve your problem (just accents, but not ligatures such as "oe").

QString title=QString::fromUtf8("éçàÖ");
qDebug("%s\n", title.toLocal8Bit().data());


来源:https://stackoverflow.com/questions/14009522/how-to-remove-accents-diacritic-marks-from-a-string-in-qt

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!