Is there an easy way to strip HTML from a QString in Qt?

Posted by 笑着哭i on 2019-12-04 22:18:54

You can try iterating through the string with the QXmlStreamReader class and extracting all the text (provided your HTML string is guaranteed to be well-formed XML).

Something like this:

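// Collect only character data; tags and attributes are skipped.
QXmlStreamReader xml(htmlString);
QString textString;
while (!xml.atEnd()) {
    if (xml.readNext() == QXmlStreamReader::Characters) {
        textString += xml.text();
    }
}
// xml.hasError() is true here if the input was not well-formed XML.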

but I'm not sure this is 100% valid usage of the QXmlStreamReader API, since I last used it quite a long time ago and may have forgotten something.

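// Remove everything between '<' and the next '>'.
// In Qt 5 and later, QRegularExpression can be used in place of QRegExp.
QString s = "<i>Test:</i><img src=\"blah.png\" /><br> A test case";
s.remove(QRegExp("<[^>]*>"));
// s == "Test: A test case"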
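
To make the limits of the regex approach concrete, here is a minimal sketch using only the C++ standard library, so it can be tried without Qt. It applies the same pattern, `<[^>]*>`, through std::regex. The helper name stripTags is hypothetical and is not part of Qt or of the answer above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Qt API): removes every "<...>" span,
// using the same pattern as the QRegExp example above.
std::string stripTags(const std::string& html) {
    static const std::regex tag("<[^>]*>");
    return std::regex_replace(html, tag, "");
}
```

With the input from the example above, stripTags returns "Test: A test case". The pattern stops at the first '>' it sees, so a '>' inside an attribute value, as in `<a title="a > b">x</a>`, leaves debris behind, and entities such as `&amp;` are not decoded. That is usually acceptable for a quick and dirty cleanup, but not for arbitrary HTML.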

If you don't care about performance that much then QTextDocument does a pretty good job of converting HTML to plain text.

QTextDocument doc;
doc.setHtml( htmlString );

return doc.toPlainText();

I know this question is old, but I was looking for a quick and dirty way to handle incorrect HTML. The XML parser wasn't giving good results.

The fact that some HTML is not quite valid XML makes it harder to get this right.

If it's valid XML (or not too badly formatted), I think QXmlStreamReader + QXmlStreamEntityResolver might not be a bad idea.

Sample code in: https://github.com/ycheng/misccode/blob/master/qt_html_parse/utils.cpp

(This could have been a comment, but I don't have permission to comment yet.)

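This answer is for those who read this post later and are using Qt 5 or later. Simply escape the HTML characters using the built-in function, as below. Note that this escapes the markup so that it displays literally; it does not remove the tags.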

QString str="<h1>some hedding </h1>"; // a string containing HTML tags
QString esc=str.toHtmlEscaped(); // esc contains the HTML-escaped string
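
To show what escaping produces, as opposed to stripping, here is a rough standard-library sketch that mimics the documented effect of toHtmlEscaped(): it replaces <, >, & and " with HTML entities. The helper name escapeHtml is hypothetical, and this is not Qt's implementation.

```cpp
#include <string>

// Hypothetical helper (not a Qt API): replaces the HTML metacharacters
// <, >, & and " with their entities and copies everything else as-is.
std::string escapeHtml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '&': out += "&amp;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}
```

For the string above, escapeHtml returns "&lt;h1&gt;some hedding &lt;/h1&gt;". The tags are still present in the result, only neutralized, so this suits displaying markup as text rather than extracting the plain text from it.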