Is there an easy way to strip HTML from a QString in Qt?

Submitted by 拟墨画扇 on 2019-12-22 01:13:10

Question


I have a QString with some HTML in it... is there an easy way to strip the HTML from it? I basically want just the actual text content.

<i>Test:</i><img src="blah.png" /><br> A test case

Would become:

Test: A test case

I'm curious to know if Qt has a string function or utility for this.


Answer 1:


You could iterate through the string with the QXmlStreamReader class and extract all the text (provided your HTML string is guaranteed to be well-formed XML).

Something like this:

QXmlStreamReader xml(htmlString);
QString textString;
while (!xml.atEnd()) {
    if ( xml.readNext() == QXmlStreamReader::Characters ) {
        textString += xml.text();
    }
}

but I'm not sure this is 100% valid usage of the QXmlStreamReader API, since I last used it quite a long time ago and may have forgotten something.




Answer 2:


QString s = "<i>Test:</i><img src=\"blah.png\" /><br> A test case";
s.remove(QRegExp("<[^>]*>"));
// s == "Test: A test case"



Answer 3:


If you don't care about performance that much then QTextDocument does a pretty good job of converting HTML to plain text.

QTextDocument doc;
doc.setHtml( htmlString );

return doc.toPlainText();

I know this question is old, but I was looking for a quick and dirty way to handle incorrect HTML. The XML parser wasn't giving good results.




Answer 4:


The fact that some HTML is not quite valid XML makes it harder to get this right.

If it's valid XML (or not too badly formatted), I think QXmlStreamReader + QXmlStreamEntityResolver might not be a bad idea.

Sample code in: https://github.com/ycheng/misccode/blob/master/qt_html_parse/utils.cpp

(This should be a comment, but I don't have permission to post one yet.)




Answer 5:


This answer is for anyone who reads this post later and is using Qt 5 or newer. You can escape the HTML characters using the built-in function shown below. Note that this escapes the markup so it is shown literally; it does not remove the tags.

QString str="<h1>some heading </h1>"; // a string containing HTML tags.
QString esc=str.toHtmlEscaped(); // esc contains the HTML-escaped string.


Source: https://stackoverflow.com/questions/2799379/is-there-an-easy-way-to-strip-html-from-a-qstring-in-qt
