I know how to get the HTML source code via cURL, but I want to remove the comments in the HTML document (I mean what is between `<!--` and `-->`). In addition, I want to keep only the `<body>` part of the document.
Try PHP DOM:
$html = '<html><body><!--a comment--><div>some content</div></body></html>'; // put your cURL result here
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    $comment->parentNode->removeChild($comment);
}
$body = $xpath->query('//body')->item(0);
$newHtml = $body instanceof DOMNode ? $dom->saveXML($body) : 'something failed';
var_dump($newHtml);
Output:
string(36) "<body><div>some content</div></body>"
I would pipe it to sed for a regex, something like
curl http://yoururl.com/test.html | sed -E 's/<!--[^>]*-->//g' | sed -E 's/.*(<body>.*<\/body>).*/\1/'
Note that sed's -i flag is for editing files in place, so it doesn't apply to piped input.
The regexes may not be exact, but you get the idea...
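As a sanity check, here is a sketch of that pipeline run against an inline sample instead of curl (assuming GNU sed with -E for extended regexes; the comment-stripping pattern is illustrative and will miss comments that contain a `>` or span multiple lines):

```shell
# Sketch: strip simple HTML comments, then keep only the <body>...</body> part.
# -i is omitted because it only applies to files edited in place, not to stdin.
html='<html><body><!-- a comment --><div>some content</div></body></html>'
echo "$html" \
  | sed -E 's/<!--[^>]*-->//g' \
  | sed -E 's/.*(<body>.*<\/body>).*/\1/'
# prints: <body><div>some content</div></body>
```

For anything beyond a quick one-off, a real HTML parser (as in the PHP DOM answer) is the safer choice.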
I've run into issues modifying a DOMNodeList in a foreach loop, which went away when I iterated backwards through the list. For that reason, I would not recommend a foreach loop as in the accepted answer. Instead, use a for loop like this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
for ($els = $xpath->query('//comment()'), $i = $els->length - 1; $i >= 0; $i--) {
    $els->item($i)->parentNode->removeChild($els->item($i));
}
If there's no option for this in cURL (and I suspect there isn't, but I've been wrong before), then you can at the very least parse the resulting HTML to your heart's content with a PHP DOM parser.
This will likely be your best bet in the long run in terms of configurability and support.
Regex solved this problem for me as follows:
function remove_html_comments($html = '') {
    return preg_replace('/<!--(.|\s)*?-->/', '', $html);
}
Be aware that, like any regex approach, this will also strip comment-like text inside `<script>` blocks or string literals.