strip_tags, remove javascript

白昼怎懂夜的黑 submitted on 2020-01-05 12:13:40

Question


I have a problem with the code I'm running right now.

My code takes a URL, and when I click submit it removes all tags from the page using strip_tags. I then run preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words); which creates an array of every word. One foreach loop counts the words, and a second foreach loop places the counts in a table.

Here is an example of the problem. Say I enter a URL which has the following content:

<html>
    <head>
        <title>titel1</title>
    </head>
    <body>
        <div id="div1">
            <h1 class="class2">
                Testpage-h1
            </h1>
            <p>
                Testpage-p
            </p>
        </div>
        <script>
            alert('hallo');
            document.getElementById('class2');
        </script>
    </body>
</html>

This will echo out the following using my code:

document         1
getElementById1  1
class2'          1
hallo            1
alert            1
Testpage-h1      1
Testpage-p       1
titel1           1

(Sorry for formatting this as 'code', but otherwise it wouldn't let me use line breaks or place the numbers under each other.)

My problem is that it shouldn't show what is between the <script></script> tags, because that is of no use to me. Is there a solution for this?

I've tried things such as sanitize filtering, but this didn't help.


Answer 1:


You can remove the <script>...</script> blocks from your string before doing any counting:

$text = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
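
To show where that line fits, here is a minimal sketch of the whole word count with the script blocks removed first. The function name countWords and the sample $html are made up for illustration; the word pattern is the one from the question. Treat it as one possible arrangement, not as the asker's actual code.

```php
<?php
// Hypothetical helper: counts the words in the visible text of an HTML string.
function countWords(string $html): array
{
    // Drop <script>...</script> blocks together with their contents.
    // preg_replace() returns null on failure, so fall back to the input.
    $clean = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html) ?? $html;

    // strip_tags() removes the remaining tags but keeps the text between them.
    $text = strip_tags($clean);

    // Same word pattern as in the question.
    preg_match_all("/((?:\w'|\w|-)+)/", $text, $matches);

    // Map each distinct word to the number of times it occurs.
    return array_count_values($matches[0]);
}

// Sample page, shortened from the one in the question.
$html = <<<'HTML'
<html>
<head><title>titel1</title></head>
<body>
<h1 class="class2">Testpage-h1</h1>
<p>Testpage-p</p>
<script>alert('hallo');</script>
</body>
</html>
HTML;

print_r(countWords($html));
```

With this sample the result contains only titel1, Testpage-h1 and Testpage-p. The order matters: if strip_tags ran first, the <script> tags would be gone and the regex would have nothing left to match, so the JavaScript text would still be counted.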

Another solution (slower, but sometimes more correct), taken from "remove script tag from HTML content":

$doc = new DOMDocument();

// load the HTML string we want to strip
$doc->loadHTML($html);

// get all the script tags
$script_tags = $doc->getElementsByTagName('script');

// the node list is live: it shrinks as nodes are removed,
// so walk it from the end to keep the remaining indexes valid
for ($i = $script_tags->length - 1; $i >= 0; $i--) {
  $script = $script_tags->item($i);
  $script->parentNode->removeChild($script);
}

// get the HTML string back
$no_script_html_string = $doc->saveHTML();
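
If only the text is needed for counting, the DOM can also hand it over directly, so no second pass with strip_tags is required. The following is a rough sketch under that assumption. The function name textWithoutScripts is hypothetical, and the libxml error handling is an addition for messy real-world pages, not part of the answer above.

```php
<?php
// Hypothetical helper: returns the text of an HTML string without script content.
function textWithoutScripts(string $html): string
{
    $doc = new DOMDocument();

    // Pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array so removals do not disturb the iteration.
    foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    // textContent concatenates all remaining text nodes below the root element.
    $root = $doc->documentElement;
    return $root ? $root->textContent : '';
}

$sample = "<html><body><p>Testpage-p</p>"
        . "<script>alert('hallo');</script></body></html>";

echo textWithoutScripts($sample), "\n";
```

The returned string can then be passed to preg_match_all as in the question. Note that loadHTML assumes ISO-8859-1 unless the document declares another encoding, so pages in UTF-8 without a charset declaration may need extra handling.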


Source: https://stackoverflow.com/questions/22781853/strip-tags-remove-javascript
