Exclude unwanted HTML from Simple HTML DOM - PHP

Posted by 笑着哭i on 2021-01-29 13:25:26

Question


I am using PHP Simple HTML DOM Parser to get the title, description, and images from a website. The problem is that the parsed output includes HTML I don't want, and I don't know how to exclude those tags. Details below.

Here is a sample of the HTML structure being parsed:

<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>

<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>

</div>

I am using the PHP script below to parse it:

foreach($html->find('div#product_description') as $description)
{
    echo $description->outertext ;
    echo "<br>";
}

The code above outputs everything inside the div with id "product_description". I want to exclude the div with id "comments". I tried converting the output to a string and using substr to strip the last character, but that isn't working and I don't know why. How can I do this? Any approach that lets me exclude that div from the parsed HTML will work. Thanks.


Answer 1:


You can remove the elements you don't want by setting their outertext = '':

$src =<<<src
<div id="product_description">
    <p> Some text</p>
    <ul>
        <li>value 1</li>
        <li>value 2</li>
        <li>value 3</li>
    </ul>

    <!-- the div I don't want -->
    <div id="comments">
        <h1> Some Text </h1>
    </div>

</div>
src;

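// Assumes simple_html_dom.php sits next to this script; adjust the path as needed
require_once 'simple_html_dom.php';

$html = str_get_html($src);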

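foreach($html->find('#product_description') as $description)
{
    // find() with index 0 returns the first match, or null if there is none
    $comments = $description->find('#comments', 0);
    if ($comments) {
        // Blanking outertext drops the element from the rendered output
        $comments->outertext = '';
    }
    print $description->outertext;
}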
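
If you would rather not depend on a third-party parser, roughly the same idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). This is a different technique from the Simple HTML DOM approach above, not a drop-in replacement. The helper name removeById is hypothetical, and the sketch assumes the ids contain no quote characters, since they are interpolated directly into the XPath expressions.

```php
<?php
// Minimal sketch: strip one element (by id) out of a container (by id)
// using PHP's built-in DOM extension. removeById() is a hypothetical helper.

function removeById(string $markup, string $containerId, string $unwantedId): string
{
    $doc = new DOMDocument();

    // libxml warns about HTML fragments; collect those warnings silently
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Locate the container; return an empty string if it is missing
    $container = $xpath->query("//*[@id='$containerId']")->item(0);
    if ($container === null) {
        return '';
    }

    // Copy the matches into an array first so removal does not
    // disturb the node list while we iterate over it
    $unwanted = iterator_to_array($xpath->query(".//*[@id='$unwantedId']", $container));
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the container, i.e. its outer HTML
    return $doc->saveHTML($container);
}

$src = '<div id="product_description"><p> Some text</p>'
     . '<div id="comments"><h1> Some Text </h1></div></div>';

echo removeById($src, 'product_description', 'comments');
```

The call at the end should print the product_description div without the comments div inside it.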



Answer 2:


OK, so I figured it out myself: just use the Advanced HTML DOM library. It is fully compatible with Simple HTML DOM, and it gives you much more control. Removing what you don't want from the parsed HTML is very simple. For example:

//to remove script tag
$scripts = $description->find('script')->remove;

//to remove css style tag
$style = $description->find('style')->remove;

// to remove a div with class name findify-element
$findify = $description->find('div.findify-element')->remove;




Source: https://stackoverflow.com/questions/61014488/exclude-non-wanted-html-from-simple-html-dom-php
