PHP Simple HTML DOM Parser find string

心在旅途 2020-12-31 18:52

I am using PHP Simple HTML DOM Parser, but it does not seem to have any functionality for searching text. I need to search for a string and find the id of its parent element. Essentially

4 answers
  • 2020-12-31 19:17

    Got the answer. The entire example is a little long but it works. I also show the output.

    The HTML for what we are going to look at:

    <html>
    <head>
    <title>Simple HTML DOM - Find Text</title>
    </head>
    <body>
    <h3>Simple HTML DOM - Find Text</h3>
    <div id="first">
     <p>This is a paragraph inside of div 'first'.
       This paragraph does not have the text we are looking for.</p>
     <p>As a matter of fact this div does not have the text we are looking for</p>
    </div>
    <div id="second">
     <ul>
      <li>This is an unordered list.
      <li id="love1">We are looking for the following word love.
      <li>Does not contain the word.
     </ul>
     <p id="love2">This paragraph which is in div second contains the word love.</p>
    </div>
    <div id="third">
     <a id="love3" href="goes.nowhere.com">link to love site</a>
    </div>
    </body>
    </html>
    

    The PHP:

    <?php
    include_once('simple_html_dom.php');
    
    function scraping_for_text($iUrl,$iText)
    {
        echo "iUrl=".$iUrl."<br />";
        echo "iText=".$iText."<br />";

        // create HTML DOM
        $html = file_get_html($iUrl);
        if (!$html)
        {
           echo "<h4>Could not load ".$iUrl."</h4>";
           return;
        }

        // get text elements and keep only those that contain the search text
        $aMatches = array();
        foreach ($html->find('text') as $key=>$oText)
        {
          if (strpos($oText->plaintext,$iText) !== FALSE)
          {
             $aMatches[$key] = $oText;
          }
        }

        // report "Found" only when at least one text element actually matched
        if (count($aMatches) > 0)
        {
           echo "<h4>Found ".$iText."</h4>";
        }
        else
        {
           echo "<h4>No ".$iText." found"."</h4>";
        }
        foreach ($aMatches as $key=>$oLove)
        {
           echo $key.": text=".$oLove->plaintext."<br />"
                ."--- parent tag=".$oLove->parent()->tag."<br />"
                ."--- parent id=".$oLove->parent()->id."<br />";
        }

        // clean up memory
        $html->clear();
        unset($html);

        return;
    }
    
    // -------------------------------------------------------------
    // test it!
    
    // user_agent header...
    ini_set('user_agent', 'My-Application/2.5');
    
    scraping_for_text("test_text.htm","love");
    ?>
    

    The output:

    iUrl=test_text.htm
    iText=love
    Found love
    18: text=We are looking for the following word love.
    --- parent tag=li
    --- parent id=love1
    21: text=This paragraph which is in div second contains the word love.
    --- parent tag=p
    --- parent id=love2
    25: text=link to love site
    --- parent tag=a
    --- parent id=love3
    

    That's all they wrote!!!!

  • 2020-12-31 19:18
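    // $xml must hold the markup as a well-formed XML string
    $d = new DOMDocument();
    $d->loadXML($xml);
    $x = new DOMXPath($d);
    // id attributes of every ancestor of a text node containing '617.99', in document order
    $result = $x->evaluate("//text()[contains(.,'617.99')]/ancestor::*/@id");
    $unique = null;
    // walk the list backwards so that later (deeper) ancestors are checked first
    for($i = $result->length -1;$i >= 0 && $item = $result->item($i);$i--){
        // accept the id only if exactly one element in the document carries it
        if($x->query("//*[@id='".addslashes($item->value)."']")->length == 1){
            echo 'Unique ID is '.$item->value."\n";
            $unique = $item->value;
            break;
        }
    }
    if(is_null($unique)) echo 'no unique ID found';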
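
    Two details of the snippet above are worth a closer look: loadXML() expects well-formed XML, and addslashes() is not an XPath escaping mechanism (XPath 1.0 string literals have no escape character). Here is a minimal sketch of one way around both, assuming PHP's bundled DOM extension. The helper names xpath_literal() and find_parent_ids() and the $sample markup are made up for this illustration; they are not part of any library.

    ```php
    <?php
    // Hypothetical helper: turn any PHP string into a valid XPath 1.0 string literal.
    function xpath_literal(string $s): string
    {
        if (strpos($s, "'") === false) {
            return "'" . $s . "'";
        }
        if (strpos($s, '"') === false) {
            return '"' . $s . '"';
        }
        // Both quote kinds occur: split on ' and glue the pieces back with concat().
        $parts = explode("'", $s);
        return "concat('" . implode("', \"'\", '", $parts) . "')";
    }

    // Hypothetical helper: ids of the elements that directly contain a text node with $needle.
    function find_parent_ids(string $html, string $needle): array
    {
        $doc = new DOMDocument();
        // Ordinary HTML is rarely well-formed XML, so parse it with loadHTML()
        // and collect parser warnings internally instead of printing them.
        libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();

        $xpath = new DOMXPath($doc);
        $query = '//text()[contains(., ' . xpath_literal($needle) . ')]/parent::*';

        $ids = [];
        foreach ($xpath->query($query) as $el) {
            $ids[] = $el->getAttribute('id'); // empty string when the parent has no id
        }
        return $ids;
    }

    // Sample markup invented for this sketch (note the unclosed <li> tags).
    $sample = '<div id="second"><ul><li>This is a list.<li id="love1">We are looking for the word love.</ul>'
            . '<p id="love2">This paragraph contains the word love.</p></div>'
            . '<div id="third"><a id="love3" href="#">link to love site</a></div>';

    print_r(find_parent_ids($sample, 'love'));
    ```

    For the sample markup this yields the ids love1, love2 and love3 in document order. The concat() branch only matters when the search string contains both a single and a double quote.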
    
  • 2020-12-31 19:26
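    $html = file_get_html('http://www.google.com/');

    // innertext includes the markup of all descendants, so every ancestor
    // of the matching text (html, body, ...) passes this test as well
    $eles = $html->find('*');
    foreach($eles as $e) {
        if(strpos($e->innertext, 'theString') !== false) {
            echo $e->id . "\n";
        }
    }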
    

    http://simplehtmldom.sourceforge.net/manual.htm
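
    Because the loop above tests innertext, it matches the element that holds the string and every one of its ancestors. If only the closest element with an id is wanted, one option is to start from the text nodes and climb upwards. The following is a rough sketch of that idea using PHP's bundled DOM extension rather than Simple HTML DOM; nearest_ids() and the $page markup are hypothetical names chosen for this example.

    ```php
    <?php
    // Hypothetical helper: for each text node containing $needle, report the id of the
    // closest enclosing element that has one (null when no ancestor carries an id).
    function nearest_ids(string $html, string $needle): array
    {
        $doc = new DOMDocument();
        libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
        $doc->loadHTML($html);
        libxml_clear_errors();

        $found = [];
        foreach ((new DOMXPath($doc))->query('//text()') as $text) {
            if (strpos($text->nodeValue, $needle) === false) {
                continue;
            }
            // Climb from the text node towards the root until an element with an id turns up.
            $id = null;
            for ($node = $text->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
                if ($node->hasAttribute('id')) {
                    $id = $node->getAttribute('id');
                    break;
                }
            }
            $found[] = $id;
        }
        return $found;
    }

    // Markup invented for this sketch: the first match sits two levels below the id.
    $page = '<div id="outer"><p>no id here, but <b>theString</b> is inside</p></div>'
          . '<p id="direct">theString again</p>';

    print_r(nearest_ids($page, 'theString'));
    ```

    With this input the result is outer followed by direct: the first match has no id on its own b or p element, so the search continues up to the div.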

  • 2020-12-31 19:30

    Just imagine that every tag has a "plaintext" attribute and use standard attribute selectors.

    So, HTML:

    <div id="div1">
      <span>London is the capital</span> of Great Britain
    </div>
    <div id="div2">
      <span>Washington is the capital</span> of the USA
    </div>
    

    can be pictured as:

    <div id="div1" plaintext="London is the capital  of Great Britain">
      <span plaintext="London is the capital ">London is the capital</span> of Great Britain
    </div>
    <div id="div2" plaintext="Washington is the capital  of the USA">
      <span plaintext="Washington is the capital ">Washington is the capital</span> of the USA
    </div>
    

    The PHP that solves your task is then just:

    <?php
      $t = '
        <div id="div1">
          <span>London is the capital</span> of Great Britain
        </div>
        <div id="div2">
          <span>Washington is the capital</span> of the USA
        </div>';
      $html = str_get_html($t);
      $foo = $html->find('span[plaintext^=London]');
      echo "ID: " . $foo[0]->parent()->id; // div1
    ?>
    

    (Keep in mind that the "plaintext" of <span> tags is right-padded with a space; this is the default behaviour of Simple HTML DOM, defined by the constant DEFAULT_SPAN_TEXT.)
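
    The same "text starts with" test can also be written without relying on how a library pads its text. As a sketch only, here is the prefix match done with PHP's bundled DOM extension instead of Simple HTML DOM; span_parent_ids() is a hypothetical helper name, and trim() is used so that surrounding whitespace cannot affect the comparison.

    ```php
    <?php
    // Hypothetical helper: ids of the parents of every <span> whose trimmed text starts with $prefix.
    function span_parent_ids(string $html, string $prefix): array
    {
        $doc = new DOMDocument();
        libxml_use_internal_errors(true);   // tolerate HTML fragments and sloppy markup
        $doc->loadHTML($html);
        libxml_clear_errors();

        $ids = [];
        foreach ($doc->getElementsByTagName('span') as $span) {
            // Compare only the first strlen($prefix) bytes of the trimmed text.
            if (strncmp(trim($span->textContent), $prefix, strlen($prefix)) !== 0) {
                continue;
            }
            $parent = $span->parentNode;
            if ($parent instanceof DOMElement) {
                $ids[] = $parent->getAttribute('id'); // empty string when the parent has no id
            }
        }
        return $ids;
    }

    // Same markup as in the answer above.
    $t = '
      <div id="div1">
        <span>London is the capital</span> of Great Britain
      </div>
      <div id="div2">
        <span>Washington is the capital</span> of the USA
      </div>';

    print_r(span_parent_ids($t, 'London'));
    ```

    For this markup the only id returned is div1; searching for 'Washington' returns div2 instead.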
