Using regular expressions to extract the first image source from HTML code?

深忆病人 2020-12-05 01:07

I would like to know how this can be achieved.

Assume: There's a lot of HTML code containing tables, divs, images, etc.

Problem: How can I get matches

10 answers
  •  长情又很酷
    2020-12-05 01:27

    I don't know whether you MUST use regex to get your results. If not, you could try SimpleXML and XPath, which would be much more reliable for your goal:

    First, import the HTML into a DOMDocument object. If malformed markup triggers parser errors, turn error reporting off for this step and be sure to turn it back on afterward:

     $dom = new DOMDocument();
     $dom->loadHTMLFile("filename.html");
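
    One way to turn the parser errors off and back on is libxml's internal error buffer. This is only a sketch: the sample markup in `$html` is made up for illustration, and it uses `loadHTML()` on a string instead of `loadHTMLFile()` so that it runs without a file on disk.

    ```php
    <?php
    // Hypothetical sample markup with an unclosed <p> tag (not from the question).
    $html = '<html><body><section><p>Unclosed<img src="a.png"></section></body></html>';

    // Collect libxml errors internally instead of emitting PHP warnings.
    // The return value is the previous setting, so it can be restored later.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Read whatever the parser complained about, then empty the buffer.
    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Restore the error handling that was active before.
    libxml_use_internal_errors($previous);

    printf("loaded: %s, parser messages: %d\n", var_export($loaded, true), count($errors));
    ```

    How many messages are reported depends on the libxml2 version, so the count is best treated as diagnostic output only.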
    

    Next, import the DOM into a simpleXML object, like so:

     $xml = simplexml_import_dom($dom);
    

    Now you can use a few methods to get all of your image elements (and their attributes) into an array. XPath is the one I prefer, because I've had better luck with traversing the DOM with it:

     $images = $xml->xpath('//img/@src');
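
    For comparison, one of the other methods is to skip SimpleXML and walk the DOM directly with `getElementsByTagName()`. A minimal sketch, assuming the markup is already in a string (`$html` here is invented sample data):

    ```php
    <?php
    // Hypothetical sample markup: one image in a div, one nested in a table.
    $html = '<div><img src="one.png"><table><tr><td><img src="two.png"></td></tr></table></div>';

    libxml_use_internal_errors(true);   // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        // Skip <img> elements that have no src attribute at all.
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }

    print_r($sources);
    ```

    The elements come back in document order, so `$sources[0]` would be the first image on the page.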
    

    This variable can now be treated like an array of your image URLs:

     foreach ($images as $image) {
         // Each match is a SimpleXMLElement; cast it to a string to get the URL
         echo (string) $image, "\n";
     }
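
    Since the question asks only for the first image source, the XPath expression can be narrowed to a single match. A sketch under the same assumptions as above, with made-up sample markup in `$html`:

    ```php
    <?php
    // Hypothetical sample markup containing two images.
    $html = '<div><p>Intro</p><img src="first.jpg" alt=""><img src="second.jpg" alt=""></div>';

    libxml_use_internal_errors(true);   // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xml = simplexml_import_dom($dom);

    // The parentheses matter: (//img/@src)[1] is the first match in the whole
    // document, whereas //img/@src[1] would select the first src of every img.
    $matches = $xml->xpath('(//img/@src)[1]');

    // xpath() gives an empty array when nothing matches.
    $firstSrc = $matches ? (string) $matches[0] : null;

    echo $firstSrc ?? 'no image found', "\n";
    ```

    With the sample markup this prints `first.jpg`; a document without any `img` element falls through to the `no image found` branch.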

    Presto, all of your images, none of the fat.

    Here's the non-annotated version of the above:


     $dom = new DOMDocument();
     $dom->loadHTMLFile("filename.html");

     $xml = simplexml_import_dom($dom);

     $images = $xml->xpath('//img/@src');

     foreach ($images as $image) {
         echo (string) $image, "\n";
     }
