How to display images when using cURL?

旧巷少年郎 2020-12-20 07:06

When scraping a page, I would like the images to be included along with the text.

Currently I'm only able to scrape the text. For example, as a test script, I scraped Google's h

3 answers
  •  孤城傲影
    2020-12-20 07:29

    If the site you're loading uses relative paths for its resource URLs (e.g. /images/whatever.gif instead of http://www.site.com/images/whatever.gif), you'll need to rewrite those URLs in the source you get back, since cURL won't do that itself. Wget (whose official site seems to be down) does, and will even download and mirror the resources for you, but it does not provide PHP bindings.
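
Because Wget has no PHP bindings, one option is to shell out to it from PHP. The following is only a rough sketch under assumptions: build_wget_command() is a hypothetical helper name, /tmp/mirror is a placeholder directory, and it presumes wget is installed on a Unix-like host. The flags used (--page-requisites, --convert-links, --quiet, --directory-prefix) are standard Wget options.

```php
<?php
// Sketch: build a Wget command line from PHP, since Wget has no PHP bindings.
// build_wget_command() is a hypothetical helper, not part of any library.
function build_wget_command(string $url, string $targetDir): string
{
    // --page-requisites fetches images, CSS and other resources the page needs;
    // --convert-links rewrites links in the saved HTML so they work locally.
    return 'wget --page-requisites --convert-links --quiet'
        . ' --directory-prefix=' . escapeshellarg($targetDir)
        . ' ' . escapeshellarg($url);
}

$cmd = build_wget_command('http://www.site.com/', '/tmp/mirror');
echo $cmd, PHP_EOL;

// To actually run it in the background (assumes a Unix-like shell and wget on PATH):
// exec($cmd . ' > /dev/null 2>&1 &');
```

Passing both arguments through escapeshellarg() keeps a URL containing shell metacharacters from being interpreted by the shell.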

    So, you need to come up with a methodology to scrape through the resulting source and change relative paths into absolute paths. A naive way would be something like this:

    // Prefix only src values that are not already absolute (http://, https:// or //).
    $result = preg_replace('/src="(?!https?:\/\/|\/\/)([^"]*)"/i', 'src="' . $MY_BASE_URL . '$1"', $result);
    

    where $MY_BASE_URL is the base URL to prepend, e.g. http://www.mydomain.com. That won't work for everything (it assumes double-quoted src attributes and paths that start with /), but it should get you started. Rewriting URLs reliably is not easy, and you might be better off just spawning a wget command in the background and letting it mirror the site or rewrite the HTML for you.
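
As a possibly more robust variant of the regex idea, the same rewrite can be sketched with PHP's DOMDocument, which parses the markup instead of pattern-matching it. This is only an illustration under assumptions: absolutize_img_src() is a hypothetical function name, it handles only img tags, and it resolves every relative path against the site root rather than against the page's own directory.

```php
<?php
// Sketch: rewrite relative <img src> values to absolute URLs with DOMDocument.
// absolutize_img_src() and its parameters are hypothetical names for illustration.
function absolutize_img_src(string $html, string $baseUrl): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = rtrim($baseUrl, '/');
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Leave empty, absolute (scheme:), and protocol-relative (//) URLs untouched.
        if ($src === '' || preg_match('#^(?:[a-z][a-z0-9+.-]*:|//)#i', $src)) {
            continue;
        }
        // Assumption: relative paths are resolved against the site root,
        // which is only strictly correct for paths such as /images/x.gif.
        $img->setAttribute('src', $base . '/' . ltrim($src, '/'));
    }
    return $doc->saveHTML();
}

$page = '<html><body><img src="/images/whatever.gif">'
      . '<img src="http://cdn.example.com/a.png"></body></html>';
echo absolutize_img_src($page, 'http://www.mydomain.com');
```

The same loop could be repeated for link, script, and a elements if stylesheets, scripts, or links also need rewriting.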
