PHP: Find images and links with relative path in output and convert them to absolute path

Submitted by 这一生的挚爱 on 2019-12-20 20:01:05

Question


There are a lot of posts on converting relative to absolute paths in PHP. I'm looking for a specific implementation beyond these posts (hopefully). Could anyone please help me with this specific implementation?

I have a PHP variable containing diverse HTML, including hrefs and imgs with relative URLs. These are mostly root-relative paths such as /en/discover or /img/icons/facebook.png.

I want to process this PHP variable so that the values of my hrefs and imgs are converted to http://mydomain.com/en/discover and http://mydomain.com/img/icons/facebook.png respectively.

I believe the question below covers the solution for hrefs. How can we expand this to also consider imgs?

  • Change a relative URL to absolute URL

Would a regex be in order? Or since we're dealing with a lot of output should we use DOMDocument?


Answer 1:


After some further research I stumbled upon this article by Gerd Riesselmann on how to work around the lack of a base href in RSS feeds. His snippet actually solves my question!

http://www.gerd-riesselmann.net/archives/2005/11/rss-doesnt-know-a-base-url

<?php
function relToAbs($text, $base)
{
  if (empty($base))
    return $text;
  // base url needs trailing /
  if (substr($base, -1, 1) != "/")
    $base .= "/";
  // Replace links
  $pattern = "/<a([^>]*) " .
             "href=\"[^http|ftp|https|mailto]([^\"]*)\"/";
  $replace = "<a\${1} href=\"" . $base . "\${2}\"";
  $text = preg_replace($pattern, $replace, $text);
  // Replace images
  $pattern = "/<img([^>]*) " . 
             "src=\"[^http|ftp|https]([^\"]*)\"/";
  $replace = "<img\${1} src=\"" . $base . "\${2}\"";
  $text = preg_replace($pattern, $replace, $text);
  // Done
  return $text;
}
?>

Thank you Gerd! And thank you shadyyx for pointing me in the direction of base href!




Answer 2:


Excellent solution. However, there is a small typo in the pattern. As written above, it truncates the first character of the href or src. Here are patterns that work as intended:

// Replace links
$pattern = "/<a([^>]*) " .
         "href=\"([^http|ftp|https|mailto][^\"]*)\"/";

and

// Replace images
$pattern = "/<img([^>]*) " . 
         "src=\"([^http|ftp|https][^\"]*)\"/";

The opening parenthesis of the second capture group has been moved to the left. The first character of the href or src, which the [^...] class tests, is now captured as well and carried into the replacement instead of being dropped.
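One caveat applies to both versions of the pattern: [^http|ftp|https|mailto] is a negated character class, not an alternation. It matches any single character other than h, t, p, f, s, m, a, i, l, o or |, so a relative value such as images/logo.png (which starts with i) would be left untouched. A possible alternative, sketched here under some assumptions, uses a negative lookahead instead. The function name relToAbsLookahead is hypothetical, and skipping protocol-relative (//) and fragment (#) values is an illustrative choice, not part of the original snippet.

```php
<?php
// Sketch only: relToAbsLookahead is a hypothetical name, not from the original answers.
function relToAbsLookahead(string $text, string $base): string
{
    if ($base === '') {
        return $text;
    }
    // Normalise so that exactly one slash joins the base and the path.
    $base = rtrim($base, '/') . '/';

    // The lookahead (?!...) rejects values that start with a scheme
    // (http:, https:, ftp:, mailto:, ...), with // or with #.
    // It consumes nothing, so the first character of the value is kept.
    $pattern = '~<(a|img)\b([^>]*?)\s(href|src)="(?![a-z][a-z0-9+.\-]*:|//|#)/?([^"]*)"~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            // $m[1] = tag, $m[2] = other attributes, $m[3] = href|src, $m[4] = path
            return '<' . $m[1] . $m[2] . ' ' . $m[3] . '="' . $base . $m[4] . '"';
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}
```

Like the original snippet, this assumes double-quoted attribute values and treats every relative value as relative to the site root, so it is a sketch rather than a general URL resolver.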




Answer 3:


I found that once the href/src values and the base URL got more complex, the accepted answer's solution no longer worked for me.

For example:

base url:

http://www.journalofadvertisingresearch.com/ArticleCenter/default.asp?ID=86411&Type=Article

href or src value:

/ArticleCenter/LeftMenu.asp?Type=Article&FN=&ID=86411&Vol=&No=&Year=&Any=

The accepted solution incorrectly returned:

/ArticleCenter/LeftMenu.asp?Type=Article&FN=&ID=86411&Vol=&No=&Year=&Any=

The function below returns the correct URL. It comes from a comment by Isaac Z. Schlueter on http://php.net/manual/en/function.realpath.php.

This correctly returned:

http://www.journalofadvertisingresearch.com/ArticleCenter/LeftMenu.asp?Type=Article&FN=&ID=86411&Vol=&No=&Year=&Any=

function resolve_href($base, $href) {

    // href="" ==> current url.
    if (!$href) {
        return $base;
    }

    // href="http://..." ==> href isn't relative
    $rel_parsed = parse_url($href);
    if (array_key_exists('scheme', $rel_parsed)) {
        return $href;
    }

    // add an extra character so that, if it ends in a /, we don't lose the last piece.
    $base_parsed = parse_url("$base ");
    // if it's just server.com and no path, then put a / there.
    if (!array_key_exists('path', $base_parsed)) {
        $base_parsed = parse_url("$base/ ");
    }

    // href="/..." ==> throw away current path.
    if ($href[0] === "/") {
        $path = $href;
    } else {
        $path = dirname($base_parsed['path']) . "/$href";
    }

    // bla/./bloo ==> bla/bloo
    $path = preg_replace('~/\./~', '/', $path);

    // resolve /../
    // loop through all the parts, popping whenever there's a .., pushing otherwise.
    $parts = array();
    foreach (explode('/', preg_replace('~/+~', '/', $path)) as $part) {
        if ($part === "..") {
            array_pop($parts);
        } elseif ($part != "") {
            $parts[] = $part;
        }
    }

    // prepend scheme and host from the base url, if it has them.
    return (
        (array_key_exists('scheme', $base_parsed)) ?
            $base_parsed['scheme'] . '://' . $base_parsed['host'] : ""
    ) . "/" . implode("/", $parts);
}
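
The question also asked whether DOMDocument would be a better fit than a regex. A minimal sketch of that route follows, assuming that only root-relative values (starting with a single /) need rewriting and that the input is an ASCII HTML fragment. The names absolutizeHtml and $origin are hypothetical. For other relative forms, a resolver such as resolve_href above could be applied to each attribute value instead of the simple concatenation used here.

```php
<?php
// Sketch only: absolutizeHtml and $origin are hypothetical names for illustration.
function absolutizeHtml(string $html, string $origin): string
{
    $origin = rtrim($origin, '/');

    $doc = new DOMDocument();
    // Collect parser warnings (e.g. for sloppy markup) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its nodes can be found again after parsing.
    // Assumption: ASCII input; loadHTML may mis-decode other encodings.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that holds the URL.
    $targets = array('a' => 'href', 'img' => 'src');
    foreach ($targets as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr);
            // Rewrite /path values only; //host/path is protocol-relative.
            if ($value !== '' && $value[0] === '/' && substr($value, 0, 2) !== '//') {
                $node->setAttribute($attr, $origin . $value);
            }
        }
    }

    // The wrapper is the first div in document order; serialise its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the markup is parsed rather than pattern-matched, attribute order and quoting style no longer matter. The trade-off is that the serialised output may differ cosmetically from the input.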


Source: https://stackoverflow.com/questions/13457693/php-find-images-and-links-with-relative-path-in-output-and-convert-them-to-abso
