How do I retrieve all src values using regex in PHP?
jQuery Method
var Scripts = [];
$('head script').each(function(){
    if($(this).attr('type') == 'text/javascript' && $(this).attr('src')){
        Scripts.push($(this).attr('src'));
    }
});
console.log(Scripts);
If you decide to go the regex route, this should be useful for you (note that the slash inside the lookahead must be escaped when / is the delimiter):
/(?<=<).*?src=(['"])(.*?)\1.*?(?=\/?>)/si
I agree with Nick: use the DOMDocument object to fetch your data. Here is an XPath version:
$html =
<<<DOC
<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
DOC;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Select every element that carries a src attribute.
$elements = $xpath->query('//*[@src]');
foreach ($elements as $element)
{
    // nodeValue would return the element's text content, so read the attribute instead.
    echo $element->getAttribute('src'), "\n";
}
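If only the script tags matter, XPath can also select the attribute nodes directly. The following is a minimal sketch, not the answerer's code: the sample markup (including the img tag) and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample markup: one script tag plus an unrelated img tag.
$html = <<<HTML
<html><head>
<script type="text/javascript" src="http://localhost/assets/javascript/system.js"></script>
</head><body><img src="/logo.png"></body></html>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// '//script/@src' returns the src attribute nodes of script elements only,
// so the img tag is skipped.
$scriptSrcs = [];
foreach ($xpath->query('//script/@src') as $attr) {
    $scriptSrcs[] = $attr->nodeValue; // for an attribute node, nodeValue is its value
}

print_r($scriptSrcs);
```

Changing the query to '//@src' would collect the src attribute of every element instead.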
/src=(["'])(.*?)\1/
Example:
<?php
$input_string = '<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>';
// preg_match() returns 1 on a match, 0 on no match, and FALSE only on error.
$count = preg_match('/src=(["\'])(.*?)\1/', $input_string, $match);
if ($count !== 1)
    echo("not found\n");
else
    echo($match[2] . "\n");
$input_string = "<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>";
$count = preg_match('/src=(["\'])(.*?)\1/', $input_string, $match);
if ($count !== 1)
    echo("not found\n");
else
    echo($match[2] . "\n");
gives:
http://localhost/assets/javascript/system.js
http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c
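Since the question asks for all src values, the same pattern can be run once over the whole markup with preg_match_all(). This is a sketch under the assumption that both tags arrive in a single string; the variable names are illustrative.

```php
<?php
// Hypothetical input: both script tags in one string.
$html = <<<HTML
<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
HTML;

// preg_match_all() returns the number of full matches, or false on error.
$count = preg_match_all('/src=(["\'])(.*?)\1/', $html, $matches);

// Group 1 is the opening quote; group 2 holds the attribute values.
$srcs = $count ? $matches[2] : [];

print_r($srcs);
```

With the default PREG_PATTERN_ORDER flag, $matches[2] is already a flat array of the captured values, so no extra loop is needed.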
Maybe it is just me, but I don't like using regular expressions for finding things in pieces of HTML, especially when the HTML is unpredictable (perhaps because it comes from a user or from other web pages).
How about something like this:
$doc =
<<<DOC
<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
DOC;
$dom = new DOMDocument;
$dom->loadHTML( $doc );
// Initialize the result so print_r() still works when nothing matches.
$srcs = array();
$elems = $dom->getElementsByTagName('*');
foreach ( $elems as $elm ) {
    if ( $elm->hasAttribute('src') )
        $srcs[] = $elm->getAttribute('src');
}
print_r( $srcs );
I don't know what the speed difference is between this and a regular expression, but it takes me a heck of a lot less time to read it and understand what I'm trying to do.
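For unpredictable HTML, the DOM approach can be wrapped in a small function that also keeps libxml's parse warnings out of the output. This is only a sketch: extract_srcs() is a hypothetical helper name, and the input at the bottom is invented to show a deliberately sloppy fragment.

```php
<?php
/**
 * Hypothetical helper: collect every src attribute found in an HTML string.
 */
function extract_srcs(string $html): array
{
    $srcs = [];
    if (trim($html) === '') {
        return $srcs; // loadHTML() does not accept empty input
    }

    // Collect parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('*') as $elm) {
        if ($elm->hasAttribute('src')) {
            $srcs[] = $elm->getAttribute('src');
        }
    }
    return $srcs;
}

// Invented, deliberately malformed input (the p tag is never closed).
print_r(extract_srcs('<p>unclosed <img src="a.png"><script src="b.js"></script>'));
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.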