I'm trying to find everything inside a div using regexp. I'm aware that there is probably a smarter way to do this, but I've chosen regexp.
so currently my regexp
You shouldn't be using regex to parse HTML when there's a convenient DOM library:
$str = '
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
';
$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName('div');
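// DOMNodeList exposes its size as ->length; count() only works on it from PHP 7.2 on.
if ( $divs->length ) {
    foreach ( $divs as $div ) {
        // nodeValue is the element's text content, without the tags.
        echo $div->nodeValue . '<br>';
    }
}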
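If only the gallery divs are wanted rather than every div on the page, the same DOM approach can be narrowed with an XPath query. This is a minimal sketch, not part of the original answer: the extra `header` div is hypothetical sample markup added to show the filtering, and the exact-match test on `@class` assumes each div carries a single class name.

```php
<?php
// Hypothetical sample markup: one unrelated div plus the two gallery divs.
$str = '
<div class="header">not wanted</div>
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
';

$doc = new DOMDocument();
$doc->loadHTML($str);

// DOMXPath runs XPath 1.0 queries against the parsed document.
$xpath = new DOMXPath($doc);

// Assumption: the class attribute is exactly "gallery" (no additional classes).
$nodes = $xpath->query('//div[@class="gallery"]');

$texts = array();
foreach ($nodes as $node) {
    // nodeValue returns the text content of the element.
    $texts[] = $node->nodeValue;
}

print_r($texts);
```

For divs with several classes, `contains(concat(" ", normalize-space(@class), " "), " gallery ")` is a common XPath 1.0 idiom for the class test.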
A possible answer to this problem can be found at http://simplehtmldom.sourceforge.net/. That class helped me solve a similar problem quickly.
What about something like this:
$str = <<<HTML
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
HTML;
$matches = array();
preg_match_all('#<div[^>]*>(.*?)</div>#', $str, $matches);
var_dump($matches[1]);
Note the '?' in the regex: it makes the quantifier "not greedy", so each match stops at the first </div> it finds rather than the last one.
This gives you:
array
0 => string 'text to extract here' (length=20)
1 => string 'text to extract from here as well' (length=33)
This should work fine, as long as you don't have nested divs. If you do... well, are you really sure you want to use regular expressions to parse HTML, which is not all that regular itself?
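To make the nested-div caveat concrete, here is a small sketch that runs the same pattern on a div containing another div, then reads the same markup through DOMDocument. The markup and the `inner` class name are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical markup: a gallery div that contains another div.
$str = '<div class="gallery">outer <div class="inner">nested</div> tail</div>';

// Same pattern as above. The lazy (.*?) stops at the FIRST </div>,
// which closes the inner div, so the capture is cut short.
$matches = array();
preg_match_all('#<div[^>]*>(.*?)</div>#', $str, $matches);
$regexResult = $matches[1];

// The DOM parser understands nesting, so the outer div keeps all of its text.
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
$outer = $xpath->query('//div[@class="gallery"]')->item(0);
$domResult = $outer->nodeValue;

var_dump($regexResult); // one truncated capture: 'outer <div class="inner">nested'
var_dump($domResult);   // 'outer nested tail'
```

The regex returns a single fragment that still contains an opening tag and loses the trailing text, while the DOM version returns the full text of the outer div.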