Question
How can I match subject with a PHP preg_match() regular expression pattern in this HTML code:
<table border=0>
<tr>
<td>
<h2>subject</h2>
</td>
All the whitespace and newlines are left in on purpose. So the problem is extracting the subject name using a pattern that works across multiple lines.
Answer 1:
If you're looking for (e.g.) an h2 tag nested within a td tag where there's only whitespace in between the two, just use \s, which matches spaces, newlines, etc. For example:
preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i',$str,$matches);
// result is in $matches[1]
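As a quick check, here is a minimal sketch that runs the pattern above against the fragment from the question. The $html variable and the heredoc are only there for illustration.

```php
<?php
// The fragment from the question, whitespace and newlines included.
$html = <<<HTML
<table border=0>
<tr>
<td>
<h2>subject</h2>
</td>
HTML;

// \s* absorbs the newlines between <td> and <h2>, and between </h2> and </td>.
if (preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i', $html, $matches)) {
    echo $matches[1], "\n"; // prints "subject"
}
```

No extra modifier is needed here because the captured text itself sits on a single line.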
For your interest, the PHP manual has a list of the different modifiers you can pass to the preg_* functions. Flags that may interest you are:
s ("dotall"): this one makes . match every character, including newlines. So, say your <h2>.....</h2> was spread over multiple lines. Then you'd have to do
preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#is',$str,$matches);
in order to have the .* go over multiple lines (see the extra s at the end of the regex?).
m ("multiline"): this one just lets ^ and $ match the start/end of each line instead of just the start/end of the string. You only really need it if you're using ^ and $ in your pattern and want them to match the start/end of each individual line in your input.
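To make the difference between the two flags concrete, here is a small sketch. The input strings and variable names are made up for illustration and are not from the question.

```php
<?php
// Hypothetical input: the heading text itself is split across two lines.
$html = "<td>\n<h2>first line\nsecond line</h2>\n</td>";
$pattern = '#<td>\s*<h2>(.*?)</h2>\s*</td>#i';

// Without s, . stops at the newline inside the heading, so nothing matches.
$withoutS = preg_match($pattern, $html);

// With s appended, . also matches newlines and the whole heading is captured.
$withS = preg_match($pattern . 's', $html, $m);

// m only changes ^ and $: with it they anchor at every line of the input.
$lines = "alpha\nbeta\ngamma";
$noM = preg_match_all('/^\w+$/', $lines);
$withM = preg_match_all('/^\w+$/m', $lines, $all);

var_dump($withoutS, $withS, $noM, $withM); // int(0), int(1), int(0), int(3)
```

So s is the flag to reach for when the captured text spans lines, while m matters only for patterns anchored with ^ or $.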
Answer 2:
You can add the s modifier to your regular expression so that . also matches newlines (the m modifier only affects ^ and $, so it does not help here):
// Given your HTML content.
$html = 'Your HTML content';
preg_match('/<td[^>]*>(.*?)<\/td>/is', $html, $matches);
Hope this (still) helps, hahaha.
Answer 3:
Very simply, with:
preg_match('/<h2>(.*?)<\\/h2>/', $str, $matches);
print($matches[1]);
The line breaks in the input have no effect on this regex, because the text between <h2> and </h2> sits on a single line. They only matter if the part you need to match itself spans multiple lines.
Answer 4:
You shouldn't use regex to parse HTML content. It can cause a lot of issues if you cannot control what the user can input. Every language has better solutions, and in most cases an XML parser does a better job. Check out DOMDocument, simplehtmldom, or php-html-parser.
See here for more answers on why you shouldn't use regex on HTML content: RegEx match open tags except XHTML self-contained tags
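For instance, a minimal DOMDocument sketch for the fragment from the question might look like the following. It assumes PHP's dom extension is enabled, and the $html and $subject names are only for illustration.

```php
<?php
// Fragment from the question; it is incomplete HTML, which the parser tolerates.
$html = "<table border=0>\n<tr>\n<td>\n<h2>subject</h2>\n</td>";

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for sloppy markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Select every h2 whose parent is a td.
$xpath = new DOMXPath($doc);
$subject = null;
foreach ($xpath->query('//td/h2') as $h2) {
    $subject = trim($h2->textContent);
    break; // only the first heading is needed here
}

echo $subject, "\n";
```

The XPath expression states the structural requirement (an h2 directly inside a td) without caring how the markup is spread across lines.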
Answer 5:
You have to account for the line breaks by allowing \s in the regular expression:
$str ="<ol>
<li>Capable for unlimited product</li>
<li>Two currency support</li>
<li>Works with touch screens and click screen based systems</li>
<li>Responsive design <b>shopping cart</b>, Specially design for Mac, iPhone, iPad, PC and Android</li>
<li>VAT for countries that support a Value Added Tax</li>
<li>Barcode scanner checkout option for POS</li>
<li>mRSS</li>
</ol>";
preg_match("/^([A-Za-z0-9\s\<\>\.\,\/\-\ ]+)$/", $str);
// Sanitize your input before saving it to the database.
function test_input($data) {
$data = trim($data);
$data = htmlspecialchars($data);
$data = json_encode($data);
$data = addslashes($data);
return $data;
}
echo test_input($str);
Source: https://stackoverflow.com/questions/8958310/matching-a-multiple-lines-pattern-via-phps-preg-match