Matching a multiple lines pattern via PHP's preg_match()

Posted by 纵饮孤独 on 2019-12-28 16:30:11

Question


How can I match the text subject with a PHP preg_match() regular expression pattern in this HTML code:

      <table border=0>
  <tr>
  <td>


  <h2>subject</h2>



    </td>

All the whitespace and newlines are left in on purpose. So the problem is extracting the subject name with a pattern that works across multiple lines.


Answer 1:


If you're looking for (e.g.) an h2 tag nested within a td tag where there's only whitespace between the two, just use \s, which matches spaces, tabs, newlines, etc. For example:

preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i',$str,$matches);
// result is in $matches[1]

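As a quick check, here is a minimal sketch that runs the pattern against the markup from the question. The variable name $str and the nowdoc wrapper are just illustrative choices; the HTML is copied from the question as-is:

```php
<?php
// HTML fragment from the question, whitespace and newlines preserved.
$str = <<<'HTML'
      <table border=0>
  <tr>
  <td>


  <h2>subject</h2>



    </td>
HTML;

// \s* absorbs the spaces and newlines between <td> and <h2>,
// and again between </h2> and </td>.
$found = preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i', $str, $matches);

if ($found === 1) {
    echo $matches[1], "\n"; // the captured heading text
}
```

No modifier other than i is needed here, because the only thing spanning lines is the whitespace that \s* already covers.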

For your interest, PHP supports a number of pattern modifiers that you can pass in to the preg_* functions. Flags that may interest you are:

  • s ("dotall"): this one makes . match every character, including newlines. So, say the content of your <h2>...</h2> was spread over multiple lines. Then you'd have to do

    preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#is',$str,$matches);
    

    in order to have the .*? go over multiple lines (note the extra s at the end of the regex).

  • m ("multiline") : this one just lets ^ and $ match start/end of line instead of just the start/end of string. You only really need it if you're using ^ and $ in your pattern and want them to match the start/end of each individual line in your input.
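To make the difference between the two flags concrete, here is a small sketch. The sample strings and variable names are made up for illustration and are not from the question:

```php
<?php
// A heading whose content itself contains a newline.
$text = "<h2>first\nsecond</h2>";

// Without s, . does not match "\n", so the pattern cannot reach </h2>.
$withoutS = preg_match('#<h2>(.*?)</h2>#', $text);

// With s, . matches "\n" too, so the whole heading is captured.
$withS = preg_match('#<h2>(.*?)</h2>#s', $text, $m);

// m only affects the anchors: ^ and $ match at every line boundary.
$lines = "alpha\nbeta";
$withoutM = preg_match_all('/^\w+$/', $lines);  // whole string must be one word
$withM    = preg_match_all('/^\w+$/m', $lines); // each line is checked on its own

var_dump($withoutS, $withS, $m[1], $withoutM, $withM);
```

In short, s changes what . can match, while m changes where ^ and $ can match; neither one is a substitute for the other.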



Answer 2:


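You can add the s modifier to your regular expression so that . also matches newlines (the m modifier only changes how ^ and $ behave, so it does not help here):

// Given your HTML content.
$html = 'Your HTML content';
preg_match('/<td[^>]*>(.*?)<\/td>/is', $html, $matches);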

Hope this (still) helps, hahaha.




Answer 3:


Very simply with

preg_match('/<h2>(.*?)<\\/h2>/', $str, $matches);
print($matches[1]);

The multi-line layout has no effect on this regex, because <h2>subject</h2> itself sits on a single line. It only matters if the text you need to match spans multiple lines.




Answer 4:


You shouldn't use regex to parse HTML content. It can cause a lot of issues if you cannot control what the user can input. Every language has better solutions, and in most cases an HTML or XML parser does a better job. Check out DOMDocument, simplehtmldom, or php-html-parser.

For more answers on why you shouldn't use regex on HTML content, see: RegEx match open tags except XHTML self-contained tags
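As a rough sketch of the parser route, this is how the same extraction could look with DOMDocument. It assumes the dom extension is enabled, and the variable names ($html, $doc, $subject) are only illustrative:

```php
<?php
// HTML fragment from the question (the unclosed <tr> and <table> are left as-is).
$html = <<<'HTML'
      <table border=0>
  <tr>
  <td>


  <h2>subject</h2>



    </td>
HTML;

// Collect libxml parse warnings internally instead of emitting them,
// since this fragment is not well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Take the first <h2> element, if any, and trim its text content.
$h2 = $doc->getElementsByTagName('h2')->item(0);
$subject = $h2 !== null ? trim($h2->textContent) : null;

echo $subject, "\n";
```

The parser does not care how the whitespace and newlines are arranged, so no modifiers or \s* handling are needed.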




Answer 5:


You have to allow for line breaks by including \s in the character class of the regular expression:

$str ="<ol>
         <li>Capable for unlimited product</li>
         <li>Two currency support</li>
         <li>Works with touch screens and click screen based systems</li>
         <li>Responsive design <b>shopping cart</b>, Specially design for Mac, iPhone, iPad, PC and Android</li>
         <li>VAT for countries that support a Value Added Tax</li>
         <li>Barcode scanner checkout option for POS</li>
         <li>mRSS</li>
       </ol>";

preg_match("/^([A-Za-z0-9\s\<\>\.\,\/\-\ ]+)$/", $str);

// Sanitize the input before saving it to the database.

function test_input($data) {
    $data = trim($data);
    $data = htmlspecialchars($data);
    $data = json_encode($data);
    $data = addslashes($data);
    return $data;
}

echo test_input($str);


Source: https://stackoverflow.com/questions/8958310/matching-a-multiple-lines-pattern-via-phps-preg-match
