Parsing table content in php/regex and getting result by td

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-24 10:57:03

Question


I have a table like the one below, and I have spent a full day trying to extract its data:

<table class="table table-condensed">
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>

<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>

<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>

</tr>

<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>

</table>

As you can see, the data is well organized, and I am trying to get this result:

monthly rent => Fr. 1'950. 
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17

The table is stored in a variable $table, and I tried to use DOM:

$dom = new DOMDocument(); 
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr');
return $result; 

But to no avail. Is there an easier way to get the contents in PHP, or with a regex?


Answer 1:


You're on the right track with DOM and XPath. Do not use regular expressions to parse HTML or XML. Regular expressions match text and are often used as one part of a parser, but a parser for a format knows that format's features; a regular expression does not.

You should keep your variable names a little cleaner. Do not assign values of different types to the same variable in the same context; doing so suggests that the variable name is too generic.

DOMXPath::query() accepts XPath expressions, but only expressions that return a node list. DOMXPath::evaluate() can return scalar values as well.

So you can fetch the tr elements, iterate over them, and use additional expressions to fetch the two values, with each tr element as the context node.
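To make the difference concrete, here is a minimal sketch. The one-row `$html` fragment and the variable names are made up for illustration; only `DOMDocument`, `DOMXPath::query()` and `DOMXPath::evaluate()` come from the discussion above.

```php
<?php
// Hypothetical one-row fragment, used only for illustration.
$html = '<table><tr><td>Surface</td><td><strong>93m2</strong></td></tr></table>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// query() with a location path returns a DOMNodeList.
$nodes = $xpath->query('//table/tr/td');
$count = $nodes->length;

// evaluate() can also return scalars: string() yields a PHP string,
// count() yields a number.
$label = $xpath->evaluate('string(//table/tr/td[1])');
$cells = $xpath->evaluate('count(//table/tr/td)');

var_dump($count, $label, $cells);
```

Both methods accept an optional context node as the second argument, so relative expressions can be evaluated against a single row.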

$document = new \DOMDocument(); 
$document->loadHTML($table);
$xpath = new \DOMXPath($document);

foreach ($xpath->evaluate('//table/tr') as $tr) {
  var_dump(
     $xpath->evaluate('string(td[1])', $tr),
     $xpath->evaluate('string(td[2]/strong)', $tr)
  );
}

Output:

string(13) "Monthely rent"
string(11) "Fr. 1'950. "
string(8) "Rooms(s)"
string(1) "3"
string(7) "Surface"
string(4) "93m2"
string(16) "Date of Contract"
string(8) "01.04.17"
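If the goal is the "label => value" listing from the question, the same loop can collect the pairs into an associative array. This is a sketch under assumptions: the sample input is shortened to two rows, the `$rows` array is a name introduced here, and `normalize-space()` is swapped in for `string()` to trim surrounding whitespace.

```php
<?php
// Sample input taken from the question, shortened to two rows.
$table = <<<'SAMPLE'
<table class="table table-condensed">
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
</table>
SAMPLE;

$document = new DOMDocument();
$document->loadHTML($table);
$xpath = new DOMXPath($document);

// Collect label => value pairs, one per table row.
$rows = [];
foreach ($xpath->evaluate('//table/tr') as $tr) {
    // normalize-space() trims and collapses whitespace in the cell text.
    $label = $xpath->evaluate('normalize-space(td[1])', $tr);
    $value = $xpath->evaluate('normalize-space(td[2])', $tr);
    $rows[$label] = $value;
}

foreach ($rows as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```

Reading the whole second cell with `td[2]` rather than `td[2]/strong` means the loop still works if a row lacks the `<strong>` wrapper.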



Answer 2:


Try this out:

$document = new DOMDocument();
$document->loadHTML($table);
$xpath = new DOMXPath($document);
$result = $xpath->query('//table/tr/td/strong');

foreach ($result as $item) {
  echo $item->nodeValue . "\n";
}

That will print the text content of each <strong> element. However, you will probably want to set up your data in a way that means you don't have to deal with HTML tags like <strong>. You might want to use XML or even JSON.
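As a rough illustration of the JSON suggestion: if the same data were available as JSON, no HTML parsing would be needed at all. The payload below is hypothetical (the question only shows an HTML table), and `JSON_THROW_ON_ERROR` assumes PHP 7.3 or later.

```php
<?php
// Hypothetical JSON payload carrying the same data as the table in the question.
$json = <<<'PAYLOAD'
{"Monthly rent": "Fr. 1'950.", "Rooms(s)": "3", "Surface": "93m2", "Date of Contract": "01.04.17"}
PAYLOAD;

// Decode into an associative array; invalid JSON throws a JsonException.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

foreach ($data as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```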



Source: https://stackoverflow.com/questions/42449608/parsing-table-content-in-php-regex-and-getting-result-by-td
