Question
I have an XML file that contains some HTML content, such as bold text, paragraphs, and tables. I have written a script that parses all of the HTML tags except tables. I'm using the XML package (R) to parse the data.
<Root>
<Title> This is dummy xml file </Title>
<Content> This table summarises data in BMC format.
<div class="abctable">
<table border="1" cellspacing="0" cellpadding="0" width="100%" class="coder">
<tbody>
<tr>
<th width="50%">ABC</th>
<th width="50%">Weight status</th>
</tr>
<tr>
<td>are 18.5</td>
<td>arew</td>
</tr>
<tr>
<td>18.5 &mdash; 24.9</td>
<td>rweq</td>
</tr>
<tr>
<td>25.0 &mdash; 29.9</td>
<td>qewrte</td>
</tr>
<tr>
<td>30.0 and hwerqer</td>
<td>rwqe</td>
</tr>
<tr>
<td>40.0 rweq rweq</td>
<td>rqwe reqw</td>
</tr>
</tbody>
</table>
</div>
</Content>
<Section>blah blah blah</Section>
</Root>
How can I parse the content of this table, which is embedded in the XML?
Answer 1:
There is a function called readHTMLTable in the XML package that seems to do just what you need.
Here is a way to do it with the following XML file:
<Root>
<Title> This is dummy xml file </Title>
<Content>
This table summarises data in BMC format.
<div class="abctable">
<table border="1" cellspacing="0" cellpadding="0" width="100%" class="coder">
<tbody>
<tr>
<th width="50%">ABC</th><th width="50%">Weight status</th>
</tr>
<tr>
<td>are 18.5</td>
<td>arew</td>
</tr>
<tr>
<td>18.5 &mdash; 24.9</td>
<td>rweq</td>
</tr>
<tr>
<td>25.0 &mdash; 29.9</td>
<td>qewrte</td>
</tr>
<tr>
<td>30.0 and hwerqer</td>
<td>rwqe</td>
</tr>
<tr>
<td>40.0 rweq rweq</td>
<td>rqwe reqw</td>
</tr>
</tbody>
</table>
</div>
</Content>
<Section>blah blah blah</Section>
</Root>
If this is saved in a file called /tmp/data.xml, you can use the following code:
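library(XML)
# htmlParse is lenient and accepts HTML entities such as &mdash;
doc <- htmlParse("/tmp/data.xml")
# Select every <table> node, then convert the first one to a data frame
tableNodes <- getNodeSet(doc, "//table")
tb <- readHTMLTable(tableNodes[[1]])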
Which gives:
R> tb
                V1            V2
1              ABC Weight status
2         are 18.5          arew
3      18.5 — 24.9          rweq
4      25.0 — 29.9        qewrte
5 30.0 and hwerqer          rwqe
6   40.0 rweq rweq     rqwe reqw
Answer 2:
The best method for parsing XML is to use XPath expressions. Some references:
Xpath Tutorial
Xpath and R
How to use XPath and R stackoverflow
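As a rough illustration of the XPath approach, the sketch below pulls the header and data cells out of the table with the XML package's xpathSApply and getNodeSet. The inline string txt is a hypothetical, shortened copy of the question's file, used only so the example runs without /tmp/data.xml. htmlParse is used instead of xmlParse on the assumption that the real file contains HTML entities such as &mdash;, which are not predefined in XML.

```r
library(XML)

# Hypothetical inline copy of a few rows from the question's table,
# so the sketch runs without needing /tmp/data.xml.
txt <- '<Root><Content>
<table class="coder"><tbody>
<tr><th>ABC</th><th>Weight status</th></tr>
<tr><td>are 18.5</td><td>arew</td></tr>
<tr><td>30.0 and hwerqer</td><td>rwqe</td></tr>
</tbody></table>
</Content></Root>'

doc <- htmlParse(txt, asText = TRUE)

# Header cells: the text of every <th> under the table.
headers <- xpathSApply(doc, "//table//th", xmlValue)

# Data rows: only the <tr> elements that contain <td> cells.
rows <- getNodeSet(doc, "//table//tr[td]")

# One character vector of cell text per row.
cells <- lapply(rows, function(tr) xpathSApply(tr, "./td", xmlValue))

# Bind the rows into a data frame and label the columns.
tb <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
names(tb) <- headers
print(tb)
```

Compared with readHTMLTable, this takes a few more lines, but it gives direct control over which rows and cells are selected, and it uses the th row as column names rather than as the first data row.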
Source: https://stackoverflow.com/questions/14517732/how-to-get-table-data-from-html-table-in-xml