I have HTML code similar to this:
<tr><td >1 </td>
<td class="tab-links">Value 1</td>
</tr>
<tr><td >2 </td>
<td class="tab-links">Value 2</td>
</tr>
<tr><td >3 </td>
<td class="tab-links">Value 3</td>
</tr>
<tr><td >4 </td>
<td class="tab-links">Value 4</td>
</tr>
Now I want to extract the data as follows:
1 : Value 1
2 : Value 2
3 : Value 3
4 : Value 4
Any ideas?
As described in this post, you should not be using regex to parse HTML.
Use an XML/HTML parser instead.
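A minimal sketch of the parser approach for the rows in the question, using only the XML DOM parser that ships with the JDK. It assumes the markup is well-formed XML (every tag closed), which real-world HTML often is not; in that case a tolerant HTML parser is the safer choice. The class name `RowExtractor` and the method `extractRows` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RowExtractor {

    // Hypothetical helper: returns one "number : value" string per <tr>
    // that contains at least two <td> cells.
    static List<String> extractRows(String rowsMarkup) {
        // Wrap the rows in a single root element so the fragment is a valid XML document.
        String xml = "<table>" + rowsMarkup + "</table>";
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: the input is well-formed XML; anything else is rejected here.
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }

        List<String> result = new ArrayList<>();
        NodeList rows = doc.getElementsByTagName("tr");
        for (int i = 0; i < rows.getLength(); i++) {
            NodeList cells = ((Element) rows.item(i)).getElementsByTagName("td");
            if (cells.getLength() < 2) {
                continue; // skip rows that do not have both a number and a value cell
            }
            String number = cells.item(0).getTextContent().trim();
            String value = cells.item(1).getTextContent().trim();
            result.add(number + " : " + value);
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
              "<tr><td >1 </td>\n<td class=\"tab-links\">Value 1</td>\n</tr>\n"
            + "<tr><td >2 </td>\n<td class=\"tab-links\">Value 2</td>\n</tr>\n";
        for (String line : extractRows(html)) {
            System.out.println(line);
        }
    }
}
```

The cell text is trimmed because the first cell in the question contains a trailing space (`1 `).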
Assuming the HTML is well formed, you can parse it using HtmlUnit.
You could also write your own regular expression to process the page if there is just a single table, but I would strongly recommend against this: a regular expression may give strange results if additional tables are later added to the page. With HtmlUnit you can verify that the page has only a single table before you start to parse, or simply target the table you want.
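The "check that there is exactly one table, then read it" idea can be sketched as follows. Note that this is not HtmlUnit code: to stay self-contained it uses the JDK's built-in DOM parser and XPath as a stand-in, and it assumes the whole page is well-formed XML. The class name `SingleTableCheck` and the method `rowsFromOnlyTable` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SingleTableCheck {

    // Hypothetical helper: refuses to parse unless the page holds exactly one <table>.
    static List<String> rowsFromOnlyTable(String page) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(page)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Validate the page layout before extracting anything.
            double tables = (Double) xpath.evaluate("count(//table)", doc, XPathConstants.NUMBER);
            if (tables != 1) {
                throw new IllegalStateException("Expected exactly one table, found " + (int) tables);
            }

            List<String> result = new ArrayList<>();
            NodeList rows = (NodeList) xpath.evaluate("//table//tr", doc, XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                // Both expressions are evaluated relative to the current row.
                String number = xpath.evaluate("td[1]", rows.item(i)).trim();
                String value = xpath.evaluate("td[@class='tab-links']", rows.item(i)).trim();
                result.add(number + " : " + value);
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            // Assumption: the page is well-formed XML; anything else is rejected here.
            throw new IllegalArgumentException("Page could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><table>"
            + "<tr><td >1 </td><td class=\"tab-links\">Value 1</td></tr>"
            + "<tr><td >2 </td><td class=\"tab-links\">Value 2</td></tr>"
            + "</table></body></html>";
        for (String line : rowsFromOnlyTable(page)) {
            System.out.println(line);
        }
    }
}
```

If a second table later appears on the page, this version fails loudly instead of silently mixing rows from both tables, which is the risk described above for the regular-expression approach.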
http://htmlcleaner.sourceforge.net/
http://jericho.htmlparser.net/docs/index.html
are two well-known HTML parsers for Java. You can use either of them.
Source: https://stackoverflow.com/questions/6496134/html-data-extract-in-java