Question
I'm working on my first serious project with jsoup and I've got stuck on this.
I'm trying to get zip codes from a site that shows them as a list.
Here is one of the lines that contains a zip code:
<td align="center"><a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a></td>
So my idea is to go through the page and collect all the strings that consist of exactly 5 digits (0-9). The regex is ^[0-9]{5,5}$
and the code was:
doc.select("td:matchesOwn(^[0-9]{5,5}$)");
but nothing came out. I can't find a way to get these zip codes out of that site. Does anyone know how to do it?
The real question here is: how do I get the numbers that are not in any tags, but just written out in the open? (I guess there is a term for that, but I'm not that good with XML terms.)
Answer 1:
I solved it using Element#getElementsMatchingOwnText:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ZipCodes {
    public static void main(String[] args) {
        final String html = "<td align=\"center\"><a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a></td> ";
        // Find every element whose own text is exactly five digits.
        final Elements elements = Jsoup.parse(html).getElementsMatchingOwnText("^[0-9]{5,5}$");
        for (final Element element : elements) {
            System.out.println("element = [" + element + "]");
            System.out.println("zip = [" + element.text() + "]");
        }
    }
}
Output:
element = [<a href="http://www.zipcodestogo.com/Hialeah/FL/33011/">33011</a>]
zip = [33011]
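As for why the selector in the question returned nothing: a likely reason is that :matchesOwn only tests an element's own text, and here the digits belong to the <a> inside the <td>, so the <td> itself has no text of its own to match. Below is a minimal sketch of the same lookup written as a selector. The class name ZipSelectorSketch and the method extractZips are made up for illustration, and the <table><tr> wrapper is added only so the parser keeps the <td>.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup or of the original answer.
public class ZipSelectorSketch {

    // Collects the own text of every <a> directly inside a <td>
    // when that text is exactly five digits.
    public static List<String> extractZips(String html) {
        Document doc = Jsoup.parse(html);
        List<String> zips = new ArrayList<>();
        for (Element link : doc.select("td > a:matchesOwn(^[0-9]{5}$)")) {
            zips.add(link.ownText());
        }
        return zips;
    }

    public static void main(String[] args) {
        // Assumed sample markup: the row from the question wrapped in a table.
        String html = "<table><tr><td align=\"center\">"
                + "<a href=\"http://www.zipcodestogo.com/Hialeah/FL/33011/\">33011</a>"
                + "</td></tr></table>";

        // The <td> has no text of its own, so the selector from the question matches nothing.
        Document doc = Jsoup.parse(html);
        System.out.println("td matches = " + doc.select("td:matchesOwn(^[0-9]{5}$)").size());

        // Matching on the <a> instead picks up the zip code.
        System.out.println("zips = " + extractZips(html));
    }
}
```

If you would rather keep selecting the <td>, td:matches(^[0-9]{5}$) should also work, since :matches tests the element's text including its children.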
Source: https://stackoverflow.com/questions/28149254/using-a-regex-in-jsoup