Using Java, how can I extract all the links from a given web page?
You can use the HTML Parser library (org.htmlparser) for this. The method below fetches the page at `url` and returns the target of every link on it:
```java
import java.util.LinkedList;
import java.util.List;
import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public static List<String> getLinksOnPage(final String url) {
    final List<String> result = new LinkedList<>();
    try {
        // Parser(String) can itself throw ParserException, so construct it inside the try block
        final Parser htmlParser = new Parser(url);
        // Collect every <a> tag on the page
        final NodeList tagNodeList = htmlParser.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class));
        for (int j = 0; j < tagNodeList.size(); j++) {
            final LinkTag loopLink = (LinkTag) tagNodeList.elementAt(j);
            result.add(loopLink.getLink());
        }
    } catch (final ParserException e) {
        e.printStackTrace(); // TODO handle error properly
    }
    return result;
}
```
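If you would rather avoid the extra dependency, the JDK itself ships a small HTML parser in `javax.swing.text.html` that can do the same job. This is a minimal sketch (the class name `LinkExtractor` is just for illustration) that collects the `href` attribute of every `<a>` tag from any `Reader`:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    /** Returns the href attribute of every <a> tag found in the given HTML source. */
    public static List<String> getLinks(final Reader htmlSource) throws Exception {
        final List<String> links = new ArrayList<>();
        final HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    final Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore the page's charset declaration; we already have a decoded Reader
        new ParserDelegator().parse(htmlSource, callback, true);
        return links;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><a href=\"http://example.com\">one</a>"
                    + "<a href=\"/page2\">two</a></body></html>";
        System.out.println(getLinks(new StringReader(html)));
    }
}
```

To run it against a live page, pass `new InputStreamReader(new URL(url).openStream())` instead of the `StringReader`. The trade-off is that the Swing parser is more forgiving of malformed HTML but gives you a lower-level, callback-based API.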