Extract links from a web page

遇见更好的自我 2020-12-01 08:22

Using Java, how can I extract all the links from a given web page?

6 answers
  •  天命终不由人
    2020-12-01 08:53

    You can use the HTML Parser library to achieve this:

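    import java.util.LinkedList;
    import java.util.List;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.tags.LinkTag;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

    public static List<String> getLinksOnPage(final String url) {
        final List<String> result = new LinkedList<>();

        try {
            // The Parser constructor throws ParserException, so it belongs inside the try block
            final Parser htmlParser = new Parser(url);
            // Keep only the nodes that are link (<a>) tags
            final NodeList tagNodeList = htmlParser.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class));
            for (int j = 0; j < tagNodeList.size(); j++) {
                final LinkTag loopLink = (LinkTag) tagNodeList.elementAt(j);
                final String loopLinkStr = loopLink.getLink();
                result.add(loopLinkStr);
            }
        } catch (ParserException e) {
            e.printStackTrace(); // TODO handle error
        }

        return result;
    }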
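
    If adding a third-party dependency is not an option, roughly the same idea can be sketched with the HTML parser that ships with the JDK (javax.swing.text.html). This is a different technique from the HTML Parser library above, not a drop-in replacement. The class name LinkExtractor and the method extractLinks are made up for illustration, and the sketch assumes the page has already been downloaded into a String. The href values are returned as written in the markup, so relative links are not resolved.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href of every <a> tag in an HTML string.
final class LinkExtractor {

    static List<String> extractLinks(final String html) {
        final List<String> links = new ArrayList<>();

        // The callback is invoked by the parser for each start tag it encounters.
        final HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    final Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // Third argument: ignore any charset declared in the document,
            // since the input is already decoded text.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return links;
    }
}
```

    Calling LinkExtractor.extractLinks(pageSource) returns the links in document order. For real-world pages with messy markup, a dedicated parsing library such as the one in the answer above is likely to be more robust.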
    
