Python method to extract content (excluding navigation) from an HTML page

无人及你 2021-01-31 23:13

Of course an HTML page can be parsed using any number of Python parsers, but I'm surprised that there don't seem to be any public parsing scripts to extract meaningful content.

5 answers
  •  忘掉有多难
    2021-01-31 23:50

    You might use the boilerpipe Web application to fetch and extract content on the fly.

    (This is not specific to Python, as you only need to issue an HTTP GET request to a page on Google App Engine.)
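
For illustration, the GET request could be issued from Python roughly as follows. This is only a sketch: the endpoint address and the `url` and `output` query parameter names are assumptions about the web application, not something confirmed here, and `build_extract_url` and `fetch_main_content` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: endpoint address and parameter names ("url", "output") are
# guesses at the web application's interface; verify them before use.
ASSUMED_ENDPOINT = "https://boilerpipe-web.appspot.com/extract"


def build_extract_url(page_url, endpoint=ASSUMED_ENDPOINT, output="text"):
    """Build the GET URL that asks the service to extract page_url."""
    # urlencode percent-escapes the target URL so it survives as one parameter.
    query = urlencode({"url": page_url, "output": output})
    return f"{endpoint}?{query}"


def fetch_main_content(page_url, endpoint=ASSUMED_ENDPOINT, timeout=30):
    """Issue the GET request and return the response body as text."""
    request_url = build_extract_url(page_url, endpoint=endpoint)
    with urlopen(request_url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_main_content("https://example.com/article")` would then return whatever the service sends back for that page. Only the standard library is used, so the same request could just as easily be made with any other HTTP client.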

    Cheers,

    Christian
