How can I extract only the main textual content from an HTML page?


旧巷少年郎 2021-01-31 04:48

Update

Boilerpipe appears to work really well, but I realized that I don't need only the main content because many pages don't have an article, but only links with s

9 answers

自闭症患者 (OP)

2021-01-31 05:30

You're looking for what are known as "HTML scrapers" or "screen scrapers". Here are a couple of links to some options for you:

TagSoup

HtmlUnit
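
To make the idea of a scraper concrete, here is a minimal sketch that pulls the visible text out of a page. It uses only Python's standard-library `html.parser`, not either of the tools linked above, and the names `TextExtractor` and `extract_text` are made up for illustration. It assumes that skipping `script`, `style`, and `noscript` is enough to count as "visible text", and it makes no attempt at boilerplate removal the way Boilerpipe does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside non-visible elements."""

    # Assumption: these are the only elements whose content should be dropped.
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:  # drop whitespace-only nodes
                self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        "<html><head><title>T</title><style>p{}</style></head>"
        "<body><script>var x=1;</script>"
        "<p>Hello <b>world</b></p><a href='/x'>Link</a></body></html>"
    )
    print(extract_text(sample))
```

Because link text is kept along with everything else, this also works on pages that are mostly links rather than a single article. For real-world, badly formed HTML, a more tolerant parser is likely a better fit than this sketch.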
