How to extract text content from HTML like the Read It Later or Instapaper iPhone apps?

Submitted by 孤街醉人 on 2019-12-02 14:19:36

After some research, it seems I can use a web API to extract the text content of a page. This means that once I have the URL, I need to fetch the page through the API and then render the result myself.

This is slower than just running the JS script shown above, because every request has to go through the web API, but I guess Read It Later and Instapaper both use this approach.
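A minimal Python sketch of this request-and-render flow follows. The endpoint `https://extractor.example.com/api/text` and its `url`/`format` query parameters are hypothetical placeholders, not the real interface of any service listed here; check each service's own documentation for the actual call. Only `build_request_url` runs offline; `fetch_article` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL endpoint and parameter names: substitute whatever the
# extraction service you choose actually documents.
API_ENDPOINT = "https://extractor.example.com/api/text"


def build_request_url(article_url: str, endpoint: str = API_ENDPOINT) -> str:
    """Build the API call, percent-encoding the article URL as a query value."""
    return endpoint + "?" + urlencode({"url": article_url, "format": "json"})


def fetch_article(article_url: str, timeout: float = 10.0) -> dict:
    """Send the article URL to the (hypothetical) service and decode its JSON reply."""
    with urlopen(build_request_url(article_url), timeout=timeout) as response:
        return json.load(response)
```

Encoding the article URL matters: without it, the article's own `?` and `&` would be read as parameters of the API call.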

These are the web APIs I have found so far:

http://viewtext.org/

This API has a very nice feature that combines multi-page articles into one. I am using it because of this feature, which the other APIs do not have.
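The post doesn't say how viewtext.org detects the later pages of an article. One common approach, sketched below under the assumption that each page links to the next with `rel="next"`, is to follow those links and concatenate the text of each page. `combine_pages` and `NextLinkFinder` are illustrative names; `fetch` and `extract` stand for whatever download and text-extraction functions you already have, passed in so the sketch runs without network access.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> or <link> element marked rel="next"."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be valueless (None).
        rel_values = (attributes.get("rel") or "").lower().split()
        if self.next_href is None and tag in ("a", "link") and "next" in rel_values:
            self.next_href = attributes.get("href")


def combine_pages(start_url: str,
                  fetch: Callable[[str], str],
                  extract: Callable[[str], str],
                  max_pages: int = 10) -> str:
    """Follow rel="next" links from start_url and join each page's extracted text."""
    texts = []
    seen = set()
    url: Optional[str] = start_url
    # Stop on a missing link, a loop back to a visited page, or the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        texts.append(extract(html))
        finder = NextLinkFinder()
        finder.feed(html)
        finder.close()
        # Resolve relative links such as "/story?page=2" against the current page.
        url = urljoin(url, finder.next_href) if finder.next_href else None
    return "\n\n".join(text for text in texts if text)
```

The `seen` set and `max_pages` cap guard against sites whose "next" link eventually points back to an earlier page.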

http://fivefilters.org/content-only/

The great thing about this one is that you can buy the script and set it up on your own server.

*UPDATE*

It seems that most apps use the Readability, Instapaper, or Google Mobilizer parser to extract only the text content from a web page.

Among them, my favorite at the moment is the Readability parser, since it doesn't come with advertisements the way the Instapaper parser does. (There is nothing wrong with showing ads to cover server costs, though.)

Pocket also provides an article parser, but only for developers who are creating Pocket-integrated apps.
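Readability's actual algorithm is more elaborate, but the core idea behind this family of parsers can be illustrated with a much-simplified heuristic: group `<p>` text by the element that directly contains it and keep the container holding the most paragraph text, which tends to be the article body rather than navigation or sidebars. The sketch below uses only Python's standard `html.parser`; it is an illustration, not the Readability, Instapaper, or Pocket algorithm, and `extract_main_text` is an illustrative name.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose text should never count as article content.
SKIP_TAGS = {"script", "style", "noscript"}


class ParagraphCollector(HTMLParser):
    """Group the text of each <p> by the element that directly contains it."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []                        # open elements as (tag, unique id)
        self.next_id = 0
        self.skip_depth = 0                    # > 0 while inside script/style/noscript
        self.in_paragraph = False
        self.parent_id = -1                    # container of the current <p>
        self.current = []                      # text pieces of the current <p>
        self.by_container = defaultdict(list)  # container id -> paragraph texts

    def finish_paragraph(self) -> None:
        text = " ".join("".join(self.current).split())  # collapse whitespace
        if text:
            self.by_container[self.parent_id].append(text)
        self.in_paragraph = False
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.in_paragraph:              # a new <p> implicitly closes the old one
                self.finish_paragraph()
            self.in_paragraph = True
            self.parent_id = self.stack[-1][1] if self.stack else -1
            return                             # <p> itself is not pushed
        if tag in VOID_TAGS:
            return
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        if tag == "p":
            if self.in_paragraph:
                self.finish_paragraph()
            return
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                closed = self.stack[i:]
                del self.stack[i:]
                self.skip_depth -= sum(1 for name, _ in closed if name in SKIP_TAGS)
                if self.in_paragraph and self.parent_id in {eid for _, eid in closed}:
                    self.finish_paragraph()    # closing the container ends the <p>
                break

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.current.append(data)


def extract_main_text(html: str) -> str:
    """Return the paragraphs of the container with the most paragraph text."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    if collector.in_paragraph:
        collector.finish_paragraph()
    if not collector.by_container:
        return ""
    best = max(collector.by_container.values(),
               key=lambda paragraphs: sum(len(p) for p in paragraphs))
    return "\n\n".join(best)
```

Real parsers add many more signals (link density, class names such as "comment" or "sidebar", punctuation counts), so expect this heuristic to fail on pages where the body is not made of `<p>` elements.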

Use Newspaper3k; it's awesome.

News, full-text, and article metadata extraction in Python 3.

https://github.com/codelucas/newspaper
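A minimal sketch of typical Newspaper3k usage (installed with `pip install newspaper3k`), based on its `Article` class: `download()` fetches the page and `parse()` extracts the title, authors, publish date, and body text. The wrapper name `extract_article` and the shape of the returned dictionary are illustrative, and the call needs network access.

```python
def extract_article(url: str) -> dict:
    """Download and parse one article with Newspaper3k."""
    # Imported inside the function so this file still loads without the package.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract title, authors, date, and body text
    return {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "text": article.text,
    }
```

For example, `extract_article("https://example.com/some-story")["text"]` would return the story's body as plain text (the URL is a placeholder).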
