Is there a way to use readability (text extraction algorithm) and a custom algorithm in python to extract links from text?

Submitted by 懵懂的女人 on 2019-12-11 23:07:13

Question



I'd like to figure out a way to extract the links that appear in the body text of an article.

1.) I use readability in Python: https://github.com/gfxmonk/python-readability

2.) I'd like to somehow compare the extracted text with the original HTML in order to extract only the links that are in the actual body of the article.


Answer 1:


Well, it looks like summary() returns a BeautifulSoup tree, so you should be able to do something like:

article = page.summary()                    # Extract the article using readability
links = article.findAll("a")                # List of all <a> tags in the article
hrefs = [a.get("href") for a in links]      # The URL each link points to
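
If summary() turns out to give you an HTML string rather than a tree, or you only have the extracted plain text, something along these lines might work as a fallback. This is only a sketch using the standard library; the names LinkCollector, extract_links and links_in_body are made up for illustration and are not part of readability. The idea of keeping a link when its anchor text occurs in the extracted article text is an assumed heuristic, and it can give false positives for very short anchor texts.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors without a target
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so comparisons are stable
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html):
    """Return every (href, anchor text) pair found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_in_body(original_html, article_text):
    """Keep links from the full page whose anchor text occurs in the article text."""
    body = " ".join(article_text.split())
    return [(href, text)
            for href, text in extract_links(original_html)
            if text and text in body]


# Hypothetical sample page: one navigation link, one link inside the article
page_html = """
<html><body>
<div id="nav"><a href="/home">Home</a></div>
<div id="article"><p>Read the <a href="http://example.com/spec">full spec</a> first.</p></div>
</body></html>
"""
article_text = "Read the full spec first."  # what a text extractor might return

print(links_in_body(page_html, article_text))
# [('http://example.com/spec', 'full spec')]
```

If you do have the article as HTML, calling extract_links on it directly is simpler and avoids the text-matching guesswork.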


Source: https://stackoverflow.com/questions/4589323/is-there-a-way-to-use-readability-text-extraction-algorithm-and-a-custom-algor
