Site-Mining tools

Submitted by 自闭症网瘾萝莉.ら on 2020-01-04 11:02:47

Question


Many of the questions asked here are relevant to research I'm doing. These questions and answers are widely dispersed and not always easy to find by manual browsing, and sometimes an insightful answer or comment turns up in an unrelated topic.

I want to automate finding these relevant questions and answers, based on sets of keywords, and then use the information as pointers towards further in-depth research.

What tools, preferably open-source, are available that I can use for this type of site-mining? I am not a web guru, and developing them myself would take a long time and cut into the time I could spend on my R&D.


Answer 1:


It is not clear from your question whether you are a programmer or not, so I'm not sure whether you are after tools in the sense of apps or services that do what you want, or a library that makes site-mining easier.

If the latter is the case and you use Ruby, I can thoroughly recommend WWW::Mechanize. It provides a nice API for writing scripts to search web pages (by DOM or by text), follow links, and fill out forms. I've used it several times to organise information that's spread over several web pages within a site.

I believe the Ruby version was based on an earlier library for Perl, but I can't vouch for the Perl version as I've not used it.
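To give a feel for what "search web pages by text and follow links" amounts to, here is a minimal sketch. It is not Mechanize and does not use its API; it is a stand-in written with Python's standard library only, and it works on an in-memory HTML string rather than fetching anything. The names `LinkCollector`, `matching_links`, and `SAMPLE_HTML`, as well as the `example.com` URLs, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def matching_links(html, base_url, keywords):
    """Return links whose text contains any keyword (case-insensitive)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [(url, text) for url, text in parser.links
            if any(k in text.lower() for k in wanted)]


# Hypothetical page fragment standing in for a fetched listing page.
SAMPLE_HTML = """
<ul>
  <li><a href="/questions/1/parsing-html">Parsing HTML with a script</a></li>
  <li><a href="/questions/2/css-margins">CSS margins collapse</a></li>
  <li><a href="/questions/3/web-scraping">Web <b>scraping</b> etiquette</a></li>
</ul>
"""

if __name__ == "__main__":
    for url, text in matching_links(SAMPLE_HTML, "https://example.com",
                                    ["scraping", "parsing"]):
        print(url, "-", text)
```

The URLs this returns are the ones a script would fetch next. A library like Mechanize wraps that fetch, parse, and follow loop (plus form handling) behind a friendlier API.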




Answer 2:


Another option would be to use Yahoo! Pipes (demo).

You can build such a system visually online by combining feed URLs, filters, and so on. The learning time is minimal compared to programming.




Answer 3:


Human interaction tools might be useful in such a case (no development cost, probably a more consistent outcome, and they accommodate evolving requirements).

A couple come to mind:

  • Mechanical Turk.
  • Time Svr (more expensive) - experiment/review.



Answer 4:


All of the keyword-based tags have RSS feeds attached to them, so I'd start by subscribing to the feeds for relevant keywords and searching that data. It seems like the simplest way to find related concepts and other related keywords.
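As a rough sketch of "subscribe to a feed and search the data", the snippet below filters feed entries by a keyword set using only Python's standard library. It assumes the feed is Atom XML; for an RSS 2.0 feed the element names would need adjusting. The feed content, the `example.com` URLs, and the names `filter_entries` and `SAMPLE_FEED` are all made up for illustration, and a real script would download the feed text first.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed standing in for a downloaded tag feed.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical tag feed</title>
  <entry>
    <title>Tools for mining a Q&amp;A site</title>
    <link rel="alternate" href="https://example.com/q/1"/>
    <summary>Looking for open-source crawlers and scrapers.</summary>
  </entry>
  <entry>
    <title>Centering a div</title>
    <link rel="alternate" href="https://example.com/q/2"/>
    <summary>CSS layout question.</summary>
  </entry>
  <entry>
    <title>Rate limits when crawling</title>
    <link rel="alternate" href="https://example.com/q/3"/>
    <summary>How often may a script fetch pages?</summary>
  </entry>
</feed>
"""


def filter_entries(feed_xml, keywords):
    """Return entries whose title or summary mentions any keyword."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    hits = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        summary = entry.findtext(ATOM + "summary", default="")
        link = entry.find(ATOM + "link")
        url = link.get("href") if link is not None else None
        # Plain case-insensitive substring match; no stemming or ranking.
        haystack = (title + " " + summary).lower()
        matched = [k for k in wanted if k in haystack]
        if matched:
            hits.append({"title": title, "url": url, "matched": matched})
    return hits


if __name__ == "__main__":
    for hit in filter_entries(SAMPLE_FEED, ["mining", "crawl"]):
        print(hit["url"], hit["title"], hit["matched"])
```

Recording which keywords matched each entry makes it easier to spot related terms worth adding to the keyword set later.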



Source: https://stackoverflow.com/questions/165840/site-mining-tools
