Nutch 2.2.1 & HBase - Can I create a new property in nutch-site.xml

Submitted by 我怕爱的太早我们不能终老 on 2019-12-25 10:36:04

Question


I want to develop a topical web crawler using Nutch 2.2.1, and I want to create a new property in nutch-site.xml that holds some topic keywords, like the following:

<property>
    <name>html.metatitle.keys</name>
    <value>movie,actor,firm</value>
    <description>
    </description>
</property>

Answer 1:


There are two possible solutions to your problem:

  1. Implement a custom HtmlParseFilter plugin that filters pages based on your keywords. For more information about Nutch extension points and writing a custom plugin for Nutch, see these manuals:

    http://wiki.apache.org/nutch/AboutPlugins

    http://wiki.apache.org/nutch/WritingPluginExample

  2. Use an indexer to filter documents based on your keywords. This option applies only if your system architecture includes an indexer. In that case, Apache Solr can filter documents before they are indexed; you would implement a custom UpdateRequestProcessor. For more information about Solr and its extension points, see these pages:

    https://wiki.apache.org/solr/FrontPage

    https://wiki.apache.org/solr/UpdateRequestProcessor
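
Whichever option you choose, the core of the filter is the same: split the comma-separated property value into keywords and check a page's title (or other text) against them. The following is a minimal sketch in plain Java, not Nutch or Solr API code. The class name TopicKeywordFilter and its methods are hypothetical, and it assumes the raw value of html.metatitle.keys has already been read from the configuration (in a Nutch plugin this would typically come from the Hadoop Configuration object handed to the plugin).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: not part of Nutch or Solr.
public class TopicKeywordFilter {
    private final List<String> keywords;

    // rawValue is the property value as written in nutch-site.xml,
    // e.g. "movie,actor,firm" (assumed to be read elsewhere).
    public TopicKeywordFilter(String rawValue) {
        this.keywords = parseKeywords(rawValue);
    }

    // Split on commas, trim whitespace, lowercase, and drop empty entries.
    public static List<String> parseKeywords(String rawValue) {
        List<String> result = new ArrayList<>();
        if (rawValue == null) {
            return result;
        }
        for (String part : rawValue.split(",")) {
            String keyword = part.trim().toLowerCase(Locale.ROOT);
            if (!keyword.isEmpty()) {
                result.add(keyword);
            }
        }
        return result;
    }

    // True if the text contains at least one keyword (case-insensitive
    // substring match, so "firm" would also match "confirm").
    public boolean matches(String text) {
        if (text == null) {
            return false;
        }
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TopicKeywordFilter filter = new TopicKeywordFilter("movie,actor,firm");
        System.out.println(filter.matches("Best Movie Reviews of the Year"));
        System.out.println(filter.matches("Weather forecast for Monday"));
    }
}
```

A parse filter plugin or an UpdateRequestProcessor could call matches() on the page title and discard the document when it returns false. Substring matching is the simplest choice here; matching on whole words would need tokenization instead.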



Source: https://stackoverflow.com/questions/29786729/nutch-2-2-1-hbase-can-i-create-a-new-property-in-nutch-site-xml
