Question
I want to develop a topical web robot using Nutch 2.2.1, and I want to create a new property containing some topic keywords, like the following:
<property>
<name>html.metatitle.keys</name>
<value>movie,actor,firm</value>
<description>
</description>
</property>
Answer 1:
There are two possible solutions to your problem:
1. Implement a custom HtmlParseFilter plugin that filters pages based on your desired keywords. For more information about Nutch extension points and writing a custom plugin for Nutch, take a look at these manuals:
http://wiki.apache.org/nutch/AboutPlugins
http://wiki.apache.org/nutch/WritingPluginExample
2. Use an indexer to filter documents based on the desired keywords. This solution only applies if your system architecture includes an indexer. In that case, Apache Solr can filter documents before indexing; you would have to implement a custom UpdateRequestProcessor. For more information about Solr and its extension points, take a look at these pages:
https://wiki.apache.org/solr/FrontPage
https://wiki.apache.org/solr/UpdateRequestProcessor
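Whichever of the two approaches is chosen, the core keyword check is the same. The following is only a minimal sketch of that check in plain Java, not Nutch or Solr API code: the class name TopicKeywordMatcher and its two methods are hypothetical, and matching on the page title by case-insensitive substring is an assumption based on the property name html.metatitle.keys. In an actual plugin the property value would presumably be read from the Nutch configuration object (for example with conf.get("html.metatitle.keys")) instead of being passed as a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: holds the keyword logic that a
// custom HtmlParseFilter or UpdateRequestProcessor could delegate to.
public class TopicKeywordMatcher {

    // Splits a comma-separated property value such as "movie,actor,firm"
    // into trimmed, lower-cased keywords, skipping empty entries.
    static List<String> parseKeywords(String propertyValue) {
        List<String> keywords = new ArrayList<>();
        if (propertyValue == null) {
            return keywords;
        }
        for (String part : propertyValue.split(",")) {
            String keyword = part.trim().toLowerCase(Locale.ROOT);
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }

    // Returns true if the title contains at least one keyword
    // (assumption: case-insensitive substring match is good enough).
    static boolean matchesTopic(String title, List<String> keywords) {
        if (title == null) {
            return false;
        }
        String normalized = title.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (normalized.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The value from the question's <value> element.
        List<String> keywords = parseKeywords("movie,actor,firm");
        System.out.println(matchesTopic("Best Movie Reviews of the Year", keywords)); // prints "true"
        System.out.println(matchesTopic("Weather Forecast", keywords));               // prints "false"
    }
}
```

A parse filter built on this would skip or flag pages for which matchesTopic returns false; substring matching is deliberately simple and could be swapped for token-based matching if partial-word hits turn out to be a problem.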
Source: https://stackoverflow.com/questions/29786729/nutch-2-2-1-hbase-can-i-create-a-new-property-in-nutch-site-xml