How to parse HTML with Nutch and index a specific tag to Solr?

别那么骄傲 2021-01-13 07:25

I have installed Nutch and Solr to crawl a website and search it. As you know, we can index the meta tags of web pages into Solr with Nutch's parse meta tags plugin.(http:

4 answers
  •  长发绾君心
    2021-01-13 08:01

    You can use one of these custom plugins to parse XML files based on XPath (or CSS selectors):

    • https://github.com/BayanGroup/nutch-custom-search
    • http://www.atlantbh.com/precise-data-extraction-with-apache-nutch/
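
To give a rough idea of what XPath-based extraction means, here is a minimal sketch using only the JDK's built-in XPath support. It is not the API of either plugin above and it is not a Nutch plugin. The class name `XPathExtractDemo`, the method `extract`, the sample page, and the `price` class are all hypothetical, and the sketch assumes the markup is well-formed XML. Real-world HTML usually is not, so a tolerant HTML parser would be needed before this step.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: illustrates XPath extraction only, not Nutch code.
public class XPathExtractDemo {

    // Returns the trimmed text content of every node matched by the XPath expression.
    // Assumes the input is well-formed XML (or XHTML).
    static List<String> extract(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Malformed input or an invalid expression ends up here.
            throw new IllegalStateException("XPath extraction failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample page; the "price" class is an assumption for illustration.
        String page = "<html><body><h1>Title</h1>"
                + "<div class=\"price\">42 USD</div></body></html>";
        System.out.println(extract(page, "//div[@class='price']"));
    }
}
```

In a setup like the one the plugins describe, the extracted values would then be mapped to a Solr field; how that mapping is configured depends on the plugin, so check its documentation.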
