How to index wikipedia files in .xml format into solr

心不动则不痛 提交于 2019-12-11 18:32:07

问题


I want to index xml files of Wikipedia into Solr.

But I am getting an error, it is unable to index. Solr has a specific format for xml files. I changed the schema.xml and data-config.xml files to suit the tags of the wikipedia files.

Still it is unable to index the files. My actual intention is to index wikipedia which is an xml file of 30 GB.

How would I go about indexing all wikipedia files into Solr?


回答1:


There's an example section in the DataImportHandler documentation for exactly this: indexing Wikipedia.

Basically, you use the DataImportHandler and some XPath to pull the metadata you care about out of the Wikipedia XML, and put it in flat Solr field listings.



来源:https://stackoverflow.com/questions/10000378/how-to-index-wikipedia-files-in-xml-format-into-solr

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!