Solr - Best approach to import 20 million documents from a CSV file

有刺的猬 2020-12-29 09:29

My current task is to figure out the best approach to load millions of documents into Solr. The data file is an export from a DB in CSV format.

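For reference, the simplest single-machine baseline is Solr's built-in CSV update handler, which can stream the whole export in one request. A minimal sketch in Python; the host, collection name, and file name are placeholders:

```python
import requests

# Stream the CSV export into Solr's CSV update handler.
# "mycollection" and "export.csv" are placeholder names for this sketch.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"

with open("export.csv", "rb") as f:
    resp = requests.post(
        SOLR_UPDATE_URL,
        headers={"Content-Type": "text/csv"},
        # For a 20M-row load, prefer autoCommit in solrconfig.xml over
        # committing per request; here we commit once when the upload ends.
        params={"commit": "true"},
        data=f,  # requests streams file objects, so the file never sits fully in memory
    )
resp.raise_for_status()
print(resp.text)
```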

5 Answers
  •  情歌与酒
    2020-12-29 10:03

    The answers above have explained the single-machine ingestion strategies really well.

    Here are a few more options if you have a big-data infrastructure in place and want to implement a distributed data ingestion pipeline.

    1. Use Sqoop to bring the data into Hadoop, or place your CSV file in HDFS manually.
    2. Use one of the connectors below to ingest the data: the Hive-Solr connector or the Spark-Solr connector (see the sketch after this list).
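
    To make option 2 concrete, here is a minimal PySpark sketch using the spark-solr connector. The connector version, ZooKeeper addresses, collection name, and HDFS path are all assumptions to adapt:

    ```python
    from pyspark.sql import SparkSession

    # Distributed CSV -> SolrCloud load via the spark-solr connector.
    # The package version below is a placeholder; match it to your Spark/Solr versions.
    spark = (
        SparkSession.builder
        .appName("csv-to-solr")
        .config("spark.jars.packages", "com.lucidworks.spark:spark-solr:4.0.2")
        .getOrCreate()
    )

    # Read the CSV export that Sqoop (or a manual copy) placed in HDFS.
    df = spark.read.option("header", "true").csv("hdfs:///data/export.csv")

    (df.write
       .format("solr")
       .option("zkhost", "zk1:2181,zk2:2181,zk3:2181/solr")  # ZooKeeper ensemble for SolrCloud
       .option("collection", "mycollection")
       .option("batch_size", "10000")  # larger batches cut round trips on bulk loads
       .mode("overwrite")
       .save())
    ```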

    PS:

    • Make sure no firewall blocks connectivity between the client nodes and the Solr/SolrCloud nodes.
    • Choose the right directory factory for data ingestion; if near-real-time search is not required, use StandardDirectoryFactory (see the config sketch after the exception below).
    • If you see the exception below in the client logs during ingestion, tune the autoCommit and autoSoftCommit configuration in solrconfig.xml (also sketched below).

    SolrServerException: No live SolrServers available to handle this request
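
    For the last two points, a sketch of the relevant solrconfig.xml sections; the values are illustrative starting points to tune, not prescriptions:

    ```xml
    <!-- Standard on-disk directory implementation, suited to bulk loads
         where near-real-time search is not needed -->
    <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>            <!-- hard commit every 60 s to keep the transaction log small -->
        <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
      </autoCommit>
      <autoSoftCommit>
        <maxTime>600000</maxTime>           <!-- soft commit every 10 min; raise it if index freshness doesn't matter -->
      </autoSoftCommit>
    </updateHandler>
    ```

    Infrequent, asynchronous commits like these reduce indexing stalls on the Solr nodes, which is the tuning this answer suggests for the exception above.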
