Solr - Best approach to import 20 million documents from a CSV file

有刺的猬 2020-12-29 09:29

My current task is to figure out the best approach to load millions of documents into Solr. The data file is an export from a DB in CSV format.

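For reference, the simplest single-machine baseline is Solr's built-in CSV update handler, which can stream the whole export in one request. A minimal sketch in Python; the host, collection name, and file name are placeholders:

```python
import requests

# Stream the CSV export into Solr's CSV update handler.
# "mycollection" and "export.csv" are placeholder names for this sketch.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"

with open("export.csv", "rb") as f:
    resp = requests.post(
        SOLR_UPDATE_URL,
        headers={"Content-Type": "text/csv"},
        # For a 20M-row load, prefer autoCommit in solrconfig.xml over
        # committing per request; here we commit once when the upload ends.
        params={"commit": "true"},
        data=f,  # requests streams file objects, so the file never sits fully in memory
    )
resp.raise_for_status()
print(resp.text)
```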

5 Answers
  •  情歌与酒
    2020-12-29 10:03

    The answers above have explained the single-machine ingestion strategies really well.

    Here are a few more options if you have a big-data infrastructure in place and want to implement a distributed data ingestion pipeline.

    1. Use Sqoop to bring the data into Hadoop, or place your CSV file in HDFS manually.
    2. Use one of the connectors below to ingest the data: the Hive-Solr connector or the Spark-Solr connector (see the sketch after this list).
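
    To make option 2 concrete, here is a minimal PySpark sketch using the spark-solr connector. The connector version, ZooKeeper addresses, collection name, and HDFS path are all assumptions to adapt:

    ```python
    from pyspark.sql import SparkSession

    # Distributed CSV -> SolrCloud load via the spark-solr connector.
    # The package version below is a placeholder; match it to your Spark/Solr versions.
    spark = (
        SparkSession.builder
        .appName("csv-to-solr")
        .config("spark.jars.packages", "com.lucidworks.spark:spark-solr:4.0.2")
        .getOrCreate()
    )

    # Read the CSV export that Sqoop (or a manual copy) placed in HDFS.
    df = spark.read.option("header", "true").csv("hdfs:///data/export.csv")

    (df.write
       .format("solr")
       .option("zkhost", "zk1:2181,zk2:2181,zk3:2181/solr")  # ZooKeeper ensemble for SolrCloud
       .option("collection", "mycollection")
       .option("batch_size", "10000")  # larger batches cut round trips on bulk loads
       .mode("overwrite")
       .save())
    ```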

    PS:

    • Make sure no firewall blocks connectivity between the client nodes and the Solr/SolrCloud nodes.
    • Choose the right directory factory for data ingestion; if near-real-time search is not required, use StandardDirectoryFactory (see the config sketch after the exception below).
    • If you see the exception below in the client logs during ingestion, tune the autoCommit and autoSoftCommit configuration in solrconfig.xml (also sketched below).

    SolrServerException: No live SolrServers available to handle this request
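
    For the last two points, a sketch of the relevant solrconfig.xml sections; the values are illustrative starting points to tune, not prescriptions:

    ```xml
    <!-- Standard on-disk directory implementation, suited to bulk loads
         where near-real-time search is not needed -->
    <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxTime>60000</maxTime>            <!-- hard commit every 60 s to keep the transaction log small -->
        <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
      </autoCommit>
      <autoSoftCommit>
        <maxTime>600000</maxTime>           <!-- soft commit every 10 min; raise it if index freshness doesn't matter -->
      </autoSoftCommit>
    </updateHandler>
    ```

    Infrequent, asynchronous commits like these reduce indexing stalls on the Solr nodes, which is the tuning this answer suggests for the exception above.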
