Question
We are continuously getting this exception during replication from master to slave. Our index size is 9.7 GB, and we are trying to replicate a slave from scratch.
30 Oct 2013 18:22:16,996 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _41c_Lucene41_0.doc completely. Downloaded 0!=107464871
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1266)
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1146)
    at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:741)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:405)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:319)
    at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:220)
I read in some thread that there was a related bug in Solr 4.1, but we are using Solr 4.3 and also tried 4.5.1. It seems that DirectoryFileFetcher sometimes cannot download a file: the file arrives on the slave with a size of zero.
This is the master setup:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>
And the slave setup:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>
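Note that the slave block as posted is identical to the master block. For comparison only, a slave-side section in solrconfig.xml generally has the shape sketched below. The host, port, core name, and poll interval are placeholders and are not values taken from the poster's configuration.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="slave">
    <!-- Placeholder URL: substitute the real master host, port and core -->
    <str name="masterUrl">http://MASTER_HOST:PORT/solr/CORE_NAME/replication</str>
    <!-- Placeholder interval (HH:mm:ss) between polls of the master -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```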
Answer 1:
The problem appeared to be with httpclient. I turned on debug logging for all libraries and saw a message "Garbage in response" coming from httpclient just before the failure.
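For anyone who wants to reproduce this kind of trace without enabling debug for every library, here is a minimal sketch. It assumes the slave logs through log4j 1.x configured with a log4j.xml file (an assumption; the answer does not say how logging was configured). The logger names are the standard HttpClient 4.x categories, and the fragment belongs inside the existing <log4j:configuration> element.

```xml
<!-- Assumed log4j 1.x XML configuration; place inside <log4j:configuration> -->
<!-- Connection management and response parsing in HttpClient 4.x -->
<logger name="org.apache.http">
  <level value="DEBUG"/>
</logger>
<!-- Raw bytes sent and received (the "wire" lines in the log) -->
<logger name="org.apache.http.wire">
  <level value="DEBUG"/>
</logger>
<!-- Request and response headers (the "headers" lines in the log) -->
<logger name="org.apache.http.headers">
  <level value="DEBUG"/>
</logger>
```

Wire logging is very verbose during a multi-gigabyte index transfer, so it is probably best enabled only while diagnosing the failure.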
This is a log snippet:
31 Oct 2013 18:10:40,360 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Sending request: GET /solr-master/replication?command=filecontent&generation=6814&qt=%2Freplication&file=_aa7_Lucene41_0.pos&checksum=true&wt=filestream HTTP/1.1
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "GET /solr-master/replication?command=filecontent&generation=6814&qt=%2Freplication&file=_aa7_Lucene41_0.pos&checksum=true&wt=filestream HTTP/1.1[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "Host: solr-master.saltdev.sealdoc.com:8081[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "Connection: Keep-Alive[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - >> "[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> GET /solr-master/replication?command=filecontent&generation=6814&qt=%2Freplication&file=_aa7_Lucene41_0.pos&checksum=true&wt=filestream HTTP/1.1
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> Host: solr-master.saltdev.sealdoc.com:8081
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG headers - >> Connection: Keep-Alive
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response:
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "4[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response: 4
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "[0x0][0x0][0x0][0x0][\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response: ^@^@^@^@
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "0[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response: 0
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG wire - << "[\r][\n]"
31 Oct 2013 18:10:40,361 [explicit-fetchindex-cmd] DEBUG DefaultHttpResponseParser - Garbage in response:
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Connection 0.0.0.0:55266<->172.16.77.121:8081 closed
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Connection 0.0.0.0:55266<->172.16.77.121:8081 shut down
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG DefaultClientConnection - Connection 0.0.0.0:55266<->172.16.77.121:8081 closed
31 Oct 2013 18:10:40,398 [explicit-fetchindex-cmd] DEBUG PoolingClientConnectionManager - Connection released: [id: 0][route: {}->http://solr-master.saltdev.sealdoc.com:8081][total kept alive: 1; route allocated: 1 of 10000; total allocated: 1 of 10000]
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index 2 false
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data 0 false
31 Oct 2013 18:10:40,425 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>
31 Oct 2013 18:10:40,427 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data 0 false
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Done with dir: CachedDir<>
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index.20131031180837277 0 true
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] INFO CachingDirectoryFactory - looking to close /opt/watchdox/solr-slave/data/index.20131031180837277 [CachedDir<>]
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] INFO CachingDirectoryFactory - Closing directory: /opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,428 [explicit-fetchindex-cmd] INFO CachingDirectoryFactory - Removing directory before core close: /opt/watchdox/solr-slave/data/index.20131031180837277
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Removing from cache: CachedDir<>
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG CachingDirectoryFactory - Releasing directory: /opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler - SnapPull failed :org.apache.solr.common.SolrException: Unable to download _aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
    at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
    at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)
31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG CachingDirectoryFactory - Reusing cached directory: CachedDir<>
So I upgraded the httpcomponents jars to their latest 4.3.x versions and the problem disappeared. The httpcomponents jars that solrj depends on were at version 4.2.x; I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1. I have run the replication a few times since then with no problems at all, and it is now working as expected. It seems that the upgrade is necessary only on the slave side, but I'm going to upgrade the master too.
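If the Solr webapp is assembled with Maven (an assumption; the answer only says the jars were upgraded), the same versions could be pinned roughly as follows. The coordinates are the standard org.apache.httpcomponents artifacts and the version numbers are the ones named above. Where this fragment belongs in a given build (dependencies or dependencyManagement) depends on the project, so treat it as a sketch.

```xml
<!-- Assumed Maven build: override the 4.2.x jars pulled in by solrj -->
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.3.1</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.3</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpmime</artifactId>
  <version>4.3.1</version>
</dependency>
```

When the jars are swapped by hand in WEB-INF/lib instead, the older 4.2.x copies should be removed so that two versions do not end up on the classpath.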
Source: https://stackoverflow.com/questions/19691156/solr-replicationhandler-snappull-failed-to-download-files