Distcp Mismatch in length of source

Submitted by 假如想象 on 2019-12-11 02:24:35

Question


I am facing an issue while running the distcp command between two different Hadoop clusters:

Caused by: java.io.IOException: Mismatch in length of source:hdfs://ip1/xxxxxxxxxx/xxxxx and target:hdfs://nameservice1/xxxxxx/.distcp.tmp.attempt_1483200922993_0056_m_000011_2

I tried using -pb and -skipcrccheck:

hadoop distcp -pb -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

hadoop distcp -pb  hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

hadoop distcp -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/ 

but nothing seems to be working.

Any suggestions would be appreciated.


Answer 1:


I hit the same issue with distcp between two Hadoop clusters running exactly the same version. In my case the cause was that some files in one of the source directories were still open. Running distcp on each source directory individually made this clear: every directory copied fine except the one containing the open files, and within that directory only the open files failed. Of course, this is hard to spot at first glance from the error message alone.
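
To make the "one directory at a time" approach concrete, here is a minimal sketch, not a definitive script. The function name distcp_each_dir and its two arguments are made up for illustration. It assumes the usual `hdfs dfs -ls` layout (directory lines start with `d`, the path is the last field) and paths without spaces.

```shell
# Hypothetical helper: run distcp once per immediate subdirectory of a
# source directory, so the subdirectory that triggers the error stands out.
# $1 = source parent directory, $2 = target parent directory.
distcp_each_dir() {
  src_parent=$1
  dst_parent=$2
  # Assumption: "hdfs dfs -ls" prints directories as lines starting with "d"
  # and the path as the last whitespace-separated field.
  hdfs dfs -ls "$src_parent" | awk '/^d/ {print $NF}' | while read -r dir; do
    # </dev/null keeps distcp from consuming the list of directories on stdin.
    if hadoop distcp -update "$dir" "$dst_parent/$(basename "$dir")" \
         </dev/null >/dev/null 2>&1; then
      echo "OK     $dir"
    else
      echo "FAILED $dir"
    fi
  done
}

# Example call (placeholders as in the question):
#   distcp_each_dir hdfs://ip1/xxxxxxxxxx hdfs:///xxxxxxxxxxxx
```

Any directory reported as FAILED is the one to inspect for files that are still open.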




Answer 2:


The issue was resolved by running copyToLocal from cluster1 to the local Linux filesystem and then copyFromLocal to cluster2.
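
A rough sketch of that two-step copy, assuming the machine can reach both clusters and has enough local disk for the data. The function name copy_via_local and the staging directory argument are hypothetical, not something from the original answer.

```shell
# Hypothetical helper: stage data on the local filesystem between clusters.
# $1 = source path on cluster1, $2 = target path on cluster2,
# $3 = local staging directory (created if missing).
copy_via_local() {
  cvl_src=$1
  cvl_dst=$2
  cvl_staging=$3
  mkdir -p "$cvl_staging" || return 1
  # Step 1: pull from cluster1; the copy lands under the staging directory
  # with the same base name as the source.
  hdfs dfs -copyToLocal "$cvl_src" "$cvl_staging" || return 1
  # Step 2: push the staged copy to cluster2.
  hdfs dfs -copyFromLocal "$cvl_staging/$(basename "$cvl_src")" "$cvl_dst"
}

# Example call (placeholders as in the question, /tmp/staging is made up):
#   copy_via_local hdfs://ip1/xxxxxxxxxx/xxxxx hdfs://nameservice1/xxxxxx /tmp/staging
```

Unlike distcp, this runs through a single machine, so it is only practical for data sets that fit on the local disk.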




Answer 3:


  1. Check the status of the source file with:

    hdfs fsck hdfs://xxxxxxxxxxx
    
  2. If the source file is not closed, close it with:

    hdfs debug recoverLease -path hdfs://xxxxxxx
    
  3. Run distcp again:

    hadoop distcp -bandwidth 15 -m 50 -pb hdfs://xxxxxx hdfs://xxxxxx
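
Steps 1 and 2 above could be combined roughly as follows. This is only a sketch: the function name close_open_files is hypothetical, and it assumes that `hdfs fsck <path> -files -openforwrite` marks open files with the word OPENFORWRITE on a line that starts with the file path. The exact fsck output differs between Hadoop versions, so check it by hand first. Paths containing spaces are not handled.

```shell
# Hypothetical helper: find files still open under a path and ask the
# NameNode to recover their leases so that they get closed.
# $1 = HDFS path to scan on the source cluster.
close_open_files() {
  # Assumption: open files show up as "<path> ... OPENFORWRITE ..." lines.
  hdfs fsck "$1" -files -openforwrite 2>/dev/null |
    awk '/OPENFORWRITE/ {print $1}' |
    while read -r f; do
      echo "recovering lease: $f"
      # -retries is optional; 3 is an arbitrary choice for this sketch.
      hdfs debug recoverLease -path "$f" -retries 3 </dev/null
    done
}

# Example call, followed by the distcp from step 3:
#   close_open_files hdfs://xxxxxxxxxxx
#   hadoop distcp -bandwidth 15 -m 50 -pb hdfs://xxxxxx hdfs://xxxxxx
```

Only recover leases on files that are no longer being written; forcing a lease recovery on a file with an active writer will break that writer.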



Source: https://stackoverflow.com/questions/41542844/distcp-mismatch-in-length-of-source
