Question
I am facing an issue while running a distcp command between two different Hadoop clusters:
Caused by: java.io.IOException: Mismatch in length of source:hdfs://ip1/xxxxxxxxxx/xxxxx and target:hdfs://nameservice1/xxxxxx/.distcp.tmp.attempt_1483200922993_0056_m_000011_2
I tried using -pb and -skipcrccheck:
hadoop distcp -pb -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/
hadoop distcp -pb hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/
hadoop distcp -skipcrccheck -update hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/
but none of these worked.
Any suggestions?
Answer 1:
I had the same issue running distcp between two Hadoop clusters of exactly the same version. In my case, some files in one of the source directories were still open for writing. I only found this once I ran distcp on each source directory individually: the copy succeeded for every directory except the one containing the open files, and within that directory it failed only for those files. Of course, this is hard to spot at first glance.
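The per-directory isolation described in Answer 1 can be scripted. Below is a minimal sketch of a hypothetical bash helper, distcp_per_dir (not part of Hadoop). It runs one distcp job per top-level subdirectory of the source and prints the directories whose copy failed. It assumes that paths contain no whitespace and that hdfs dfs -ls marks directory lines with a leading "d" and prints the path as the last field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): run one distcp job per top-level
# subdirectory of SRC so a failure is confined to the directory that causes it.
# Assumes no whitespace in paths, since the listing is split on spaces.
distcp_per_dir() {
  local src="${1%/}" dst="${2%/}" listing dir name
  local failed=()
  listing=$(hdfs dfs -ls "$src") || return 1
  # Directory lines in "hdfs dfs -ls" output start with 'd'; the path is the last field.
  for dir in $(printf '%s\n' "$listing" | awk '/^d/ {print $NF}'); do
    name="${dir##*/}"
    echo "Copying $dir -> $dst/$name"
    # With -update, the contents of $dir are copied into $dst/$name.
    if ! hadoop distcp -update "$dir" "$dst/$name"; then
      failed+=("$dir")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    printf 'FAILED: %s\n' "${failed[@]}"
    return 1
  fi
  return 0
}

# Example (paths elided as in the question):
# distcp_per_dir hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/
```

The directories listed under FAILED are the ones to inspect for open files.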
Answer 2:
The issue was resolved by running copyToLocal from cluster1 to the local Linux filesystem, then copyFromLocal from there to cluster2.
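A minimal sketch of Answer 2's two-step workaround follows, written as a hypothetical bash helper, copy_via_local (not part of Hadoop). It assumes a single client machine whose hdfs command can reach both clusters, with the source addressed by its full URI, and enough local disk space to hold the whole source tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): stage data on the local Linux
# filesystem, then upload it to the destination cluster.
# Assumes one client can reach both clusters and has enough local disk.
copy_via_local() {
  local src="${1%/}" dst="$2" staging rc=0
  staging=$(mktemp -d) || return 1
  # Download from cluster1; the copy lands at $staging/<basename of src>.
  if hdfs dfs -copyToLocal "$src" "$staging/" &&
     hdfs dfs -copyFromLocal "$staging/${src##*/}" "$dst"; then
    rc=0
  else
    rc=1
  fi
  # Always remove the local staging copy, even if a step failed.
  rm -rf "$staging"
  return "$rc"
}

# Example (paths elided as in the question):
# copy_via_local hdfs://ip1/xxxxxxxxxx/xxxxx hdfs:///xxxxxxxxxxxx/
```

This trades distcp's parallelism for a single-machine copy, so it is mainly practical for modest amounts of data.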
Answer 3:
Check the status of the source files with:
hdfs fsck hdfs://xxxxxxxxxxx
If a source file is still open, close it by recovering its lease:
hdfs debug recoverLease -path hdfs://xxxxxxx
Then rerun the copy:
hadoop distcp -bandwidth 15 -m 50 -pb hdfs://xxxxxx hdfs://xxxxxx
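The check-and-close steps in Answer 3 can be combined into one loop. The sketch below is a hypothetical bash helper, recover_open_files (not part of Hadoop). It asks fsck to list files open for write (-files -openforwrite) and runs hdfs debug recoverLease on each one. It assumes that fsck prints one line per file, starting with the bare path and containing OPENFORWRITE for open files. Because those paths have no scheme or authority, it also assumes the client's default filesystem is the source cluster. Recovering a lease closes the file, so an application that is still writing it will likely fail; check what owns each file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): close every file under TARGET that
# fsck reports as open for write. Assumes the default filesystem is the source
# cluster (fsck prints bare paths) and that paths contain no whitespace.
recover_open_files() {
  local target="$1" report f
  # Stop if fsck exits with an error.
  report=$(hdfs fsck "$target" -files -openforwrite) || return 1
  # Keep only per-file lines (they start with '/') that are flagged OPENFORWRITE.
  for f in $(printf '%s\n' "$report" | awk '$1 ~ /^\// && /OPENFORWRITE/ {print $1}'); do
    echo "Recovering lease on $f"
    hdfs debug recoverLease -path "$f"
  done
}

# Example (path elided as in the answer):
# recover_open_files hdfs://xxxxxxx
```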
Source: https://stackoverflow.com/questions/41542844/distcp-mismatch-in-length-of-source