I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How can I copy data from one HDFS to another? Is it possible via Sqoop or other c
distcp is used for copying data to and from Hadoop filesystems in parallel. It is similar to the generic hadoop fs -cp command, except that fs -cp copies through a single client while distcp spreads the work across the cluster. Under the hood, distcp runs as a map-only MapReduce job: the mappers copy the files in parallel and there are no reducers.
Usage:
copy one file to another
% hadoop distcp file1 file2
copy directories from one location to another
% hadoop distcp dir1 dir2
If dir2 doesn't exist, it is created and the contents of dir1 are copied into it. If dir2 already exists, dir1 is copied under it (giving dir2/dir1). The -overwrite option forces files that already exist at the destination to be overwritten. The -update option copies only files that are missing from the destination or that differ from the source.
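To make the directory rule above concrete, here is a small sketch that only models it and does not call Hadoop. The function name distcp_effective_target is made up for illustration, and it covers only the plain copy described above (no -update or -overwrite).

```shell
# Hypothetical helper: prints where the contents of the source directory
# end up for a plain "hadoop distcp src dst", following the rule above.
# $1 = source dir, $2 = destination dir, $3 = "yes" if the destination exists.
distcp_effective_target() {
    src="$1"
    dst="$2"
    dst_exists="$3"
    if [ "$dst_exists" = "yes" ]; then
        # Existing destination: the source directory lands underneath it.
        printf '%s/%s\n' "${dst%/}" "$(basename "$src")"
    else
        # Missing destination: it is created and receives the contents directly.
        printf '%s\n' "${dst%/}"
    fi
}

distcp_effective_target dir1 dir2 no    # destination missing
distcp_effective_target dir1 dir2 yes   # destination already there
```

Checking whether the destination exists before running the copy (for example with hadoop fs -test -d) avoids ending up with an unexpected dir2/dir1 nesting.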
transferring data between two HDFS clusters
% hadoop distcp -update -delete hdfs://nn1/dir1 hdfs://nn2/dir2
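The -delete option deletes files or directories from the destination that are not present in the source. It is only valid together with -update or -overwrite, so use it only when the destination should mirror the source exactly.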
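Because -delete removes data on the destination cluster, it can help to print the exact command and review it before running it. A minimal sketch, assuming a made-up helper name distcp_sync_cmd and the placeholder namenodes nn1 and nn2 from the example above:

```shell
# Hypothetical helper: builds (but does not run) a distcp command line for
# syncing one directory between two clusters.
# $1 = source URI, $2 = destination URI, $3 = "yes" to mirror (adds -delete).
distcp_sync_cmd() {
    src="$1"
    dst="$2"
    mirror="${3:-no}"
    cmd="hadoop distcp -update"
    if [ "$mirror" = "yes" ]; then
        # -delete is only accepted together with -update or -overwrite.
        cmd="$cmd -delete"
    fi
    printf '%s %s %s\n' "$cmd" "$src" "$dst"
}

# Print the command for review; nothing is copied or deleted here.
distcp_sync_cmd hdfs://nn1/dir1 hdfs://nn2/dir2 yes
```

Once the printed line looks right, it can be pasted into a shell on a node that can reach both clusters. Leaving out the third argument gives the safer -update-only variant, which never removes anything from the destination.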