I'm running an EMR Activity inside a Data Pipeline that analyzes log files, and I get the following error when my Pipeline fails:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.208.42.127:9000/home/hadoop/temp-output-s3copy already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
How can I delete that folder from Hadoop?
When you say delete from Hadoop, you really mean delete from HDFS.
To delete something from HDFS, do one of the following two things.
From the command line:
- deprecated way:
hadoop dfs -rmr hdfs://path/to/file
- new way (with Hadoop 2.4.1):
hdfs dfs -rm -r hdfs://path/to/file
Or from Java:
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
FileSystem fs = FileSystem.get(getConf()); // getConf() comes from org.apache.hadoop.conf.Configured (e.g., in a Tool)
fs.delete(new Path("path/to/file"), true); // true = recursive delete
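In the asker's case, FileOutputFormat.checkOutputSpecs throws FileAlreadyExistsException because the output directory from a previous run is still there, so the delete has to happen before the job is submitted. A minimal standalone sketch (the path is taken from the stack trace above; the class name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanOutputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Output directory left behind by the previous run
        Path output = new Path("hdfs://10.208.42.127:9000/home/hadoop/temp-output-s3copy");
        FileSystem fs = FileSystem.get(output.toUri(), conf);
        if (fs.exists(output)) {
            fs.delete(output, true); // true = recursive, removes the directory and its contents
        }
        // ...then submit the MapReduce job, as in CloudFrontStreamLogProcessors.main
    }
}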
To delete a file from HDFS you can use the following command:
hadoop fs -rm -r -skipTrash /path_to_file/file_name
To delete a folder from HDFS you can use the following command:
hadoop fs -rm -r -skipTrash /folder_name
The -skipTrash option deletes the data immediately instead of moving it to the .Trash directory; without it, the delete can fail with an error in some setups (for example, when the trash directory is over quota).
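For comparison (the behavior depends on whether trash is enabled via fs.trash.interval on your cluster):
hadoop fs -rm -r /folder_name              # moves the folder to the current user's .Trash when trash is enabled
hadoop fs -rm -r -skipTrash /folder_name   # bypasses .Trash and frees the space immediately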
With Scala:
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
val fs: FileSystem = FileSystem.get(new URI(filePath), sc.hadoopConfiguration)
fs.delete(new Path(filePath), true) // true for recursive
sc is the SparkContext.
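Outside of Spark you can pass a plain new Configuration() (from org.apache.hadoop.conf) instead of sc.hadoopConfiguration; FileSystem.get only needs a Hadoop Configuration to resolve the filesystem for the URI.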
To delete a file or folder from HDFS use the command:
hadoop fs -rm -r /FolderName
I contacted AWS support, and it turned out the problem was that the log files I was analyzing were very big, which caused a memory issue. I added "masterInstanceType" : "m1.xlarge" in the EMRCluster section of my pipeline definition and it worked.
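For reference, this is roughly where the field goes in the pipeline definition (the id and name values here are illustrative, not from the original pipeline):

{
  "id" : "EmrClusterForLogProcessing",
  "name" : "EmrClusterForLogProcessing",
  "type" : "EmrCluster",
  "masterInstanceType" : "m1.xlarge"
}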
From the command line:
hadoop fs -rm -r /folder
I use Hadoop 2.6.0; the command line 'hadoop fs -rm -r fileName.hib' works fine for deleting any .hib file on my HDFS file system.
Source: https://stackoverflow.com/questions/16797358/deleting-file-folder-from-hadoop