Question:
I'm running an EMR Activity inside a Data Pipeline that analyzes log files, and the Pipeline fails with the following error:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.208.42.127:9000/home/hadoop/temp-output-s3copy already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:879)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1316)
    at com.valtira.datapipeline.stream.CloudFrontStreamLogProcessors.main(CloudFrontStreamLogProcessors.java:216)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
How can I delete that folder from Hadoop?
Answer 1:
When you say delete from Hadoop, you really mean delete from HDFS.
To delete something from HDFS, do one of the following:
From the command line:
- deprecated way:
hadoop dfs -rmr hdfs://path/to/file
- new way (since hadoop 2.4.1):
hdfs dfs -rm -r hdfs://path/to/file
Or from java:
// Requires org.apache.hadoop.fs.FileSystem and org.apache.hadoop.fs.Path.
// getConf() assumes this runs inside a class that extends Configured
// (e.g. a Tool implementation); otherwise pass a new Configuration().
FileSystem fs = FileSystem.get(getConf());
fs.delete(new Path("path/to/file"), true); // second argument: delete recursively
Answer 2:
To delete a file from HDFS, you can use the command below:
hadoop fs -rm -r -skipTrash /path_to_file/file_name
To delete a folder from HDFS, you can use the command below:
hadoop fs -rm -r -skipTrash /folder_name
The -skipTrash option bypasses the trash and deletes the data immediately. Without it, when trash is enabled, the files are moved to the user's .Trash directory instead of being removed right away.
Answer 3:
With Scala:
// Requires java.net.URI, org.apache.hadoop.fs.{FileSystem, Path}
val fs: FileSystem = FileSystem.get(new URI(filePath), sc.hadoopConfiguration)
fs.delete(new Path(filePath), true) // true for recursive
sc is the SparkContext
Answer 4:
To delete a file or folder from HDFS, use the command:
hadoop fs -rm -r /FolderName
Answer 5:
I contacted AWS support, and it seemed the problem was that the log files I was analyzing were very large, which caused a memory issue. I added "masterInstanceType" : "m1.xlarge" to the EMRCluster section of my pipeline definition and it worked.
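For reference, the relevant part of a pipeline definition might look like the sketch below. The "masterInstanceType" field is the one mentioned above; the id, name, and remaining values are placeholders, not the asker's actual configuration:

```json
{
  "id": "MyEmrCluster",
  "name": "MyEmrCluster",
  "type": "EmrCluster",
  "masterInstanceType": "m1.xlarge",
  "coreInstanceType": "m1.xlarge",
  "coreInstanceCount": "2"
}
```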
Answer 6:
From the command line:
hadoop fs -rm -r /folder
Answer 7:
I use Hadoop 2.6.0; the command line 'hadoop fs -rm -r fileName.hib' works fine for deleting any hib file on my HDFS file system.
Source: https://stackoverflow.com/questions/16797358/deleting-file-folder-from-hadoop