Question
I have uploaded my data files, genotype1_large_ind_large.txt and phenotype1_large_ind_large_1.txt,
to S3, and in the EMR UI I set the job arguments as follows:
RunDear.run s3n://scalability/genotype1_large_ind_large.txt s3n://scalability/phenotype1_large_ind_large_1.txt s3n://scalability/output_1phe 33 10 4
In my class RunDear.run, I add the files genotype1_large_ind_large.txt and phenotype1_large_ind_large_1.txt to the distributed cache.
However, when I run the job on EMR, I get the following error: java.io.FileNotFoundException: File does not exist: /genotype1_large_ind_large.txt
Why is there a slash '/' in front of the file name, and how can I make this work?
I also tried passing the files with -cacheFile as shown below, but my program treats -cacheFile as one of its own arguments, so that does not work either:
RunDear.run -cacheFile s3n://scalability/genotype1_large_ind_large.txt#genotype.txt -cacheFile s3n://scalability/phenotype1_large_ind_large_1.txt#phenotype.txt s3n://scalability/output_1phe 33 280 4
Answer 1:
I eventually realized that the problem was which filesystem the program was using, so I added the following line to the program: FileSystem fs = FileSystem.get(URI.create("s3://scalability"), conf);
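As for the leading slash in the error message, one plausible explanation is sketched below. It assumes the original program resolved the file against the cluster's default filesystem (for example through FileSystem.get(conf)), which the question does not show. In that case only the path component of the S3 URI is looked up, and the bucket name is dropped. The sketch uses only java.net.URI to show how the question's URI splits into parts; the class name S3UriParts and its helper methods are hypothetical and are not part of Hadoop or of the asker's code.

```java
import java.net.URI;

// Hypothetical helper, for illustration only: splits a location string
// into the parts that a filesystem lookup would see.
final class S3UriParts {

    // Scheme selects the filesystem implementation (e.g. "s3n").
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    // Authority is the bucket name for S3-style URIs.
    static String bucketOf(String location) {
        return URI.create(location).getAuthority();
    }

    // Path is what remains once scheme and bucket are stripped.
    // It always starts with '/' for a URI of this form.
    static String pathOf(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String location = "s3n://scalability/genotype1_large_ind_large.txt";
        System.out.println("scheme: " + schemeOf(location));
        System.out.println("bucket: " + bucketOf(location));
        System.out.println("path:   " + pathOf(location));
    }
}
```

The path component is exactly the string reported in the FileNotFoundException. Asking for the filesystem by URI, as the line in the answer does, keeps the scheme and bucket attached to the lookup instead of falling back to the default filesystem.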
Source: https://stackoverflow.com/questions/10315378/getting-file-does-not-exist-error-when-running-an-amazon-emr-job