Question
I'm trying to use Solr with Nutch on a Windows machine and I'm getting the following error:
Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
From a lot of threads I learned that Hadoop, which seems to be used by Nutch, does some chmod magic that works on Unix machines but not on Windows.
This problem has existed for more than a year now. I found one thread where the offending code line is shown and a fix is proposed. Am I really the only one who has this problem? Is everyone else creating a custom build in order to run Nutch on Windows? Or is there some option to disable the Hadoop stuff, or another solution? Maybe a crawler other than Nutch?
Thanks a lot. Boris
Here's the output and stack trace of what I'm doing:
admin@WIN-G1BPD00JH42 /cygdrive/c/solr/apache-nutch-1.6
$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5 -solr http://localhost:8080/solr-4.1.0
cygpath: can't convert empty path
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=http://localhost:8080/solr-4.1.0
topN = 5
Injector: starting at 2013-03-03 17:43:15
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Failed to set permissions of path: c:\temp\mapred\staging\admin-1654213299\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Answer 1:
It took me a while to get this working, but here's the solution, which works on Nutch 1.7:
- Download Hadoop Core 0.20.2 from the Maven repository.
- Replace $NUTCH_HOME/lib/hadoop-core-1.2.0.jar with the downloaded file, renaming it to the same name.

That should be it.
Explanation
This issue is caused by Hadoop, since it assumes you're running on Unix and abides by Unix file permission rules. The issue was actually resolved in 2011, but Nutch didn't update the Hadoop version it uses. The relevant fixes are here and here
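For context, the check that throws this error lives in Hadoop's FileUtil (see the stack trace above). The sketch below is a paraphrase for illustration, not the verbatim Hadoop source: setPermission() runs a chmod, and when that fails on Windows the return-value check aborts the whole job submission.

    // Paraphrased sketch of the failing check in org.apache.hadoop.fs.FileUtil
    // (Hadoop 1.x); an illustration only, not the actual Hadoop source.
    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.fs.permission.FsPermission;

    class PermissionCheckSketch {
        static void checkReturnValue(boolean rv, File p, FsPermission permission)
                throws IOException {
            // On Windows the chmod call fails, rv is false, and this throws the
            // "Failed to set permissions of path ... to 0700" error seen above.
            if (!rv) {
                throw new IOException("Failed to set permissions of path: " + p
                        + " to " + String.format("%04o", permission.toShort()));
            }
        }
    }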
Answer 2:
We are using Nutch too, but it is not supported for running on Windows; on Cygwin our 1.4 version had problems similar to yours, also mapreduce-related.
We solved it by using a VM (VirtualBox) with Ubuntu and a shared directory between Windows and Linux, so we can develop and build on Windows and run Nutch (crawling) on Linux.
Answer 3:
I have Nutch running on Windows, with no custom build. It's been a long time since I used it, though. One thing that took me a while to catch is that you need to run Cygwin as a Windows administrator to get the necessary rights.
Answer 4:
I suggest a different approach. Check this link out. It explains how to swallow the error on Windows without requiring you to downgrade Hadoop or rebuild Nutch. I tested it on Nutch 2.1, but it applies to other versions as well. I also made a simple .bat for starting the crawler and indexer, but it is meant for Nutch 2.x and might not be applicable to Nutch 1.x.
For the sake of posterity, the approach entails:
Making a custom LocalFileSystem implementation:

    package com.conga.services.hadoop.patch.HADOOP_7682;

    import java.io.IOException;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class WinLocalFileSystem extends LocalFileSystem {

        public WinLocalFileSystem() {
            super();
            System.err.println("Patch for HADOOP-7682: "
                    + "Instantiating workaround file system");
        }

        /**
         * Delegates to <code>super.mkdirs(Path)</code> and separately calls
         * <code>this.setPermission(Path, FsPermission)</code>.
         */
        @Override
        public boolean mkdirs(Path path, FsPermission permission)
                throws IOException {
            boolean result = super.mkdirs(path);
            this.setPermission(path, permission);
            return result;
        }

        /**
         * Ignores IOException when attempting to set the permission.
         */
        @Override
        public void setPermission(Path path, FsPermission permission)
                throws IOException {
            try {
                super.setPermission(path, permission);
            } catch (IOException e) {
                System.err.println("Patch for HADOOP-7682: "
                        + "Ignoring IOException setting permission for path \"" + path
                        + "\": " + e.getMessage());
            }
        }
    }
Compiling it and placing the JAR under ${HADOOP_HOME}/lib
And then registering it by modifying ${HADOOP_HOME}/conf/core-site.xml:

    <property>
        <name>fs.file.impl</name>
        <value>com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem</value>
        <description>Enables patch for issue HADOOP-7682 on Windows</description>
    </property>
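As a quick sanity check (a minimal sketch, not part of the original answer, assuming the class and the core-site.xml entry above are on Hadoop's classpath), you can ask Hadoop which implementation it resolves for the local "file" scheme; it should print the patched class name:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class CheckLocalFs {
        public static void main(String[] args) throws Exception {
            // new Configuration() loads core-site.xml from the classpath,
            // including the fs.file.impl override registered above.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("file:///"), conf);
            // Expected output: com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem
            System.out.println(fs.getClass().getName());
        }
    }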
Answer 5:
You have to change the project dependencies hadoop-core and hadoop-tools. I'm using version 0.20.2 and it works fine.
Source: https://stackoverflow.com/questions/15188050/nutch-in-windows-failed-to-set-permissions-of-path