No FileSystem for scheme: sftp

Submitted by 可紊 on 2020-01-06 19:27:06

Question


I am trying to copy a file over SFTP into HDFS using Hadoop's distcp, like this:

hadoop distcp -D fs.sftp.credfile=/home/bigsql/cred.prop sftp://<<ip address>>:22/export/home/nz/samplefile hdfs:///user/bigsql/distcp

But I get the following error:

15/11/23 13:29:06 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[sftp://<<source ip>>:22/export/home/nz/samplefile], targetPath=hdfs:/user/bigsql/distcp, targetPathExists=true, preserveRawXattrs=false}
15/11/23 13:29:09 INFO impl.TimelineClientImpl: Timeline service address: http://bigdata.ibm.com:8188/ws/v1/timeline/
15/11/23 13:29:09 INFO client.RMProxy: Connecting to ResourceManager at bigdata.ibm.com/<<target ip>>:8050
15/11/23 13:29:10 ERROR tools.DistCp: Exception encountered
java.io.IOException: No FileSystem for scheme: sftp
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:76)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

Can anyone suggest what the cause of the problem might be?


Answer 1:


The exception is thrown because Hadoop cannot find a file system implementation for the scheme sftp.

The exception is raised in FileSystem.java (FileSystem.getFileSystemClass in the stack trace). The framework looks up the configuration parameter fs.sftp.impl and, when it finds no value for it, throws this exception.

As far as I know, Hadoop does not support an SFTP file system by default. The JIRA ticket "Add SFTP FileSystem" (https://issues.apache.org/jira/browse/HADOOP-5732) indicates that SFTP support is available from Hadoop version 2.8.0.
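
Both points can be checked on a given installation. The following is a minimal sketch, assuming the hadoop and hdfs command-line tools are on the PATH of the machine where distcp is launched; it only inspects the local client configuration.

```shell
# Print the Hadoop release. According to the JIRA ticket above,
# the SFTP file system is available from 2.8.0 onward.
hadoop version

# Print the value configured for fs.sftp.impl,
# or report that the key is not set.
hdfs getconf -confKey fs.sftp.impl
```

If the second command reports that the key is missing, the lookup described above has nothing to resolve the sftp scheme to.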

To fix this, you need to do 2 things:

  1. Add a jar containing an SFTP file system implementation to your Hadoop deployment.
  2. Set the configuration parameter fs.sftp.impl to the fully qualified class name of that implementation.

I came across this Git repository, which contains an SFTP implementation for Hadoop: https://github.com/wnagele/hadoop-filesystem-sftp. To use it, set the property fs.sftp.impl to org.apache.hadoop.fs.sftp.SFTPFileSystem.
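
The two steps above could be applied to the command from the question roughly as follows. This is a sketch, not a verified recipe: the jar path is a hypothetical placeholder for wherever you build or download the implementation, and the fs.sftp.credfile option is carried over unchanged from the question without confirming that this particular implementation reads it.

```shell
# Hypothetical location of the SFTP file system jar; adjust to your system.
SFTP_JAR=/path/to/hadoop-filesystem-sftp.jar

# Step 1: make the jar visible to the local DistCp client,
# which resolves the sftp:// path while building the copy listing.
export HADOOP_CLASSPATH="$SFTP_JAR"

# Step 2: map the sftp scheme to the implementation class with -D.
# -libjars asks Hadoop to ship the same jar to the map tasks.
hadoop distcp \
  -D fs.sftp.impl=org.apache.hadoop.fs.sftp.SFTPFileSystem \
  -D fs.sftp.credfile=/home/bigsql/cred.prop \
  -libjars "$SFTP_JAR" \
  "sftp://<<ip address>>:22/export/home/nz/samplefile" \
  hdfs:///user/bigsql/distcp
```

Setting fs.sftp.impl in core-site.xml instead of passing it with -D should have the same effect and avoids repeating the option on every invocation.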



Source: https://stackoverflow.com/questions/33872683/no-filesystem-for-scheme-sftp
