As per title. I\'m aware of textFile but, as the name suggests, it works only on text files.
I would need to access files/directories inside a path on either HD
This might work for you:
import subprocess, re
def listdir(path):
files = str(subprocess.check_output('hdfs dfs -ls ' + path, shell=True))
return [re.search(' (/.+)', i).group(1) for i in str(files).split("\\n") if re.search(' (/.+)', i)]
listdir('/user/')
This also worked:
hadoop = sc._jvm.org.apache.hadoop
fs = hadoop.fs.FileSystem
conf = hadoop.conf.Configuration()
path = hadoop.fs.Path('/user/')
[str(f.getPath()) for f in fs.get(conf).listStatus(path)]