Get a list of subdirectories
Question

I know I can do this:

    data = sc.textFile('/hadoop_foo/a')
    data.count()
    240
    data = sc.textFile('/hadoop_foo/*')
    data.count()
    168129

However, I would like to get the count of the data in every subdirectory of "/hadoop_foo/". Can I do that?

In other words, what I want is something like this:

    subdirectories = magicFunction()
    for subdir in subdirectories:
        data = sc.textFile(subdir)
        data.count()

I tried with:

    In [9]: [x[0] for x in os.walk("/hadoop_foo/")]
    Out[9]: []

but I think that fails, because
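For context, `os.walk` only looks at the local filesystem of the machine running the driver, not at HDFS, and by default it silently yields nothing for a path that does not exist locally, which is consistent with the empty list above. Below is a minimal sketch of one possible `magicFunction()` replacement. It assumes the `hdfs` command-line client is installed and on the driver's `PATH`, and that `hdfs dfs -ls` prints its default 8-column listing (permissions, replication, owner, group, size, date, time, path). The function names `parse_hdfs_ls`, `list_hdfs_subdirectories` and `count_lines_per_subdir` are hypothetical, introduced only for illustration.

```python
import subprocess
from typing import Dict, List


def parse_hdfs_ls(output: str) -> List[str]:
    """Return the paths of directory entries found in `hdfs dfs -ls` output.

    Assumption: default listing format with 8 whitespace-separated columns,
    the last one being the path (which may itself contain spaces).
    """
    dirs = []
    for line in output.splitlines():
        # maxsplit=7 keeps any spaces inside the path in the last field.
        parts = line.split(None, 7)
        # Skips the "Found N items" header and blank or malformed lines.
        if len(parts) != 8:
            continue
        permissions, path = parts[0], parts[7]
        # Directory entries have a permission string starting with 'd'.
        if permissions.startswith("d"):
            dirs.append(path)
    return dirs


def list_hdfs_subdirectories(path: str) -> List[str]:
    """Hypothetical stand-in for magicFunction(): shells out to the hdfs CLI."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path],
        capture_output=True,
        text=True,
        check=True,  # raise if the listing fails (e.g. path does not exist)
    )
    return parse_hdfs_ls(result.stdout)


def count_lines_per_subdir(sc, path: str) -> Dict[str, int]:
    """Count the lines under each subdirectory of `path`, using an existing SparkContext."""
    return {
        subdir: sc.textFile(subdir).count()
        for subdir in list_hdfs_subdirectories(path)
    }
```

Usage, inside a PySpark session where `sc` already exists: `count_lines_per_subdir(sc, "/hadoop_foo/")`. Shelling out to the CLI keeps the sketch independent of Spark internals; an alternative is to call Hadoop's `FileSystem.listStatus` through the private `sc._jvm` gateway, which avoids a subprocess but relies on non-public PySpark attributes.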