Spark read multiple directories into multiple dataframes
Question: I have a directory structure on S3 looking like this:

foo
|-base
|  |-2017
|     |-01
|        |-04
|           |-part1.orc, part2.orc ...
|-A
|  |-2017
|     |-01
|        |-04
|           |-part1.orc, part2.orc ...
|-B
   |-2017
      |-01
         |-04
            |-part1.orc, part2.orc ...

Meaning that for directory foo I have multiple output tables (base, A, B, etc.) in a given path based on the timestamp of a job. I'd like to left join them all, based on a timestamp and the master directory, in this case foo. This would mean reading in each output table base, A, B,
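A minimal sketch of the approach: build the per-table S3 path for one job timestamp, read each path into its own DataFrame, and fold them together with left joins. The bucket name, the join key `id`, and the helper `table_paths` are assumptions for illustration, not from the question; the path layout (`<root>/<table>/<yyyy>/<mm>/<dd>`) follows the tree above.

```python
def table_paths(root, tables, ts):
    """Return {table_name: s3_path} for one job timestamp, e.g. ts='2017/01/04'."""
    return {t: f"{root}/{t}/{ts}" for t in tables}

paths = table_paths("s3://bucket/foo", ["base", "A", "B"], "2017/01/04")
print(paths["A"])  # s3://bucket/foo/A/2017/01/04

# With a SparkSession `spark` available, the reads and left joins would look like
# (join column `id` is a placeholder for whatever key the tables share):
#
#   dfs = {name: spark.read.orc(path) for name, path in paths.items()}
#   joined = dfs["base"]
#   for name in ["A", "B"]:
#       joined = joined.join(dfs[name], on="id", how="left")
```

Keeping the DataFrames in a dict keyed by table name makes it easy to add or drop output tables without changing the join loop.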