Pyspark: get list of files/directories on HDFS path

野趣味 2020-12-05 07:14

As per the title. I'm aware of textFile but, as the name suggests, it works only on text files. I would need to access files/directories inside a path on either HD

6 Answers
  •  眼角桃花
    2020-12-05 07:40

    There is an easy way to do this using the snakebite library:

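    from snakebite.client import Client

    # HADOOP_HOST and HADOOP_PORT are placeholders for the NameNode host and RPC port
    hadoop_client = Client(HADOOP_HOST, HADOOP_PORT, use_trash=False)

    # ls() takes a list of paths and yields one dict per entry
    for x in hadoop_client.ls(['/']):
        print(x)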
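
Since the question asks for files and directories separately, here is a minimal sketch of how the listing could be split. It assumes each entry yielded by `ls()` is a dict with a `path` key and a `file_type` key equal to `'d'` for directories, as in snakebite's documented output. The helper name `split_entries` and the sample entries are made up for illustration.

```python
def split_entries(entries):
    """Split ls()-style dicts into (file_paths, directory_paths).

    Assumes each dict has 'path' and 'file_type' keys, where
    file_type == 'd' marks a directory (hypothetical helper).
    """
    files, dirs = [], []
    for entry in entries:
        if entry.get("file_type") == "d":
            dirs.append(entry["path"])
        else:
            files.append(entry["path"])
    return files, dirs


# Made-up sample entries standing in for real ls() output
sample = [
    {"path": "/data", "file_type": "d"},
    {"path": "/data/part-00000", "file_type": "f"},
    {"path": "/tmp", "file_type": "d"},
]
files, dirs = split_entries(sample)
print(files)
print(dirs)
```

With a live client, the same helper could be called as `split_entries(hadoop_client.ls(['/']))`.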
    
