How do I read a Parquet in R and convert it to an R DataFrame?

北荒 2020-12-28 13:04

I'd like to process Apache Parquet files (in my case, generated in Spark) in the R programming language.

Is an R reader available? Or is work being done on one?

9 Answers
  •  我在风中等你
    2020-12-28 13:21

    If you're using Spark, this is now relatively simple as of the Spark 1.4 release; see the sample code below, which uses the SparkR package, now part of the Apache Spark core framework.

    # install the SparkR package
    devtools::install_github('apache/spark', ref='master', subdir='R/pkg')
    
    # load the SparkR package
    library('SparkR')
    
    # initialize sparkContext which starts a new Spark session
    sc <- sparkR.init(master="local")
    
    # initialize sqlContext
    sq <- sparkRSQL.init(sc)
    
    # load parquet file into a Spark data frame and coerce into R data frame
    df <- collect(parquetFile(sq, "/path/to/filename"))
    
    # terminate Spark session
    sparkR.stop()
    

    An expanded example is available at https://gist.github.com/andyjudson/6aeff07bbe7e65edc665
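
    Note that the API above was later deprecated: in Spark 2.0 and newer, `sparkR.init`/`sparkRSQL.init` and `parquetFile` were replaced by `sparkR.session` and `read.parquet`. A minimal sketch of the equivalent workflow, assuming Spark >= 2.0 with SparkR is installed and the path is illustrative:

    ```r
    # load the SparkR package (assumes Spark >= 2.0 is installed)
    library(SparkR)

    # start (or reuse) a Spark session; replaces sparkR.init() + sparkRSQL.init()
    sparkR.session(master = "local")

    # read the Parquet file into a SparkDataFrame, then collect into a base R data.frame
    sdf <- read.parquet("/path/to/filename")
    df  <- collect(sdf)

    # stop the session when done
    sparkR.session.stop()
    ```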

    I'm not aware of any other package that you could use if you weren't using Spark.
