How do I read a parquet in PySpark written from Spark?

无人及你 2021-01-31 03:36

I am using two Jupyter notebooks to do different things in an analysis. In my Scala notebook, I write some of my cleaned data to parquet:

partitionedDF.select(\         


        
2 Answers
  你的背包 2021-01-31 04:11

    I read the parquet file in the following way:

    from pyspark.sql import SparkSession

    # initialise a SparkSession (the entry point to the DataFrame API)
    spark = SparkSession.builder \
        .master('local') \
        .appName('myAppName') \
        .config('spark.executor.memory', '5gb') \
        .config("spark.cores.max", "6") \
        .getOrCreate()

    # grab the underlying SparkContext for the SQLContext below
    sc = spark.sparkContext

    # use SQLContext to read the parquet file
    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)

    # read the parquet file into a DataFrame
    df = sqlContext.read.parquet('path-to-file/commentClusters.parquet')
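
    On Spark 2.x and later you don't need a separate SQLContext at all: the SparkSession exposes the reader directly. A minimal sketch of the same read, assuming the same hypothetical path as above:

    from pyspark.sql import SparkSession

    # build (or reuse) a SparkSession; getOrCreate() returns the active session if one exists
    spark = SparkSession.builder \
        .master('local') \
        .appName('myAppName') \
        .getOrCreate()

    # read the parquet file straight through the session's DataFrameReader
    df = spark.read.parquet('path-to-file/commentClusters.parquet')

    # quick sanity checks on what came back
    df.printSchema()
    df.show(5)

    spark.read.parquet also accepts a directory, so if the Scala notebook wrote a partitioned dataset you can point the reader at the folder and it will pick up all the part files.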
    
