How to create a DataFrame from a text file in Spark

Backend · unresolved · 8 answers · 1140 views
滥情空心 2021-01-31 19:03

I have a text file on HDFS and I want to convert it to a Data Frame in Spark.

I am using the Spark Context to load the file and then try to generate individual columns from it.

8 Answers
  •  执念已碎
     2021-01-31 19:51

    Here are a few different ways to create a DataFrame from a text file:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName(appName).setMaster("local")
    val sc = new SparkContext(conf)
    

    raw text file

    val file = sc.textFile("C:\\vikas\\spark\\Interview\\text.txt")

    // .toDF on an RDD needs the SQL implicits in scope
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val fileToDf = file.map(_.split(","))
      .map { case Array(a, b, c) => (a, b.toInt, c) }
      .toDF("name", "age", "city")
    fileToDf.foreach(println(_))
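
    A case class can replace the tuple if you want named, typed columns on the RDD route. This is a sketch under the same assumptions as the answer above (a comma-separated name,age,city file and the SQL implicits in scope); the `Person` class is hypothetical, not from the original post.

    ```scala
    // Hypothetical case class for the same name,age,city layout
    case class Person(name: String, age: Int, city: String)

    val personDf = file.map(_.split(","))
      .map { case Array(n, a, c) => Person(n, a.trim.toInt, c) }
      .toDF()   // column names and types come from the case class fields
    personDf.printSchema()
    ```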
    

    spark session without schema

    import org.apache.spark.sql.SparkSession
    val sparkSess = SparkSession.builder()
      .appName("SparkSessionZipsExample")
      .config(conf)
      .getOrCreate()
    
    val df = sparkSess.read.option("header", "false")
      .csv("C:\\vikas\\spark\\Interview\\text.txt")
    df.show()
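
    Note that without a schema the CSV reader produces every column as StringType. If you want numeric columns typed without writing a schema by hand, the reader can be asked to infer types — a minimal sketch assuming the same `sparkSess` session and file as above:

    ```scala
    val inferredDf = sparkSess.read
      .option("header", "false")
      .option("inferSchema", "true")   // extra pass over the data to guess column types
      .csv("C:\\vikas\\spark\\Interview\\text.txt")
    inferredDf.printSchema()   // the age column should now come out as an integer type
    ```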
    

    spark session with schema

    import org.apache.spark.sql.types._
    val schemaString = "name age city"
    val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)
    
    val dfWithSchema = sparkSess.read.option("header", "false")
      .schema(schema)
      .csv("C:\\vikas\\spark\\Interview\\text.txt")
    dfWithSchema.show()
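
    The schema built from `schemaString` makes every field a StringType. When the column types differ, the StructFields can be written out explicitly instead — a sketch assuming the same name/age/city layout and `sparkSess` session:

    ```scala
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Explicit typed schema instead of all-StringType fields
    val typedSchema = StructType(Seq(
      StructField("name", StringType,  nullable = true),
      StructField("age",  IntegerType, nullable = true),
      StructField("city", StringType,  nullable = true)
    ))

    val typedDf = sparkSess.read.schema(typedSchema)
      .csv("C:\\vikas\\spark\\Interview\\text.txt")
    typedDf.printSchema()
    ```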
    

    using sql context

    import org.apache.spark.sql.SQLContext
    
    // Construct the SQLContext from the SparkContext before using it
    val sqlCtx = new SQLContext(sc)

    val fileRdd = sc.textFile("C:\\vikas\\spark\\Interview\\text.txt")
      .map(_.split(","))
      .map(x => org.apache.spark.sql.Row(x: _*))
    val sqlDf = sqlCtx.createDataFrame(fileRdd, schema)
    sqlDf.show()
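
    On Spark 2.x and later, SQLContext is largely superseded by SparkSession, and the same file can also be read through the Dataset API. A sketch, assuming the `sparkSess` session from earlier in this answer and the same comma-separated layout:

    ```scala
    import sparkSess.implicits._

    // Read the file as a Dataset[String], then split into typed columns
    val ds = sparkSess.read.textFile("C:\\vikas\\spark\\Interview\\text.txt")
    val dsDf = ds.map(_.split(","))
      .map { case Array(name, age, city) => (name, age.toInt, city) }
      .toDF("name", "age", "city")
    dsDf.show()
    ```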
    
