Load Data using Apache Spark on AWS

攒了一身酷 · 2021-01-28 17:31

I am using Apache Spark on Amazon Web Services (AWS) EC2 to load and process data. I've created one master and two slave nodes. On the master node, I have a directory data

2 Answers
  •  陌清茗
     2021-01-28 17:58

    One quick suggestion is to load the CSV from S3 instead of keeping it on the local filesystem.

    Here is a sample Scala snippet that loads a CSV file from an S3 bucket:

    // Replace the placeholders with your own credentials and bucket name.
    val csvs3Path = "s3n://REPLACE_WITH_YOUR_ACCESS_KEY:REPLACE_WITH_YOUR_SECRET_KEY@REPLACE_WITH_YOUR_S3_BUCKET"
    val dataframe = sqlContext.
                        read.
                        format("com.databricks.spark.csv").
                        option("header", "true").
                        load(csvs3Path)
    
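    Embedding the access key and secret key in the path can leak them into logs and stack traces. A safer sketch sets the credentials through the Hadoop configuration instead (this assumes the jars backing the `s3n://` scheme are on the classpath, and that the keys are available as environment variables; the bucket path below is a placeholder, as in the snippet above):

    ```scala
    // Supply S3 credentials via Hadoop configuration rather than in the URL.
    // AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are assumed environment variables.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // The path now contains no secrets; REPLACE_WITH_YOUR_S3_BUCKET is a placeholder.
    val csvs3Path = "s3n://REPLACE_WITH_YOUR_S3_BUCKET"
    val dataframe = sqlContext.
                        read.
                        format("com.databricks.spark.csv").
                        option("header", "true").
                        load(csvs3Path)
    ```

    The same `dataframe` operations then work unchanged; only how the credentials reach the S3 connector differs.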
