What is the difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?
I will talk about Spark version 2.x only.
SparkSession: It is the main entry point of your Spark application. To run any code on Spark, this is the first thing you should create.
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("Word Count")\
.config("spark.some.config.option", "some-value")\
.getOrCreate()
SparkContext: It is an inner object (property) of SparkSession, used to interact with the low-level API. Through SparkContext you can create RDDs, accumulators, and broadcast variables.
For most cases you won't need SparkContext directly. You can get it from SparkSession:
sc = spark.sparkContext