I am very new to PySpark. I tried parsing a JSON file using the following code:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.json(path_to_input)
Spark >= 2.2:
You can use the multiLine argument of the JSON reader:
spark.read.json(path_to_input, multiLine=True)
Spark < 2.2:
There is an almost universal, but rather expensive, solution that can be used to read multiline JSON files: combine SparkContext.wholeTextFiles with DataFrameReader.json. As long as there are no other problems with your data, it should do the trick:
spark.read.json(sc.wholeTextFiles(path_to_input).values())