PySpark Kafka Error: Missing application resource

Submitted by 谁说胖子不能爱 on 2020-06-29 03:55:08

Question


The error below is triggered when I add the following dependency to the code:

'--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'

Here is the code:

from pyspark.sql import SparkSession, Row
from pyspark.context import SparkContext
from kafka import KafkaConsumer
import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'


sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

df = spark \
  .read \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "localhost:9092") \
  .option("subscribe", "Jim_Topic") \
  .load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

Here is the error:

Error: Missing application resource.

Usage: spark-submit [options] [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
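spark-submit treats the first argument that is not an option as the application resource, so an argument string containing only `--packages ...` leaves it with none, which is what the message above reports. The hypothetical `find_app_resource` helper below is a simplified sketch of that rule, not Spark's actual parser: it assumes every `--option` consumes one value unless it is written as `--option=value` or is one of a few boolean switches.

```python
import shlex

# Simplified assumption: these switches take no value; every other "--option"
# is assumed to consume the next token as its value.
BOOLEAN_FLAGS = {"--verbose", "--supervise", "--help", "--version"}


def find_app_resource(submit_args):
    """Return the first positional token (the application resource), or None."""
    tokens = shlex.split(submit_args)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in BOOLEAN_FLAGS:
            i += 1  # switch without a value
        elif token.startswith("--"):
            # "--packages a:b:1" uses two tokens; "--packages=a:b:1" uses one.
            i += 1 if "=" in token else 2
        else:
            return token  # first non-option token is the resource
    return None


question_args = ("--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,"
                 "org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1")
print(find_app_resource(question_args))  # None: nothing left after --packages
```

Under these assumptions the question's string yields `None`, while appending a resource token (for example `pyspark-shell` or a script name) makes the check return that token.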


Answer 1:


You also need to provide an application resource at the end of the argument string, such as the name of your Python file:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1 your_python_file.py'
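When the script itself creates the SparkContext and is started with plain `python` (or from a notebook), PySpark's launcher passes `PYSPARK_SUBMIT_ARGS` to spark-submit and falls back to `pyspark-shell` when the variable is unset, so `pyspark-shell` is the usual resource to append in that case. The sketch below uses a hypothetical helper, `build_submit_args`, to assemble the string; it keeps only the `spark-sql-kafka-0-10` package, on the assumption that only `format("kafka")` is used (the 0-8 assembly targets the older DStream API).

```python
import os


def build_submit_args(packages, resource="pyspark-shell"):
    """Hypothetical helper: join Maven coordinates for --packages and append
    the application resource that spark-submit expects after the options."""
    if not packages:
        return resource
    return "--packages {} {}".format(",".join(packages), resource)


# Must be set before the first SparkContext/SparkSession is created,
# because that is when PySpark launches the JVM through spark-submit.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args([
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
])
```

Passing `resource="your_python_file.py"` instead reproduces the form shown in the answer.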

Alternatively, a cleaner way is to configure the jar through SparkConf instead of an environment variable:

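from pyspark import SparkConf, SparkContext
# spark.jars takes paths to jar files; set it before the first SparkContext is created
conf = SparkConf().set("spark.jars", "/path/to/your/jar")
sc = SparkContext(conf=conf)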
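`spark.jars` expects jar paths; for Maven coordinates like the ones in the question, the corresponding property is `spark.jars.packages`. The question also mixes connector versions (2.2.0 and 2.1.1), while the Kafka connector version should match the Spark version actually running. The hypothetical helpers below sketch building a matching coordinate and the key/value pairs for `SparkConf.setAll`; the PySpark usage is left as comments so the sketch runs without PySpark installed, and it assumes the conf is applied before the JVM is started.

```python
def kafka_package(spark_version, scala_version="2.11"):
    """Hypothetical helper: Maven coordinate of the Structured Streaming Kafka
    connector for a given Spark version and Scala build."""
    return "org.apache.spark:spark-sql-kafka-0-10_{}:{}".format(
        scala_version, spark_version)


def kafka_conf_pairs(spark_version, scala_version="2.11"):
    # spark.jars.packages takes comma-separated Maven coordinates,
    # unlike spark.jars, which takes jar paths.
    return [("spark.jars.packages", kafka_package(spark_version, scala_version))]


# Usage with PySpark (assumes no SparkContext exists yet):
# from pyspark import SparkConf, SparkContext
# conf = SparkConf().setAll(kafka_conf_pairs("2.2.0"))
# sc = SparkContext(conf=conf)
```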


Source: https://stackoverflow.com/questions/62348338/pyspark-kafka-error-missing-application-resource
