apache-spark

How to know whether Spark is running in standalone mode or running on Yarn?

只谈情不闲聊 · Submitted on 2021-02-07 07:59:16
Question: Spark is already deployed on my cluster server (someone set it up and left quite a long time ago). I want to know whether Spark is running in standalone mode or on YARN. How can I check?

Answer 1: If you have access to the Spark UI, navigate to the "Environment" tab and search for the "master" configuration. If it says yarn, it is running on YARN; if it shows a URL of the form spark://..., it is a standalone cluster.

Source: https://stackoverflow.com/questions/41712743/how-to-know
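If the UI is not reachable, the same "master" setting can be read programmatically. A minimal sketch (the application name is hypothetical; run from spark-shell or any Spark application):

    import org.apache.spark.sql.SparkSession

    object CheckMaster {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("check-master").getOrCreate()
        // "yarn" means YARN mode; "spark://host:7077" means a standalone cluster;
        // "local[*]" means a local, non-cluster run.
        println(spark.sparkContext.master)
        spark.stop()
      }
    }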

How to solve “Can't assign requested address: Service 'sparkDriver' failed after 16 retries” when running spark code?

半世苍凉 · Submitted on 2021-02-07 07:49:50
Question: I am learning Spark + Scala with IntelliJ and started with the small piece of code below:

    import org.apache.spark.{SparkConf, SparkContext}

    object ActionsTransformations {
      def main(args: Array[String]): Unit = {
        // Create a SparkContext to initialize Spark
        val conf = new SparkConf()
        conf.setMaster("local")
        conf.setAppName("Word Count")
        val sc = new SparkContext(conf)
        val numbersList = sc.parallelize(1.to(10000).toList)
        println(numbersList)
      }
    }

When trying to run it, I get the exception below: Exception in
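The answer is cut off in this excerpt. A common cause of "Can't assign requested address: Service 'sparkDriver' failed after 16 retries" is that the driver cannot bind to the hostname Spark resolves for the machine; one usual workaround is to bind the driver explicitly to a local address. A minimal sketch under that assumption (Spark 2.x, local run):

    import org.apache.spark.{SparkConf, SparkContext}

    object ActionsTransformations {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("local[*]")
          .setAppName("Word Count")
          // Force the driver to bind to the loopback address instead of a
          // hostname that may not resolve to a usable local IP.
          .set("spark.driver.bindAddress", "127.0.0.1")
          .set("spark.driver.host", "localhost")
        val sc = new SparkContext(conf)
        val numbersList = sc.parallelize(1.to(10000).toList)
        println(numbersList.count())
        sc.stop()
      }
    }

Setting the environment variable SPARK_LOCAL_IP=127.0.0.1 before launching achieves the same effect without code changes.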

Object not serializable (org.apache.kafka.clients.consumer.ConsumerRecord) in Java spark kafka streaming

十年热恋 · Submitted on 2021-02-07 07:10:22
Question: I am fairly sure that I am only pushing String data and deserializing it as String as well; the record I pushed is also shown in the error. So why is it suddenly throwing this type of error? Is there anything I am missing? Here is the code below:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.regex.Pattern;
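The code snippet is truncated before the streaming logic. The usual cause of this error is that org.apache.kafka.clients.consumer.ConsumerRecord is not serializable, so any operation that forces Spark to serialize the raw records (printing, collecting, windowing with checkpointing) fails regardless of the key/value types. A minimal Scala sketch of the common fix, mapping each record to its key and value right after the stream is created (broker, group id and topic names are hypothetical; assumes the spark-streaming-kafka-0-10 connector):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaStringStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-string-stream").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "example-group",
          "auto.offset.reset" -> "latest"
        )
        val topics = Set("example-topic")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

        // Convert the non-serializable ConsumerRecord into plain (key, value)
        // pairs before doing anything that requires serialization.
        stream.map(record => (record.key, record.value)).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }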

How to delete rows in a table created from a Spark dataframe?

不羁岁月 · Submitted on 2021-02-07 07:01:47
Question: Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:

    pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"

This is the script that I'm using:

    sq = SparkSession.builder.config('spark.rpc.message.maxSize', '1536').config("spark.sql.shuffle.partitions", str(shuffle_value)).getOrCreate()
    adsquare = sq.read.csv(f, schema
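No answer survives in this excerpt. Spark SQL does not support DELETE against a temporary view backed by a DataFrame; the usual workaround is to filter out the unwanted rows and re-register (or overwrite) the result. A minimal sketch of that idea, shown here in Scala for consistency with the other snippets (file path, column and view names are hypothetical):

    import org.apache.spark.sql.SparkSession

    object DeleteRowsWorkaround {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("delete-rows-workaround").getOrCreate()
        val adsquare = spark.read.option("header", "true").csv("/path/to/file.csv")
        adsquare.createOrReplaceTempView("adsquare")

        // Instead of DELETE, keep only the rows you want and re-register the view.
        val remaining = spark.sql("SELECT * FROM adsquare WHERE some_column <> 'value_to_delete'")
        remaining.createOrReplaceTempView("adsquare")
      }
    }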

How to convert map to dataframe?

冷暖自知 · Submitted on 2021-02-07 06:52:43
Question: m is a map as follows:

    scala> m
    res119: scala.collection.mutable.Map[Any,Any] = Map(A -> 0.11164610291904906, B -> 0.11856755943424617, C -> 0.1023171832681312)

I want to get:

    name    score
    A       0.11164610291904906
    B       0.11856755943424617
    C       0.1023171832681312

How do I get the final dataframe?

Answer 1: First convert it to a Seq, then you can use the toDF() function.

    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._
    val m = Map("A" -> 0.11164610291904906, "B" -> 0.11856755943424617, "C" -
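The answer snippet is cut off. Completing the Seq-then-toDF idea it describes, with the values and column names taken from the question, would look roughly like this (a sketch, not the original answer's exact code):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.getOrCreate()
    import spark.implicits._

    val m = Map("A" -> 0.11164610291904906, "B" -> 0.11856755943424617, "C" -> 0.1023171832681312)

    // A Map cannot be converted directly; turn it into a Seq of tuples first,
    // then name the columns with toDF.
    val df = m.toSeq.toDF("name", "score")
    df.show()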

Cassandra Error message: Not marking nodes down due to local pause. Why?

£可爱£侵袭症+ · Submitted on 2021-02-07 06:12:11
Question: I have 6 nodes running DataStax: 1 Solr node and 5 Spark nodes. My cluster runs on servers similar to Amazon EC2, with EBS volumes; each node has 3 EBS volumes composed into one logical data disk using LVM. In OpsCenter the same node frequently becomes unresponsive, which leads to connection timeouts in my data system. My data volume is around 400 GB with 3 replicas, and I have 20 streaming jobs with a batch interval of one minute. Here is my error message:

    /var/log/cassandra/output.log:WARN 13:44:31,868 Not