scala

Spark Streaming - java.lang.NoSuchMethodError Error

我的梦境 submitted on 2021-01-28 20:00:30
Question: I am trying to access streaming tweets with Spark Streaming. This is the software configuration: Ubuntu 14.04.2 LTS; scala -version reports "Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL"; spark-submit --version reports "Spark version 1.6.0". Following is the code: object PrintTweets { def main(args: Array[String]) { // Configure Twitter credentials using twitter.txt setupTwitter() // Set up a Spark streaming context named "PrintTweets" that runs locally using // all CPU cores and one …
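
A NoSuchMethodError in this setup commonly points at a binary mismatch (for example, Spark 1.6 artifacts built against a different Scala version than the local 2.11.7), though the excerpt does not show the full stack trace. For reference, here is a minimal sketch of the kind of program the question describes, assuming the spark-streaming-twitter artifact is on the classpath and that setupTwitter() loads twitter4j OAuth properties from twitter.txt (both assumptions, since the full source is cut off above):

```scala
import scala.io.Source

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object PrintTweets {

  // Assumption: twitter.txt holds lines of the form "consumerKey <value>",
  // exposed to twitter4j as system properties.
  def setupTwitter(): Unit = {
    for {
      line <- Source.fromFile("twitter.txt").getLines
      fields = line.split(" ") if fields.length == 2
    } System.setProperty("twitter4j.oauth." + fields(0), fields(1))
  }

  def main(args: Array[String]): Unit = {
    setupTwitter()

    // Local streaming context named "PrintTweets" using all CPU cores
    // and a one-second batch interval.
    val ssc = new StreamingContext("local[*]", "PrintTweets", Seconds(1))

    // Print the text of tweets from the public sample stream.
    TwitterUtils.createStream(ssc, None).map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the error persists, checking that every Spark and twitter4j dependency targets the same Scala binary version as the local compiler is a reasonable first step.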

How do you create a table in Cassandra using phantom for Scala?

允我心安 submitted on 2021-01-28 18:11:12
Question: I am trying to run the example at https://github.com/websudos/phantom/blob/develop/phantom-example/src/main/scala/com/websudos/phantom/example/basics/SimpleRecipes.scala, so I created a Recipe and tried to insert it using insertNewRecord(myRecipe), and got the following exception: ....InvalidQueryException: unconfigured columnfamily my_custom_table. I checked using cqlsh: the keyspace was created but the table was not. So my question is, how do I create the table using phantom? This is …
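
The excerpt ends before the asker's schema code, but with phantom the table has to be created explicitly before the first insert. A minimal sketch follows, assuming the example's Recipes table object is in scope with its connector's implicit session and keyspace, and that the phantom version in use exposes create.ifNotExists() on table objects (all of these are assumptions, not confirmed by the excerpt):

```scala
import scala.concurrent.Await
import scala.concurrent.duration._

// Sketch only: Recipes is assumed to be the table object from the linked
// example (a CassandraTable mixed in with the example's connector). The call
// below issues "CREATE TABLE IF NOT EXISTS ..." against the keyspace so that
// the subsequent insertNewRecord call can succeed.
object CreateRecipesSchema {
  def createTable(): Unit =
    Await.result(Recipes.create.ifNotExists().future(), 10.seconds)
}
```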

Spark: Dataframe action really slow when upgraded from 2.1.0 to 2.2.1

人走茶凉 submitted on 2021-01-28 17:56:20
Question: I just upgraded from Spark 2.1.0 to Spark 2.2.1. Has anyone seen extremely slow behavior on dataframe.filter(…).collect(), specifically a collect operation preceded by a filter? dataframe.collect seems to run okay, but dataframe.filter(…).collect() takes forever, even though the dataframe contains only 2 records and this is in a unit test. When I go back to Spark 2.1.0, it is back to normal speed. I have looked at the thread dump and could not find an obvious cause. I have made an effort to make sure all the libraries I …
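
For context, a minimal local-mode sketch of the pattern being timed (hypothetical data and column names, since the unit test itself is not shown in the excerpt):

```scala
import org.apache.spark.sql.SparkSession

// Reproduces the two calls compared in the question: a plain collect and
// a filter followed by collect, on a two-row dataframe.
object FilterCollectRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-collect-repro")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // Reported as fine on both versions.
    df.collect()

    // Reported as very slow after the 2.2.1 upgrade.
    df.filter($"value" > 0).collect()

    spark.stop()
  }
}
```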

java.lang.NumberFormatException: For input string: “0.000” [duplicate]

主宰稳场 submitted on 2021-01-28 14:38:29
Question: This question already has answers here: What is a NumberFormatException and how can I fix it? (9 answers). Closed 1 year ago. I am trying to create a UDF that takes two strings as parameters; one in DD-MM-YYYY format (e.g. "14-10-2019") and the other in float format (e.g. "0.000"). I want to convert the float-like string to an int, add it to the date, and return the resulting date as a string. def getEndDate = udf{ (startDate: String, no_of_days: String) => val num_days = …
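
The exception itself comes from parsing: "0.000".toInt throws NumberFormatException because the string is not an integer literal. Below is a sketch of one way the UDF could be written, parsing the float-like string as a Double first and then truncating; the names follow the question, but the body is an assumption since the original UDF is cut off in the excerpt:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

import org.apache.spark.sql.functions.udf

object EndDateUdf {
  val getEndDate = udf { (startDate: String, no_of_days: String) =>
    val fmt = DateTimeFormatter.ofPattern("dd-MM-yyyy")
    // "0.000".toInt throws; going through Double and truncating does not.
    val numDays = no_of_days.toDouble.toInt
    // Parse "14-10-2019", shift by numDays, and format back to dd-MM-yyyy.
    LocalDate.parse(startDate, fmt).plusDays(numDays).format(fmt)
  }
}
```

A call such as df.withColumn("end_date", EndDateUdf.getEndDate(col("start_date"), col("no_of_days"))) would then yield the shifted date in the same dd-MM-yyyy format (the column names here are hypothetical).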

Transforming an iterator into an iterator of chunks of duplicates

耗尽温柔 submitted on 2021-01-28 13:47:03
Question: Suppose I am writing a function foo: Iterator[A] => Iterator[List[A]] to transform a given iterator into an iterator of chunks of duplicates: def foo[A](it: Iterator[A]): Iterator[List[A]] = ??? foo("abbbcbbe".iterator).toList.map(_.mkString) // List("a", "bbb", "c", "bb", "e") In order to implement foo I want to reuse the function splitDupes: Iterator[A] => (List[A], Iterator[A]) that splits an iterator into a prefix of duplicates and the rest (thanks a lot to Kolmar, who suggested it here) …
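
A sketch of one way foo can be layered on top of splitDupes; the splitDupes body below is an assumption with the quoted signature, since the linked implementation is not part of the excerpt:

```scala
// Splits off the leading run of equal elements and returns it together with
// the remaining iterator.
def splitDupes[A](it: Iterator[A]): (List[A], Iterator[A]) = {
  if (!it.hasNext) (Nil, Iterator.empty)
  else {
    val head = it.next()
    // Iterator#span requires the leading iterator to be consumed before the
    // trailing one is used; toList below does exactly that.
    val (dupes, rest) = it.span(_ == head)
    (head :: dupes.toList, rest)
  }
}

// Repeatedly peel off one chunk of duplicates until the source is exhausted.
def foo[A](it: Iterator[A]): Iterator[List[A]] =
  new Iterator[List[A]] {
    private var rest: Iterator[A] = it
    def hasNext: Boolean = rest.hasNext
    def next(): List[A] = {
      val (chunk, tail) = splitDupes(rest)
      rest = tail
      chunk
    }
  }

// foo("abbbcbbe".iterator).toList.map(_.mkString)
// => List("a", "bbb", "c", "bb", "e")
```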

Flink Table API & SQL and map types (Scala)

怎甘沉沦 submitted on 2021-01-28 12:42:23
Question: I am using Flink's Table API and/or Flink's SQL support (Flink 1.3.1, Scala 2.11) in a streaming environment. I'm starting with a DataStream[Person], and Person is a case class that looks like: Person(name: String, age: Int, attributes: Map[String, String]). All is working as expected until I start to bring attributes into the picture. For example: val result = streamTableEnvironment.sql( """ |SELECT |name, |attributes['foo'], |TUMBLE_START(rowtime, INTERVAL '1' MINUTE) |FROM myTable |GROUP …
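
For reference, a minimal sketch of the setup the question describes, using Flink 1.3-style APIs; the field list passed to registerDataStream and the sample data are assumptions, and the windowed/rowtime part of the original query is omitted because it is cut off in the excerpt:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

// Case class from the question.
case class Person(name: String, age: Int, attributes: Map[String, String])

object PersonTableSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val streamTableEnvironment = TableEnvironment.getTableEnvironment(env)

    val people: DataStream[Person] = env.fromElements(
      Person("alice", 30, Map("foo" -> "bar")),
      Person("bob", 25, Map("foo" -> "baz")))

    // Register the stream as "myTable"; the rowtime attribute used by
    // TUMBLE_START in the original query is left out of this sketch.
    streamTableEnvironment.registerDataStream("myTable", people, 'name, 'age, 'attributes)

    // Same bracket syntax as in the question for reading a map-typed column.
    val result = streamTableEnvironment.sql(
      "SELECT name, attributes['foo'] FROM myTable")

    // Inspect the derived schema; converting back to a DataStream and
    // attaching a sink is omitted here.
    result.printSchema()
  }
}
```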

sbt/sbt : no such file or directory error

爱⌒轻易说出口 submitted on 2021-01-28 12:42:12
Question: I'm trying to install Spark on my Ubuntu machine. I have installed sbt and Scala and am able to view their versions. But when I try to build Spark using the 'sbt/sbt assembly' command, I get the error 'bash: sbt/sbt: No such file or directory'. Can you please let me know where I am making a mistake? I have been stuck here since yesterday. Thank you for the help in advance. Answer 1: You may have downloaded the pre-built version of Spark. If it is a pre-built distribution, you don't need to run the build tool …