scala

CSV data is not loading properly as Parquet using Spark

孤街浪徒 submitted on 2020-08-25 03:42:27
Question: I have a table in Hive:

CREATE TABLE tab_data (
  rec_id INT,
  rec_name STRING,
  rec_value DECIMAL(3,1),
  rec_created TIMESTAMP
) STORED AS PARQUET;

and I want to populate this table with data in .csv files like these:

10|customer1|10.0|2016-09-07 08:38:00.0
20|customer2|24.0|2016-09-08 10:45:00.0
30|customer3|35.0|2016-09-10 03:26:00.0
40|customer1|46.0|2016-09-11 08:38:00.0
50|customer2|55.0|2016-09-12 10:45:00.0
60|customer3|62.0|2016-09-13 03:26:00.0
70|customer1|72.0|2016-09-14 08:38:00.0
80
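A minimal sketch of one way to load such a file with Spark 2.x's built-in CSV reader, assuming a SparkSession named spark and an illustrative file path: declare the pipe delimiter, a schema matching the Hive table, and the timestamp format before inserting into the Parquet-backed table.

```scala
import org.apache.spark.sql.types._

// Schema mirrors the Hive table definition from the question.
val schema = StructType(Seq(
  StructField("rec_id", IntegerType),
  StructField("rec_name", StringType),
  StructField("rec_value", DecimalType(3, 1)),
  StructField("rec_created", TimestampType)))

val df = spark.read
  .option("sep", "|")                                 // pipe-delimited input
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.S") // matches "2016-09-07 08:38:00.0"
  .schema(schema)
  .csv("/path/to/data.csv")                           // assumed path

df.write.mode("append").insertInto("tab_data")
```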

Compile error when using a companion object of a case class as a type parameter

我是研究僧i submitted on 2020-08-24 08:42:06
Question: I'm creating a number of JSON messages for spray in Scala using case classes. For example:

case class Foo(name: String, attrs: List[String])
implicit val fooFormat = jsonFormat2(Foo)

object Foo {
  case class Invalid(error: String)
}

case class Bar(name: String, kv: Map[String, String])
implicit val barFormat = jsonFormat2(Bar)

In the above snippet, barFormat compiles, but fooFormat does not:

type mismatch;
 found   : Foo.type
 required: (?, ?) => ?
Note: implicit value barFormat is not applicable
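A minimal sketch of the usual fix, assuming spray-json's DefaultJsonProtocol: once a case class has an explicitly defined companion object, the name Foo refers to that object instead of being eta-expanded to the synthetic apply function, so the function must be passed explicitly.

```scala
import spray.json._
import spray.json.DefaultJsonProtocol._

case class Foo(name: String, attrs: List[String])
object Foo {
  case class Invalid(error: String)
}

// Passing Foo.apply (rather than Foo, which is now the companion object)
// gives jsonFormat2 the (String, List[String]) => Foo function it expects.
implicit val fooFormat: RootJsonFormat[Foo] = jsonFormat2(Foo.apply)
```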

Mockito's Answer in ScalaTest

本小妞迷上赌 submitted on 2020-08-24 06:44:09
Question: Is there some alternative to Mockito's Answer in ScalaTest? I went through its documentation but didn't find anything. I would like, for example, to execute some logic on the arguments of a stubbed method. In Mockito, I would do something like this:

when(mock.create(any(A.class))).thenAnswer(new Answer() {
  Object answer(InvocationOnMock invocation) {
    A firstArg = (A) invocation.getArguments()[0];
    firstArg.callMethod();
    return null;
  }
});

In ScalaTest, I'm fine with using Mockito as well.
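A direct Scala port of the Java snippet above, as a sketch: the trait A and the Repo trait with its create method are illustrative stand-ins taken from the question, and Mockito 2's ArgumentMatchers is assumed.

```scala
import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.{mock, when}
import org.mockito.invocation.InvocationOnMock
import org.mockito.stubbing.Answer

trait A { def callMethod(): Unit }
trait Repo { def create(a: A): AnyRef }

val repo = mock(classOf[Repo])
// thenAnswer lets us run logic against the stubbed call's arguments.
when(repo.create(any(classOf[A]))).thenAnswer(new Answer[AnyRef] {
  override def answer(invocation: InvocationOnMock): AnyRef = {
    val firstArg = invocation.getArguments()(0).asInstanceOf[A]
    firstArg.callMethod()
    null
  }
})
```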

Difference between F[_] and F[T] in Scala when used in type constructors

↘锁芯ラ submitted on 2020-08-22 09:39:06
Question: This question is about _ as used in a type constructor, not as used in defining existential types. So the question is: what is the difference when _ is used as a type parameter instead of a variable like T, for example the difference between F[_] and F[T]? The only difference I can think of is that with F[_] the parameter itself can have as many holes as possible; that is, F[_] can become F[Int] or F[Future[Option[Int]]], etc., while with F[T] the T can only be a proper type; that is F
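A short sketch of the distinction, using an assumed Functor trait purely for illustration: in a declaration, F[_] introduces a higher-kinded parameter (F itself takes a type argument), whereas a bare T introduces a proper type.

```scala
// F[_] is a higher-kinded type parameter: callers supply a type
// constructor such as List or Option, which still has a "hole".
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

// T is a proper (fully applied) type parameter: callers supply a
// complete type such as Int or List[String].
trait Box[T] {
  def value: T
}
```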

How to use double pipe as delimiter in CSV?

落爺英雄遲暮 submitted on 2020-08-22 05:45:40
Question: Spark 1.5 and Scala 2.10.6. I have a data file that uses "¦¦" as the delimiter, and I am having a hard time parsing it to create a data frame. Can multiple delimiters be used to create a data frame? The code works with a single broken-bar character but not with the doubled delimiter. My code:

val customSchema_1 = StructType(Array(
  StructField("ID", StringType, true),
  StructField("FILLER", StringType, true),
  StructField("CODE", StringType, true)));

val df_1 = sqlContext.read
  .format("com
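A minimal sketch of one common workaround, under the assumption that the spark-csv reader in Spark 1.x only accepts a single-character delimiter: read the file as plain text and split on the two-character sequence manually (the file path is illustrative).

```scala
import org.apache.spark.sql.Row

// Split each line on the literal two-character "¦¦" delimiter,
// then rebuild rows against the schema from the question.
val rows = sc.textFile("/path/to/data.txt") // assumed path
  .map(_.split("¦¦"))
  .map(fields => Row(fields: _*))

val df_1 = sqlContext.createDataFrame(rows, customSchema_1)
```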

Get type of a “singleton type”

橙三吉。 submitted on 2020-08-22 05:21:30
Question: We can create literal types via shapeless:

import shapeless.syntax.singleton._
var x = 42.narrow // x: Int(42) = 42

But how can I operate with Int(42) as a type if it's impossible even to create a type alias?

type Answ = Int(42) // won't compile
// or
def doSmth(value: Int(42)) = ... // won't compile

Answer 1: 1) In Typelevel Scala you can write just

val x: 42 = 42
type Answ = 42
def doSmth(value: 42) = ???

2) In Dotty Scala you can write the same. 3) In Lightbend Scala (i.e. standard Scala) +
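For standard Scala 2 before 2.13's native literal types, a hedged sketch using shapeless's Witness to give the singleton type a name it can be referred to by:

```scala
import shapeless.Witness

// Witness captures the singleton type of the literal as a type member.
val answ = Witness(42)
type Answ = answ.T // the singleton type Int(42)

def doSmth(value: Answ): Answ = value
doSmth(42) // compiles: the literal 42 has the constant type Int(42)
```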

Spark: Could not find CoarseGrainedScheduler

浪子不回头ぞ submitted on 2020-08-21 02:27:56
Question: I'm not sure what's causing this exception; my Spark job fails after running for a few hours. I'm running Spark 2.0.2. Any debugging tips?

2016-12-27 03:11:22,199 [shuffle-server-3] ERROR org.apache.spark.network.server.TransportRequestHandler - Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
    at org.apache.spark.rpc.netty.Dispatcher
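Not a definitive diagnosis, but this error is usually a symptom rather than a root cause: the CoarseGrainedScheduler RPC endpoint is already gone because the driver or an executor is shutting down, often after an out-of-memory kill. A hedged first step is to check the executor logs for the earlier failure and review memory settings; the values below are illustrative assumptions only.

```scala
import org.apache.spark.SparkConf

// Illustrative settings, not recommendations from the question;
// tune them to the actual cluster and workload.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.yarn.executor.memoryOverhead", "1024")
```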

How can I easily get a Scala case class's name?

荒凉一梦 submitted on 2020-08-20 19:16:26
Question: Given:

case class FirstCC {
  def name: String = ... // something that will give "FirstCC"
}
case class SecondCC extends FirstCC

val one = FirstCC()
val two = SecondCC()

How can I get "FirstCC" from one.name and "SecondCC" from two.name?

Answer 1:

def name = this.getClass.getName

Or, if you want only the name without the package:

def name = this.getClass.getSimpleName

See the documentation of java.lang.Class for more information.

Answer 2: You can use the property productPrefix of the case class:

case
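A runnable sketch combining the answers; note that case-to-case inheritance is prohibited in Scala, so a shared trait stands in for the question's class hierarchy.

```scala
// getSimpleName returns the class name without the package; inside a
// single case class, productPrefix (from Product) works equally well.
trait Named {
  def name: String = this.getClass.getSimpleName
}

case class FirstCC() extends Named
case class SecondCC() extends Named

val one = FirstCC()
val two = SecondCC()
one.name // "FirstCC"
two.name // "SecondCC"
```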

In Scala, are there any conditions where an implicit view won't propagate to another implicit function?

核能气质少年 submitted on 2020-08-20 10:35:27
Question: Assume a class called 'Summoner' is defined that is capable of summoning implicit views from the scope:

case class Summoner[R]() {
  def summon[T](v: T)(implicit ev: T => R): R = ev(v)
}

I found that it works most of the time, but there are cases where it doesn't; e.g. the following (not too) short case, which uses the singleton-ops library:

import shapeless.Witness
import singleton.ops.+
import singleton.ops.impl.Op

trait Operand {
  def +[
    X >: this.type <: Operand,
    Y <:
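A minimal sketch of the Summoner in the ordinary case where it does work, with an assumed implicit view from Int to String purely for illustration:

```scala
import scala.language.implicitConversions

case class Summoner[R]() {
  // Resolves whatever implicit view T => R is in scope and applies it.
  def summon[T](v: T)(implicit ev: T => R): R = ev(v)
}

// An assumed implicit view; any T => R in scope would do.
implicit def intToString(i: Int): String = i.toString

Summoner[String]().summon(42) // "42"
```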

.NET Core Kafka: a getting-started example you can grasp in one read

本秂侑毒 submitted on 2020-08-20 09:31:56
You have probably heard of Kafka; whether or not you have used it, it has a big reputation, like Qiao Feng in Demi-Gods and Semi-Devils. Following convention, let's start with an introduction.

Introduction

Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a high-throughput, distributed publish-subscribe messaging system that supports partitions and replicas. Its biggest difference from other MQs is that a Topic has the concept of partitions, and messages are dequeued faster than in other MQs.

Features and suitable scenarios

- High throughput, low latency
- Scalability: clusters support hot expansion
- Durability and reliability
- Fault tolerance: the cluster tolerates node failures (with n replicas, up to n-1 nodes may fail)
- High concurrency: supports thousands of clients reading and writing simultaneously

Common use cases

- Log collection
- Messaging: producers and consumers, message buffering, etc.
- User-activity tracking: page views, searches, clicks, and other activity
- Operational metrics
- Workflow processing
- Data processing with relaxed real-time requirements

Kafka basic concepts

Topic: Kafka categorizes messages, and each category is called a Topic; consumers can process different Topics differently. A Topic corresponds to a queue in a traditional MQ: a message sent from the producer must specify which topic it goes to, but need not specify which partition
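The post targets .NET Core, but since Kafka itself is written in Scala, here is a hedged minimal producer sketch using the official Java client from Scala; the broker address and topic name are assumptions for illustration.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerDemo extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // A message must name its topic; the partition can be left to the default partitioner.
  producer.send(new ProducerRecord[String, String]("demo-topic", "key1", "hello kafka"))
  producer.close()
}
```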