scala

CSV data is not loading properly as Parquet using Spark

孤街浪徒 submitted on 2020-08-25 03:42:27
Question: I have a table in Hive:

CREATE TABLE tab_data (
  rec_id INT,
  rec_name STRING,
  rec_value DECIMAL(3,1),
  rec_created TIMESTAMP
) STORED AS PARQUET;

and I want to populate this table with data in .csv files like these:

10|customer1|10.0|2016-09-07 08:38:00.0
20|customer2|24.0|2016-09-08 10:45:00.0
30|customer3|35.0|2016-09-10 03:26:00.0
40|customer1|46.0|2016-09-11 08:38:00.0
50|customer2|55.0|2016-09-12 10:45:00.0
60|customer3|62.0|2016-09-13 03:26:00.0
70|customer1|72.0|2016-09-14 08:38:00.0
80
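A minimal sketch of one way to load such a file with Spark 2.x's built-in CSV reader, assuming a SparkSession named spark and an illustrative file path: declare the pipe delimiter, a schema matching the Hive table, and the timestamp format before inserting into the Parquet-backed table.

```scala
import org.apache.spark.sql.types._

// Schema mirrors the Hive table definition from the question.
val schema = StructType(Seq(
  StructField("rec_id", IntegerType),
  StructField("rec_name", StringType),
  StructField("rec_value", DecimalType(3, 1)),
  StructField("rec_created", TimestampType)))

val df = spark.read
  .option("sep", "|")                                 // pipe-delimited input
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.S") // matches "2016-09-07 08:38:00.0"
  .schema(schema)
  .csv("/path/to/data.csv")                           // assumed path

df.write.mode("append").insertInto("tab_data")
```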

Compile error when using a companion object of a case class as a type parameter

我是研究僧i submitted on 2020-08-24 08:42:06
Question: I'm creating a number of JSON messages for spray in Scala using case classes. For example:

case class Foo(name: String, attrs: List[String])
implicit val fooFormat = jsonFormat2(Foo)

object Foo {
  case class Invalid(error: String)
}

case class Bar(name: String, kv: Map[String, String])
implicit val barFormat = jsonFormat2(Bar)

In the above snippet, barFormat compiles, but fooFormat does not:

type mismatch;
 found   : Foo.type
 required: (?, ?) => ?
Note: implicit value barFormat is not applicable
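A minimal sketch of the usual fix, assuming spray-json's DefaultJsonProtocol: once a case class has an explicitly defined companion object, the name Foo refers to that object instead of being eta-expanded to the synthetic apply function, so the function must be passed explicitly.

```scala
import spray.json._
import spray.json.DefaultJsonProtocol._

case class Foo(name: String, attrs: List[String])
object Foo {
  case class Invalid(error: String)
}

// Passing Foo.apply (rather than Foo, which is now the companion object)
// gives jsonFormat2 the (String, List[String]) => Foo function it expects.
implicit val fooFormat: RootJsonFormat[Foo] = jsonFormat2(Foo.apply)
```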

Mockito's Answer in ScalaTest

本小妞迷上赌 submitted on 2020-08-24 06:44:09
Question: Is there some alternative to Mockito's Answer in ScalaTest? I went through its documentation but didn't find anything. I would like, for example, to execute some logic on the arguments of a stubbed method. In Mockito, I would do something like this:

when(mock.create(any(A.class))).thenAnswer(new Answer() {
  Object answer(InvocationOnMock invocation) {
    A firstArg = (A) invocation.getArguments()[0];
    firstArg.callMethod();
    return null;
  }
});

In ScalaTest, I'm fine with using Mockito as well.
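A direct Scala port of the Java snippet above, as a sketch: the trait A and the Repo trait with its create method are illustrative stand-ins taken from the question, and Mockito 2's ArgumentMatchers is assumed.

```scala
import org.mockito.ArgumentMatchers.any
import org.mockito.Mockito.{mock, when}
import org.mockito.invocation.InvocationOnMock
import org.mockito.stubbing.Answer

trait A { def callMethod(): Unit }
trait Repo { def create(a: A): AnyRef }

val repo = mock(classOf[Repo])
// thenAnswer lets us run logic against the stubbed call's arguments.
when(repo.create(any(classOf[A]))).thenAnswer(new Answer[AnyRef] {
  override def answer(invocation: InvocationOnMock): AnyRef = {
    val firstArg = invocation.getArguments()(0).asInstanceOf[A]
    firstArg.callMethod()
    null
  }
})
```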

Difference between F[_] and F[T] in Scala when used in type constructors

↘锁芯ラ submitted on 2020-08-22 09:39:06
Question: This question is about _ as used in a type constructor, not as used in defining existential types. So the question is: what is the difference when _ is used as a type parameter instead of a variable like T, for example the difference between F[_] and F[T]? The only difference I can think of is that with F[_] the parameter itself can have as many holes as possible; that is, F[_] can become F[Int] or F[Future[Option[Int]]], etc., while with F[T] the T can only be a proper type; that is F
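A short sketch of the distinction, using an assumed Functor trait purely for illustration: in a declaration, F[_] introduces a higher-kinded parameter (F itself takes a type argument), whereas a bare T introduces a proper type.

```scala
// F[_] is a higher-kinded type parameter: callers supply a type
// constructor such as List or Option, which still has a "hole".
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

// T is a proper (fully applied) type parameter: callers supply a
// complete type such as Int or List[String].
trait Box[T] {
  def value: T
}
```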

How to use double pipe as delimiter in CSV?

落爺英雄遲暮 submitted on 2020-08-22 05:45:40
Question: Spark 1.5 and Scala 2.10.6. I have a data file that uses "¦¦" as the delimiter, and I am having a hard time parsing it to create a data frame. Can multiple delimiters be used to create a data frame? The code works with a single broken-bar character but not with the doubled delimiter. My code:

val customSchema_1 = StructType(Array(
  StructField("ID", StringType, true),
  StructField("FILLER", StringType, true),
  StructField("CODE", StringType, true)));

val df_1 = sqlContext.read
  .format("com
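A minimal sketch of one common workaround, under the assumption that the spark-csv reader in Spark 1.x only accepts a single-character delimiter: read the file as plain text and split on the two-character sequence manually (the file path is illustrative).

```scala
import org.apache.spark.sql.Row

// Split each line on the literal two-character "¦¦" delimiter,
// then rebuild rows against the schema from the question.
val rows = sc.textFile("/path/to/data.txt") // assumed path
  .map(_.split("¦¦"))
  .map(fields => Row(fields: _*))

val df_1 = sqlContext.createDataFrame(rows, customSchema_1)
```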

Get type of a “singleton type”

橙三吉。 submitted on 2020-08-22 05:21:30
Question: We can create literal types via shapeless:

import shapeless.syntax.singleton._
var x = 42.narrow // x: Int(42) = 42

But how can I operate with Int(42) as a type if it's impossible even to create a type alias?

type Answ = Int(42) // won't compile
// or
def doSmth(value: Int(42)) = ... // won't compile

Answer 1: 1) In Typelevel Scala you can write just

val x: 42 = 42
type Answ = 42
def doSmth(value: 42) = ???

2) In Dotty Scala you can write the same. 3) In Lightbend Scala (i.e. standard Scala) +
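For standard Scala 2 before 2.13's native literal types, a hedged sketch using shapeless's Witness to give the singleton type a name it can be referred to by:

```scala
import shapeless.Witness

// Witness captures the singleton type of the literal as a type member.
val answ = Witness(42)
type Answ = answ.T // the singleton type Int(42)

def doSmth(value: Answ): Answ = value
doSmth(42) // compiles: the literal 42 has the constant type Int(42)
```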

Spark: Could not find CoarseGrainedScheduler

浪子不回头ぞ submitted on 2020-08-21 02:27:56
Question: I'm not sure what's causing this exception; my Spark job fails after running for a few hours. I'm running Spark 2.0.2. Any debugging tips?

2016-12-27 03:11:22,199 [shuffle-server-3] ERROR org.apache.spark.network.server.TransportRequestHandler - Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
    at org.apache.spark.rpc.netty.Dispatcher
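Not a definitive diagnosis, but this error is usually a symptom rather than a root cause: the CoarseGrainedScheduler RPC endpoint is already gone because the driver or an executor is shutting down, often after an out-of-memory kill. A hedged first step is to check the executor logs for the earlier failure and review memory settings; the values below are illustrative assumptions only.

```scala
import org.apache.spark.SparkConf

// Illustrative settings, not recommendations from the question;
// tune them to the actual cluster and workload.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.yarn.executor.memoryOverhead", "1024")
```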

How can I easily get a Scala case class's name?

荒凉一梦 submitted on 2020-08-20 19:16:26
Question: Given:

case class FirstCC {
  def name: String = ... // something that will give "FirstCC"
}
case class SecondCC extends FirstCC

val one = FirstCC()
val two = SecondCC()

How can I get "FirstCC" from one.name and "SecondCC" from two.name?

Answer 1:

def name = this.getClass.getName

Or, if you want only the name without the package:

def name = this.getClass.getSimpleName

See the documentation of java.lang.Class for more information.

Answer 2: You can use the property productPrefix of the case class:

case
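A runnable sketch combining the answers; note that case-to-case inheritance is prohibited in Scala, so a shared trait stands in for the question's class hierarchy.

```scala
// getSimpleName returns the class name without the package; inside a
// single case class, productPrefix (from Product) works equally well.
trait Named {
  def name: String = this.getClass.getSimpleName
}

case class FirstCC() extends Named
case class SecondCC() extends Named

val one = FirstCC()
val two = SecondCC()
one.name // "FirstCC"
two.name // "SecondCC"
```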

In Scala, are there any conditions where an implicit view won't propagate to another implicit function?

核能气质少年 submitted on 2020-08-20 10:35:27
Question: Assume a class called 'Summoner' is defined that is capable of summoning implicit views from the scope:

case class Summoner[R]() {
  def summon[T](v: T)(implicit ev: T => R): R = ev(v)
}

I found that it works most of the time, but there are cases where it doesn't; e.g. the following (not too) short case, which uses the singleton-ops library:

import shapeless.Witness
import singleton.ops.+
import singleton.ops.impl.Op

trait Operand {
  def +[
    X >: this.type <: Operand,
    Y <:
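A minimal sketch of the Summoner in the ordinary case where it does work, with an assumed implicit view from Int to String purely for illustration:

```scala
import scala.language.implicitConversions

case class Summoner[R]() {
  // Resolves whatever implicit view T => R is in scope and applies it.
  def summon[T](v: T)(implicit ev: T => R): R = ev(v)
}

// An assumed implicit view; any T => R in scope would do.
implicit def intToString(i: Int): String = i.toString

Summoner[String]().summon(42) // "42"
```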

.NET Core Kafka: a getting-started example you can grasp in one read

本秂侑毒 submitted on 2020-08-20 09:31:56
You have probably heard of Kafka; whether or not you have used it, it has a big reputation, like Qiao Feng in Demi-Gods and Semi-Devils. Following convention, let's start with an introduction.

Introduction

Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a high-throughput, distributed publish-subscribe messaging system that supports partitions and replicas. Its biggest difference from other MQs is that a Topic has the concept of partitions, and messages are dequeued faster than in other MQs.

Features and suitable scenarios

- High throughput, low latency
- Scalability: clusters support hot expansion
- Durability and reliability
- Fault tolerance: the cluster tolerates node failures (with n replicas, up to n-1 nodes may fail)
- High concurrency: supports thousands of clients reading and writing simultaneously

Common use cases

- Log collection
- Messaging: producers and consumers, message buffering, etc.
- User-activity tracking: page views, searches, clicks, and other activity
- Operational metrics
- Workflow processing
- Data processing with relaxed real-time requirements

Kafka basic concepts

Topic: Kafka categorizes messages, and each category is called a Topic; consumers can process different Topics differently. A Topic corresponds to a queue in a traditional MQ: a message sent from the producer must specify which topic it goes to, but need not specify which partition
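The post targets .NET Core, but since Kafka itself is written in Scala, here is a hedged minimal producer sketch using the official Java client from Scala; the broker address and topic name are assumptions for illustration.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerDemo extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // assumed broker address
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // A message must name its topic; the partition can be left to the default partitioner.
  producer.send(new ProducerRecord[String, String]("demo-topic", "key1", "hello kafka"))
  producer.close()
}
```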