scala

Add a column to a Spark dataframe which contains a list of all column names of the current row whose value is not null

Question: Hi, I want to add a new column to a dataframe which contains the list of all column names (for that row) which are not null. How do I achieve this in Scala? Please help.

```scala
val baseDF = Seq(
  (3, "California", "name1", 9846, null, "SFO"),
  (1, "Oregon", "name2", 9847, null, null),
  (2, null, null, null, null, null)
).toDF("emp_id", "emp_city", "emp_name", "emp_phone", "emp_sal", "emp_site")
```

The expected output is a new column named "NonNullColumns" with the non-null column names for each row:
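
One possible approach, sketched below under the assumption of Spark 2.4+ (for the filter higher-order function): for every column emit its name when the value is non-null, collect those into an array, and strip the nulls produced for missing values.

```scala
import org.apache.spark.sql.functions._

val allCols = baseDF.columns

val result = baseDF
  // one entry per column: the column name if the value is non-null, otherwise null
  .withColumn("NonNullColumns", array(allCols.map(c => when(col(c).isNotNull, lit(c))): _*))
  // drop the null placeholders so only the non-null column names remain
  .withColumn("NonNullColumns", expr("filter(NonNullColumns, x -> x IS NOT NULL)"))
```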

Writing to a PostgreSQL [CIDR] column with Spark JDBC

Question: I'm trying to write a Spark 2.4.4 dataframe to PostgreSQL via JDBC. I'm using Scala.

```scala
batchDF.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://...")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "traffic_info")
  .option("user", "xxxx")
  .option("password", "xxxx")
  .mode(SaveMode.Append)
  .save()
```

One of the fields (remote_prefix) is of CIDR type in my table but is StringType in my dataframe, so I cannot write it as-is: ERROR: column "remote_prefix" is of type cidr but
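
A common workaround, sketched here and not verified against this exact table, is to let the PostgreSQL JDBC driver send strings as untyped parameters by adding stringtype=unspecified to the JDBC URL, so the server can cast the value to cidr itself (host, port, and database below are placeholders).

```scala
batchDF.write
  .format("jdbc")
  // stringtype=unspecified makes the driver send String values as untyped literals,
  // letting PostgreSQL implicitly cast them to the target cidr column type.
  .option("url", "jdbc:postgresql://host:5432/db?stringtype=unspecified")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "traffic_info")
  .option("user", "xxxx")
  .option("password", "xxxx")
  .mode(SaveMode.Append)
  .save()
```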

How to subtract a vector from a scalar in Scala?

Question: I have a parquet file which contains two columns (id, features). I want to subtract a scalar from the features and divide the output by another scalar:

```scala
df.withColumn("features", (df("features") - constant1) / constant2)
```

but it gives me the error: requirement failed: The number of columns doesn't match. Old column names (2): id, features New column names (1): features. How do I solve it?

Answer 1: My Scala Spark code for this is below. The only way to do any operation on the vector Spark datatype is casting it to string.
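
An alternative sketch that avoids string casting, under the assumptions that features is an org.apache.spark.ml.linalg.Vector column and that constant1 and constant2 are known doubles: rebuild the vector element-wise with a UDF.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

val constant1 = 1.0 // assumed example values
val constant2 = 2.0

// Apply (x - constant1) / constant2 to every element of the vector.
val scaleVector = udf { (v: Vector) =>
  Vectors.dense(v.toArray.map(x => (x - constant1) / constant2))
}

val scaled = df.withColumn("features", scaleVector(df("features")))
```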

Generate a custom pattern number sequence in one go

Question: I'd like to generate the following number sequence in one go, using a functional initialization construct:

Array(0, 0, 0, 0, 3, 3, 6, 6, 9, 9, ..., n*3, n*3)

One way is to do:

```scala
Array.fill[Int](2)(0) ++ Array.tabulate(4)(_ * 3)
```

but I'd need to double each value of the second part of the construct, i.e. to get 0, 0 then 3, 3, etc. How can I duplicate the values of the second construct? I also couldn't figure out a mathematical function that would generate such a sequence.

Answer 1: Consider tail-recursive
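
As a non-recursive sketch (n below is an assumed example size), the doubling can be done either by flat-mapping each value into a pair or by folding the repetition into the tabulate index:

```scala
val n = 4 // assumed example size

// Two leading zeros, then each multiple of 3 repeated twice.
val viaFlatMap = Array.fill[Int](2)(0) ++ (0 to n).flatMap(i => Seq(i * 3, i * 3))

// Equivalent single tabulate over pairs: i / 2 yields 0, 0, 1, 1, 2, 2, ...
val viaTabulate = Array.fill[Int](2)(0) ++ Array.tabulate(2 * (n + 1))(i => (i / 2) * 3)
```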

Scala Traits and Inheritance

Question: I have my Scala classes structured as below:

```scala
trait ActualClass extends ParentClass {
  override def method1(inputStr: String): String = {
    "actual " + inputStr
  }
  def execute(): String = {
    this.method1("test")
  }
}

trait WithClass extends ParentClass {
  override def method1(inputStr: String): String = {
    "with " + inputStr
  }
}

class ParentClass {
  def method1(inputStr: String): String = {
    "parent " + inputStr
  }
}

object TestClass extends App {
  val actualClass = new ActualClass with WithClass { }
  println
```
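
For reference, a minimal, self-contained sketch of how trait linearization resolves such a call (the names here are hypothetical, not the question's): the right-most mixed-in trait's override is found first, and its super call walks left through the linearization.

```scala
class Parent { def greet: String = "parent" }
trait Actual extends Parent { override def greet: String = "actual " + super.greet }
trait With extends Parent { override def greet: String = "with " + super.greet }

object LinearizationDemo extends App {
  // Linearization is Parent -> Actual -> With, so With.greet runs first,
  // its super.greet dispatches to Actual.greet, and then to Parent.greet.
  val obj = new Actual with With {}
  println(obj.greet) // prints: with actual parent
}
```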

How to parse dynamic JSON with dynamic keys inside it in Scala

Question: I am trying to parse a JSON structure which is dynamic in nature and load it into a database, but I am facing difficulty where the JSON has dynamic keys inside it. Below is my sample JSON. I have tried using the explode function, but it didn't help. A mostly similar problem is described in How to parse a dynamic JSON key in a Nested JSON result?

{ "_id": { "planId": "5f34dab0c661d8337097afb9", "version": { "$numberLong": "1" }, "period": { "name": "3Q20", "startDate": 20200629, "endDate": 20200927 }, "line": "b443e9c0
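
One common pattern for dynamic keys, sketched under assumptions (the dynamic object is available as a JSON string column named payload in a dataframe jsonDF, and its values can be treated as strings): parse it into a MapType instead of a fixed struct, then explode the map into key/value rows.

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// "payload" is assumed to be a StringType column holding a JSON object with varying keys.
val mapSchema = MapType(StringType, StringType)

val keyValueRows = jsonDF
  .withColumn("kv", from_json(col("payload"), mapSchema))
  .select(explode(col("kv"))) // one row per dynamic key, with "key" and "value" columns
```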

Running Scala in cmd makes it look like I am missing 'build.sbt'

Question: I'm trying to run Scala from my command line. I checked my Java installation, went to the Scala website, downloaded and installed it, and updated my environment variables. So far the only thing different from the guides online is that the folder where sbt is installed does not include a "lib" folder. I then run the sbt command in my prompt and get a message saying it looks like I'm missing a file called build.sbt. What is this, and do I need it? Edit: If I press 'continue' on the picture above, I get sbt:scalaproj>
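
build.sbt is the build definition sbt expects at the root of a project; when it is absent, sbt warns and offers to treat the current directory as an ad-hoc project. A minimal sketch of one is below (the name and versions are assumptions, adjust them to your installation):

```scala
// build.sbt at the project root
name := "scalaproj"      // hypothetical name, matching the sbt:scalaproj> prompt
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.13.4" // assumed Scala version
```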

Count of values in a row in a Spark dataframe using Scala

Question: I have a dataframe. It contains the amount of sales for different items across different sales outlets. The dataframe shown below only shows a few of the items across a few sales outlets. There is a benchmark of 100 items per day sold for each item. Each item that sold more than 100 is marked "Yes" and those below 100 are marked "No".

```scala
val df1 = Seq(
  ("Mumbai", 90, 109, , 101, 78, ............., "No", "Yes", "Yes", "No", .....),
  ("Singapore", 149, 129, , 201, 107, ............., "Yes
```
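
If the goal is a per-row count of the "Yes" flags, one sketch (the flag column names below are hypothetical placeholders for the real ones in df1):

```scala
import org.apache.spark.sql.functions._

// Hypothetical names of the Yes/No columns in df1.
val flagCols = Seq("item1_flag", "item2_flag", "item3_flag", "item4_flag")

// Turn each flag into 0/1 and sum across the row.
val withYesCount = df1.withColumn(
  "yes_count",
  flagCols.map(c => when(col(c) === "Yes", 1).otherwise(0)).reduce(_ + _)
)
```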

Spark - Scope, Data Frame, and memory management

Question: I am curious about how scope works with DataFrames in Spark. In the example below, I have a list of files, each independently loaded into a DataFrame; some operation is performed, and then we write dfOutput to disk.

```scala
val files = getListOfFiles("outputs/emailsSplit")

for (file <- files) {
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "\t")        // delimiter is tab
    .option("parserLib", "UNIVOCITY") // parser which deals better with the email formatting
    .schema
```
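
A rough sketch of the scoping behaviour (the load call, transformation, and output path below are placeholders, not the question's actual code): each df is a val local to one loop iteration, so it becomes unreachable when the iteration ends; only DataFrames that were explicitly cached hold executor memory and benefit from an unpersist before the next file.

```scala
for (file <- files) {
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "\t")
    .option("parserLib", "UNIVOCITY")
    .load(file.getPath) // placeholder: load the current file
    .cache()            // only needed if the plan reuses df; otherwise skip caching

  val dfOutput = df.filter("length(body) > 0") // placeholder transformation
  dfOutput.write
    .format("com.databricks.spark.csv")
    .save("outputs/processed/" + file.getName) // placeholder output path

  df.unpersist() // release cached blocks before the next iteration
}
```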