scala

Add a column to a Spark dataframe which contains a list of all column names of the current row whose value is not null

Question: Hi, I want to add a new column to a dataframe which contains the list of all column names (for that row) which are not null. How do I achieve this in Scala? Please help.

```scala
val baseDF = Seq(
  (3, "California", "name1", 9846, null, "SFO"),
  (1, "Oregon", "name2", 9847, null, null),
  (2, null, null, null, null, null)
).toDF("emp_id", "emp_city", "emp_name", "emp_phone", "emp_sal", "emp_site")
```

The expected output is a new column named "NonNullColumns" with the non-null column names for each row:
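
One possible approach, sketched below under the assumption of Spark 2.4+ (for the filter higher-order function): for every column emit its name when the value is non-null, collect those into an array, and strip the nulls produced for missing values.

```scala
import org.apache.spark.sql.functions._

val allCols = baseDF.columns

val result = baseDF
  // one entry per column: the column name if the value is non-null, otherwise null
  .withColumn("NonNullColumns", array(allCols.map(c => when(col(c).isNotNull, lit(c))): _*))
  // drop the null placeholders so only the non-null column names remain
  .withColumn("NonNullColumns", expr("filter(NonNullColumns, x -> x IS NOT NULL)"))
```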

Writing to a PostgreSQL [CIDR] column with Spark JDBC

Question: I'm trying to write a Spark 2.4.4 dataframe to PostgreSQL via JDBC. I'm using Scala.

```scala
batchDF.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://...")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "traffic_info")
  .option("user", "xxxx")
  .option("password", "xxxx")
  .mode(SaveMode.Append)
  .save()
```

One of the fields (remote_prefix) is of CIDR type in my table but is StringType in my dataframe, so I cannot write it as-is: ERROR: column "remote_prefix" is of type cidr but
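
A common workaround, sketched here and not verified against this exact table, is to let the PostgreSQL JDBC driver send strings as untyped parameters by adding stringtype=unspecified to the JDBC URL, so the server can cast the value to cidr itself (host, port, and database below are placeholders).

```scala
batchDF.write
  .format("jdbc")
  // stringtype=unspecified makes the driver send String values as untyped literals,
  // letting PostgreSQL implicitly cast them to the target cidr column type.
  .option("url", "jdbc:postgresql://host:5432/db?stringtype=unspecified")
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "traffic_info")
  .option("user", "xxxx")
  .option("password", "xxxx")
  .mode(SaveMode.Append)
  .save()
```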

How to subtract a vector from a scalar in Scala?

Question: I have a parquet file which contains two columns (id, features). I want to subtract a scalar from the features and divide the output by another scalar:

```scala
df.withColumn("features", (df("features") - constant1) / constant2)
```

but it gives me the error: requirement failed: The number of columns doesn't match. Old column names (2): id, features New column names (1): features. How do I solve it?

Answer 1: My Scala Spark code for this is below. The only way to do any operation on the vector Spark datatype is casting it to string.
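
An alternative sketch that avoids string casting, under the assumptions that features is an org.apache.spark.ml.linalg.Vector column and that constant1 and constant2 are known doubles: rebuild the vector element-wise with a UDF.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.udf

val constant1 = 1.0 // assumed example values
val constant2 = 2.0

// Apply (x - constant1) / constant2 to every element of the vector.
val scaleVector = udf { (v: Vector) =>
  Vectors.dense(v.toArray.map(x => (x - constant1) / constant2))
}

val scaled = df.withColumn("features", scaleVector(df("features")))
```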

Generate a custom pattern number sequence in one go

Question: I'd like to generate the following number sequence in one go, using a functional initialization construct:

Array(0, 0, 0, 0, 3, 3, 6, 6, 9, 9, ..., n*3, n*3)

One way is to do:

```scala
Array.fill[Int](2)(0) ++ Array.tabulate(4)(_ * 3)
```

but I'd need to double each value of the second part of the construct, i.e. to get 0, 0 then 3, 3, etc. How can I duplicate the values of the second construct? I also couldn't figure out a mathematical function that would generate such a sequence.

Answer 1: Consider tail-recursive
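
As a non-recursive sketch (n below is an assumed example size), the doubling can be done either by flat-mapping each value into a pair or by folding the repetition into the tabulate index:

```scala
val n = 4 // assumed example size

// Two leading zeros, then each multiple of 3 repeated twice.
val viaFlatMap = Array.fill[Int](2)(0) ++ (0 to n).flatMap(i => Seq(i * 3, i * 3))

// Equivalent single tabulate over pairs: i / 2 yields 0, 0, 1, 1, 2, 2, ...
val viaTabulate = Array.fill[Int](2)(0) ++ Array.tabulate(2 * (n + 1))(i => (i / 2) * 3)
```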

Scala Traits and Inheritance

Question: I have my Scala classes structured as below:

```scala
trait ActualClass extends ParentClass {
  override def method1(inputStr: String): String = {
    "actual " + inputStr
  }
  def execute(): String = {
    this.method1("test")
  }
}

trait WithClass extends ParentClass {
  override def method1(inputStr: String): String = {
    "with " + inputStr
  }
}

class ParentClass {
  def method1(inputStr: String): String = {
    "parent " + inputStr
  }
}

object TestClass extends App {
  val actualClass = new ActualClass with WithClass { }
  println
```
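
For reference, a minimal, self-contained sketch of how trait linearization resolves such a call (the names here are hypothetical, not the question's): the right-most mixed-in trait's override is found first, and its super call walks left through the linearization.

```scala
class Parent { def greet: String = "parent" }
trait Actual extends Parent { override def greet: String = "actual " + super.greet }
trait With extends Parent { override def greet: String = "with " + super.greet }

object LinearizationDemo extends App {
  // Linearization is Parent -> Actual -> With, so With.greet runs first,
  // its super.greet dispatches to Actual.greet, and then to Parent.greet.
  val obj = new Actual with With {}
  println(obj.greet) // prints: with actual parent
}
```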

How to parse dynamic JSON with dynamic keys inside it in Scala

Question: I am trying to parse a JSON structure which is dynamic in nature and load it into a database, but I am facing difficulty where the JSON has dynamic keys inside it. Below is my sample JSON. I have tried using the explode function, but it didn't help. A mostly similar problem is described in How to parse a dynamic JSON key in a Nested JSON result?

{ "_id": { "planId": "5f34dab0c661d8337097afb9", "version": { "$numberLong": "1" }, "period": { "name": "3Q20", "startDate": 20200629, "endDate": 20200927 }, "line": "b443e9c0
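
One common pattern for dynamic keys, sketched under assumptions (the dynamic object is available as a JSON string column named payload in a dataframe jsonDF, and its values can be treated as strings): parse it into a MapType instead of a fixed struct, then explode the map into key/value rows.

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// "payload" is assumed to be a StringType column holding a JSON object with varying keys.
val mapSchema = MapType(StringType, StringType)

val keyValueRows = jsonDF
  .withColumn("kv", from_json(col("payload"), mapSchema))
  .select(explode(col("kv"))) // one row per dynamic key, with "key" and "value" columns
```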

Running Scala in cmd makes it look like I am missing 'build.sbt'

Question: I'm trying to run Scala from my command line. I checked my Java installation, went to the Scala website, downloaded and installed it, and updated my environment variables. So far the only thing different from the guides online is that the folder where sbt is installed does not include a "lib" folder. I then run the sbt command in my prompt and get a message saying it looks like I'm missing a file called build.sbt. What is this, and do I need it? Edit: If I press 'continue' on the picture above, I get sbt:scalaproj>
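
build.sbt is the build definition sbt expects at the root of a project; when it is absent, sbt warns and offers to treat the current directory as an ad-hoc project. A minimal sketch of one is below (the name and versions are assumptions, adjust them to your installation):

```scala
// build.sbt at the project root
name := "scalaproj"      // hypothetical name, matching the sbt:scalaproj> prompt
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.13.4" // assumed Scala version
```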

Count of values in a row in a Spark dataframe using Scala

Question: I have a dataframe. It contains the amount of sales for different items across different sales outlets. The dataframe shown below only shows a few of the items across a few sales outlets. There is a benchmark of 100 items per day sold for each item. Each item that sold more than 100 is marked "Yes" and those below 100 are marked "No".

```scala
val df1 = Seq(
  ("Mumbai", 90, 109, , 101, 78, ............., "No", "Yes", "Yes", "No", .....),
  ("Singapore", 149, 129, , 201, 107, ............., "Yes
```
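
If the goal is a per-row count of the "Yes" flags, one sketch (the flag column names below are hypothetical placeholders for the real ones in df1):

```scala
import org.apache.spark.sql.functions._

// Hypothetical names of the Yes/No columns in df1.
val flagCols = Seq("item1_flag", "item2_flag", "item3_flag", "item4_flag")

// Turn each flag into 0/1 and sum across the row.
val withYesCount = df1.withColumn(
  "yes_count",
  flagCols.map(c => when(col(c) === "Yes", 1).otherwise(0)).reduce(_ + _)
)
```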

Spark - Scope, Data Frame, and memory management

Question: I am curious about how scope works with DataFrames in Spark. In the example below, I have a list of files, each independently loaded into a DataFrame; some operation is performed, and then we write dfOutput to disk.

```scala
val files = getListOfFiles("outputs/emailsSplit")

for (file <- files) {
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "\t")        // delimiter is tab
    .option("parserLib", "UNIVOCITY") // parser which deals better with the email formatting
    .schema
```
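
A rough sketch of the scoping behaviour (the load call, transformation, and output path below are placeholders, not the question's actual code): each df is a val local to one loop iteration, so it becomes unreachable when the iteration ends; only DataFrames that were explicitly cached hold executor memory and benefit from an unpersist before the next file.

```scala
for (file <- files) {
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "\t")
    .option("parserLib", "UNIVOCITY")
    .load(file.getPath) // placeholder: load the current file
    .cache()            // only needed if the plan reuses df; otherwise skip caching

  val dfOutput = df.filter("length(body) > 0") // placeholder transformation
  dfOutput.write
    .format("com.databricks.spark.csv")
    .save("outputs/processed/" + file.getName) // placeholder output path

  df.unpersist() // release cached blocks before the next iteration
}
```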