user-defined-functions

DataFrame user-defined function not applied unless I change column name

大城市里の小女人 submitted on 2019-12-13 17:43:54
Question: I want to convert a DataFrame column using an implicit function definition. I have my DataFrame type defined, which contains additional functions:

```scala
class MyDF(df: DataFrame) {
  def bytes2String(colName: String): DataFrame = df
    .withColumn(colName + "_tmp", udf((x: Array[Byte]) => bytes2String(x)).apply(col(colName)))
    .drop(colName)
    .withColumnRenamed(colName + "_tmp", colName)
}
```

Then I define my implicit conversion class:

```scala
object NpDataFrameImplicits { implicit def toNpDataFrame(df: DataFrame):
```
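The excerpt cuts off before the implicit conversion is finished. As a minimal sketch of the same idea, assuming a simple UTF-8 decoding helper (the original helper is not shown in the excerpt), an implicit class avoids the separate conversion method entirely:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object NpDataFrameImplicits {
  // Hypothetical decoder - the asker's actual bytes2String helper is not shown.
  private def decodeBytes(x: Array[Byte]): String =
    if (x == null) null else new String(x, "UTF-8")

  implicit class NpDataFrame(df: DataFrame) {
    def bytes2String(colName: String): DataFrame =
      // Overwrite the column in place; no temporary column, drop, or rename needed.
      df.withColumn(colName, udf(decodeBytes _).apply(col(colName)))
  }
}
```

With `import NpDataFrameImplicits._` in scope, `df.bytes2String("myBinaryCol")` works directly, and replacing the column in place sidesteps the drop/rename round-trip in the question.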

How to fetch records set with a ttl of -1 in aerospike?

我与影子孤独终老i submitted on 2019-12-13 16:23:52
Question: I have many records in Aerospike, and I want to fetch the records whose TTL is -1. Please provide a solution.

Answer 1: Just to clarify, setting a TTL of -1 in the client means never expire (equivalent to a default-ttl of 0 in the server's aerospike.conf file), while setting a TTL of 0 in the client means inherit the default-ttl for this namespace. With Predicate Filtering: If you're using the Java, C, C# or Go clients, the easiest way to identify the records with a void time of 0 would be to use a
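The answer is truncated before the predicate-filter details. As a rough alternative sketch using the Aerospike Java client from Scala (a plain full scan rather than predicate filtering, and assuming a local server with the usual test/demo namespace and set names), the client reports never-expiring records as a time-to-live of -1:

```scala
import com.aerospike.client.{AerospikeClient, Key, Record, ScanCallback}
import com.aerospike.client.policy.ScanPolicy

val client = new AerospikeClient("127.0.0.1", 3000)  // assumed host and port
val policy = new ScanPolicy()

// Void time 0 on the server means "never expires"; the Java client
// surfaces that as getTimeToLive() == -1.
client.scanAll(policy, "test", "demo", new ScanCallback {
  override def scanCallback(key: Key, record: Record): Unit =
    if (record.getTimeToLive == -1) println(key)
})

client.close()
```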

PySpark - Adding a Column from a list of values using a UDF

99封情书 submitted on 2019-12-13 14:51:20
Question: I have to add a column to a PySpark dataframe based on a list of values.

```python
a = spark.createDataFrame([("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")], ["Animal", "Enemy"])
```

I have a list called rating, which is a rating of each pet: rating = [5,4,1]. I need to append the dataframe with a column called Rating, such that:

```
+------+-----+------+
|Animal|Enemy|Rating|
+------+-----+------+
|   Dog|  Cat|     5|
|   Cat|  Dog|     4|
| Mouse|  Cat|     1|
+------+-----+------+
```

I have done the following, however it is
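The question's own attempt is cut off above. One common approach, sketched here in Scala since the DataFrame API mirrors PySpark's, avoids a UDF altogether: pair each row with an index via zipWithIndex, do the same to the list, and join on the index. This assumes the small, locally created frame keeps its insertion order:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val a = Seq(("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")).toDF("Animal", "Enemy")
val rating = Seq(5, 4, 1)

// Attach a stable row index, build an (index, rating) frame, and join.
val withIdx = spark.createDataFrame(
  a.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) },
  a.schema.add(StructField("idx", LongType)))

val ratingDF = rating.zipWithIndex.map { case (r, i) => (i.toLong, r) }.toDF("idx", "Rating")

withIdx.join(ratingDF, "idx").drop("idx").show()
```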

Multi-part identifier could not be bound

谁说胖子不能爱 submitted on 2019-12-13 14:09:41
Question: I know that there are several questions around this exception on SO, but nothing I've seen helps me. I have the following query, which gives me a "Multi-part identifier 'claim.fiData' could not be bound" exception:

```sql
SELECT claim.idData
FROM tabData AS claim
INNER JOIN dbo._previousClaimsByFiData(claim.fiData) AS prevClaim
    ON prevClaim.idData = claim.fiData
GROUP BY claim.idData
HAVING (prevClaim.fiMaxActionCode IN (8, 23, 24)
    AND prevClaim.Repair_Completion_Date >= DATEADD(day, -90, prevClaim.Repair
```

Can Excel Conditional Formatting use UDFs in the condition?

混江龙づ霸主 submitted on 2019-12-13 13:31:43
Question: I have a cell in Excel that I want to format differently based on a user-defined function (UDF) - my formula tests whether there is a formula in the cell... I am trying to use conditional formatting with my UDF to format the cell, but it does not seem to be working. My condition is this: ="isManualPrice(R22C12)". I tried without the quotes, but get the error: "You cannot use references to other worksheets or workbooks for Conditional Formatting criteria". Perhaps the issue relates to my UDF being

Disadvantages of using a lot of parameters

£可爱£侵袭症+ submitted on 2019-12-13 13:22:43
Question: I am re-writing some code to make functional changes, and I am stuck in a situation where either I will need to overload a function to accommodate two or three types of parameters (but performing almost identical operations on them) OR use one function with a lot of parameters. For now I am going with the latter option, and I just wanted to know the specific disadvantages (if any) of using a function with a lot of parameters (and when I say a lot, I mean 15). I am looking for a general answer, nothing
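No answer survives in the excerpt. One standard remedy for long parameter lists, sketched here in Scala with entirely hypothetical field names, is to group related arguments into a single parameter object so call sites stay readable and defaults cover the rarely changed values:

```scala
// Hypothetical parameter object standing in for a 15-argument signature.
case class RenderOptions(
    width: Int,
    height: Int,
    dpi: Int = 300,
    grayscale: Boolean = false,
    title: Option[String] = None)

def render(opts: RenderOptions): Unit =
  println(s"Rendering ${opts.width}x${opts.height} at ${opts.dpi} dpi")

// Call sites name only what differs from the defaults.
render(RenderOptions(width = 800, height = 600, grayscale = true))
```

Named arguments plus defaults mean callers spell out only what differs, which also removes the ordering mistakes that 15 positional parameters invite.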

Custom aggregation on PySpark dataframes

折月煮酒 submitted on 2019-12-13 12:07:54
Question: I have a PySpark DataFrame with one column of one-hot encoded vectors. I want to aggregate the different one-hot encoded vectors by vector addition after a groupby. E.g. for df[userid, action]:

```
Row1: ["1234", [1, 0, 0]]
Row2: ["1234", [0, 1, 0]]
```

I want the output as Row: ["1234", [1, 1, 0]], so the vector is the sum of all vectors grouped by userid. How can I achieve this? PySpark's sum aggregate operation does not support vector addition.

Answer 1: You have several options: Create a user defined aggregate
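The answer is truncated after its first option. As a sketch of one UDAF-free route (in Scala, though the same functions exist in PySpark), assuming the one-hot vectors are stored as plain integer arrays: explode each vector into (position, value) rows, sum per position, then reassemble the array in position order:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, posexplode, sort_array, struct, sum}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("1234", Seq(1, 0, 0)), ("1234", Seq(0, 1, 0))).toDF("userid", "action")

val summed = df
  .select(col("userid"), posexplode(col("action")))   // columns: userid, pos, col
  .groupBy("userid", "pos")
  .agg(sum("col").as("v"))                            // element-wise sums
  .groupBy("userid")
  .agg(sort_array(collect_list(struct(col("pos"), col("v")))).as("pairs"))
  .select(col("userid"), col("pairs.v").as("action")) // back to an ordered array

summed.show(false)                                    // [1234, [1, 1, 0]]
```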

How to call a method based on a JSON object in Scala Spark?

倖福魔咒の submitted on 2019-12-13 09:06:19
Question: I have two functions like below:

```scala
def method1(ip: String, r: Double, op: String) = {
  val data = spark.read.option("header", true).csv(ip).toDF()
  val r3 = data.select("C", "S").dropDuplicates("C", "S").withColumn("R", lit(r))
  r3.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save(op)
}

def method2(ip: String, op: String) = {
  val data = spark.read.option("header", true).csv(ip).toDF()
  val r3 = data.select("C", "S").dropDuplicates("C", "StockCode")
  r3.coalesce(1).write.format("com
```
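The JSON part of the question is cut off. As a sketch of one way to dispatch on a JSON config - assuming a shape like {"method": "method1", "ip": "in.csv", "r": 0.5, "op": "out"} and using json4s, which ships with Spark:

```scala
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

// Assumed config shape: {"method": "method1", "ip": "in.csv", "r": 0.5, "op": "out"}
def dispatch(json: String): Unit = {
  implicit val formats: Formats = DefaultFormats
  val cfg = parse(json)
  (cfg \ "method").extract[String] match {
    case "method1" => method1((cfg \ "ip").extract[String],
                              (cfg \ "r").extract[Double],
                              (cfg \ "op").extract[String])
    case "method2" => method2((cfg \ "ip").extract[String],
                              (cfg \ "op").extract[String])
    case other     => sys.error(s"Unknown method: $other")
  }
}
```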

Error when calling a function in a SQL query in a package without declaring the function in the package specification

怎甘沉沦 submitted on 2019-12-13 08:41:15
Question: I have created a function which returns a NUMBER type in a package, but did not declare this function in the package specification. I am calling this function from a SQL query in another function within the same package body, and I am getting an error. When I declare the function in the package specification, it works fine. I want to know the reason behind this. Can anybody please explain it?

Answer 1: Nothing to do with forward declaration at all. This deals with the fact that you are using a SQL query to call the function.

Custom Sorting Function in SQL Server

淺唱寂寞╮ submitted on 2019-12-13 07:00:44
Question: I have a column in SQL Server 2005 that stores a version number as a string, which I would like to sort by. I have been unable to find out how to sort this column, although I am guessing it would take some kind of custom function or compare algorithm. Can anyone point me in the right direction of where to start? I may be googling the wrong stuff. Cheers, Tris

Answer 1: I'd use separate int columns (e.g. MajorCol + MinorCol if you are tracking major + minor versions) and run something like order by
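The answer trails off after suggesting a schema change. If the strings must stay as they are and the sort can happen in application code instead, the "compare algorithm" the question asks about is just a per-segment numeric comparison - a sketch in Scala (the helper name is made up):

```scala
import scala.math.Ordering.Implicits._
import scala.util.Try

// Turn "1.10.2" into Seq(1, 10, 2) so each segment compares numerically.
def versionKey(v: String): Seq[Int] =
  v.split('.').toSeq.map(s => Try(s.trim.toInt).getOrElse(0))

val versions = Seq("1.10", "1.2", "2.0", "1.9.1")
println(versions.sortBy(versionKey))  // List(1.2, 1.9.1, 1.10, 2.0)
```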