user-defined-functions

DataFrame user-defined function not applied unless I change column name

大城市里の小女人 submitted on 2019-12-13 17:43:54
Question: I want to convert a DataFrame column using an implicit function definition. I have my DataFrame type defined, which contains additional functions:

```scala
class MyDF(df: DataFrame) {
  def bytes2String(colName: String): DataFrame = df
    .withColumn(colName + "_tmp", udf((x: Array[Byte]) => bytes2String(x)).apply(col(colName)))
    .drop(colName)
    .withColumnRenamed(colName + "_tmp", colName)
}
```

Then I define my implicit conversion class:

```scala
object NpDataFrameImplicits { implicit def toNpDataFrame(df: DataFrame):
```
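The excerpt cuts off before the implicit conversion is finished. As a minimal sketch of the same idea, assuming a simple UTF-8 decoding helper (the original helper is not shown in the excerpt), an implicit class avoids the separate conversion method entirely:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object NpDataFrameImplicits {
  // Hypothetical decoder - the asker's actual bytes2String helper is not shown.
  private def decodeBytes(x: Array[Byte]): String =
    if (x == null) null else new String(x, "UTF-8")

  implicit class NpDataFrame(df: DataFrame) {
    def bytes2String(colName: String): DataFrame =
      // Overwrite the column in place; no temporary column, drop, or rename needed.
      df.withColumn(colName, udf(decodeBytes _).apply(col(colName)))
  }
}
```

With `import NpDataFrameImplicits._` in scope, `df.bytes2String("myBinaryCol")` works directly, and replacing the column in place sidesteps the drop/rename round-trip in the question.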

How to fetch records set with a ttl of -1 in aerospike?

我与影子孤独终老i submitted on 2019-12-13 16:23:52
Question: I have many records in Aerospike, and I want to fetch the records whose TTL is -1. Please provide a solution.

Answer 1: Just to clarify, setting a TTL of -1 in the client means never expire (equivalent to a default-ttl of 0 in the server's aerospike.conf file), while setting a TTL of 0 in the client means inherit the default-ttl for this namespace. With Predicate Filtering: If you're using the Java, C, C# or Go clients, the easiest way to identify the records with a void time of 0 would be to use a
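The answer is truncated before the predicate-filter details. As a rough alternative sketch using the Aerospike Java client from Scala (a plain full scan rather than predicate filtering, and assuming a local server with the usual test/demo namespace and set names), the client reports never-expiring records as a time-to-live of -1:

```scala
import com.aerospike.client.{AerospikeClient, Key, Record, ScanCallback}
import com.aerospike.client.policy.ScanPolicy

val client = new AerospikeClient("127.0.0.1", 3000)  // assumed host and port
val policy = new ScanPolicy()

// Void time 0 on the server means "never expires"; the Java client
// surfaces that as getTimeToLive() == -1.
client.scanAll(policy, "test", "demo", new ScanCallback {
  override def scanCallback(key: Key, record: Record): Unit =
    if (record.getTimeToLive == -1) println(key)
})

client.close()
```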

PySpark - Adding a Column from a list of values using a UDF

99封情书 submitted on 2019-12-13 14:51:20
Question: I have to add a column to a PySpark dataframe based on a list of values.

```python
a = spark.createDataFrame([("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")], ["Animal", "Enemy"])
```

I have a list called rating, which is a rating of each pet: rating = [5,4,1]. I need to append the dataframe with a column called Rating, such that:

```
+------+-----+------+
|Animal|Enemy|Rating|
+------+-----+------+
|   Dog|  Cat|     5|
|   Cat|  Dog|     4|
| Mouse|  Cat|     1|
+------+-----+------+
```

I have done the following, however it is
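The question's own attempt is cut off above. One common approach, sketched here in Scala since the DataFrame API mirrors PySpark's, avoids a UDF altogether: pair each row with an index via zipWithIndex, do the same to the list, and join on the index. This assumes the small, locally created frame keeps its insertion order:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val a = Seq(("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")).toDF("Animal", "Enemy")
val rating = Seq(5, 4, 1)

// Attach a stable row index, build an (index, rating) frame, and join.
val withIdx = spark.createDataFrame(
  a.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) },
  a.schema.add(StructField("idx", LongType)))

val ratingDF = rating.zipWithIndex.map { case (r, i) => (i.toLong, r) }.toDF("idx", "Rating")

withIdx.join(ratingDF, "idx").drop("idx").show()
```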

Multi-part identifier could not be bound

谁说胖子不能爱 submitted on 2019-12-13 14:09:41
Question: I know that there are several questions around this exception on SO, but nothing I've seen helps me. I have the following query, which gives me a "Multi-part identifier 'claim.fiData' could not be bound" exception:

```sql
SELECT claim.idData
FROM tabData AS claim
INNER JOIN dbo._previousClaimsByFiData(claim.fiData) AS prevClaim
    ON prevClaim.idData = claim.fiData
GROUP BY claim.idData
HAVING (prevClaim.fiMaxActionCode IN (8, 23, 24)
    AND prevClaim.Repair_Completion_Date >= DATEADD(day, -90, prevClaim.Repair
```

Can Excel Conditional Formatting use UDFs in the condition?

混江龙づ霸主 submitted on 2019-12-13 13:31:43
Question: I have a cell in Excel that I want to format differently based on a user-defined function (UDF) - my formula tests whether there is a formula in the cell... I am trying to use conditional formatting with my UDF to format the cell, but it does not seem to be working. My condition is this: ="isManualPrice(R22C12)". I tried without the quotes, but get the error: "You cannot use references to other worksheets or workbooks for Conditional Formatting criteria". Perhaps the issue relates to my UDF being

Disadvantages of using a lot of parameters

£可爱£侵袭症+ submitted on 2019-12-13 13:22:43
Question: I am re-writing some code to make functional changes, and I am stuck in a situation where either I will need to overload a function to accommodate two or three types of parameters (but performing almost identical operations on them) OR use one function with a lot of parameters. For now I am going with the latter option, and I just wanted to know the specific disadvantages (if any) of using a function with a lot of parameters (and when I say a lot, I mean 15). I am looking for a general answer, nothing
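No answer survives in the excerpt. One standard remedy for long parameter lists, sketched here in Scala with entirely hypothetical field names, is to group related arguments into a single parameter object so call sites stay readable and defaults cover the rarely changed values:

```scala
// Hypothetical parameter object standing in for a 15-argument signature.
case class RenderOptions(
    width: Int,
    height: Int,
    dpi: Int = 300,
    grayscale: Boolean = false,
    title: Option[String] = None)

def render(opts: RenderOptions): Unit =
  println(s"Rendering ${opts.width}x${opts.height} at ${opts.dpi} dpi")

// Call sites name only what differs from the defaults.
render(RenderOptions(width = 800, height = 600, grayscale = true))
```

Named arguments plus defaults mean callers spell out only what differs, which also removes the ordering mistakes that 15 positional parameters invite.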

Custom aggregation on PySpark dataframes

折月煮酒 submitted on 2019-12-13 12:07:54
Question: I have a PySpark DataFrame with one column of one-hot encoded vectors. I want to aggregate the different one-hot encoded vectors by vector addition after a groupby. E.g. for df[userid, action]:

```
Row1: ["1234", [1, 0, 0]]
Row2: ["1234", [0, 1, 0]]
```

I want the output as Row: ["1234", [1, 1, 0]], so the vector is the sum of all vectors grouped by userid. How can I achieve this? PySpark's sum aggregate operation does not support vector addition.

Answer 1: You have several options: Create a user defined aggregate
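The answer is truncated after its first option. As a sketch of one UDAF-free route (in Scala, though the same functions exist in PySpark), assuming the one-hot vectors are stored as plain integer arrays: explode each vector into (position, value) rows, sum per position, then reassemble the array in position order:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, posexplode, sort_array, struct, sum}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("1234", Seq(1, 0, 0)), ("1234", Seq(0, 1, 0))).toDF("userid", "action")

val summed = df
  .select(col("userid"), posexplode(col("action")))   // columns: userid, pos, col
  .groupBy("userid", "pos")
  .agg(sum("col").as("v"))                            // element-wise sums
  .groupBy("userid")
  .agg(sort_array(collect_list(struct(col("pos"), col("v")))).as("pairs"))
  .select(col("userid"), col("pairs.v").as("action")) // back to an ordered array

summed.show(false)                                    // [1234, [1, 1, 0]]
```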

How to call a method based on a JSON object in Scala Spark?

倖福魔咒の submitted on 2019-12-13 09:06:19
Question: I have two functions like below:

```scala
def method1(ip: String, r: Double, op: String) = {
  val data = spark.read.option("header", true).csv(ip).toDF()
  val r3 = data.select("C", "S").dropDuplicates("C", "S").withColumn("R", lit(r))
  r3.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save(op)
}

def method2(ip: String, op: String) = {
  val data = spark.read.option("header", true).csv(ip).toDF()
  val r3 = data.select("C", "S").dropDuplicates("C", "StockCode")
  r3.coalesce(1).write.format("com
```
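The JSON part of the question is cut off. As a sketch of one way to dispatch on a JSON config - assuming a shape like {"method": "method1", "ip": "in.csv", "r": 0.5, "op": "out"} and using json4s, which ships with Spark:

```scala
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

// Assumed config shape: {"method": "method1", "ip": "in.csv", "r": 0.5, "op": "out"}
def dispatch(json: String): Unit = {
  implicit val formats: Formats = DefaultFormats
  val cfg = parse(json)
  (cfg \ "method").extract[String] match {
    case "method1" => method1((cfg \ "ip").extract[String],
                              (cfg \ "r").extract[Double],
                              (cfg \ "op").extract[String])
    case "method2" => method2((cfg \ "ip").extract[String],
                              (cfg \ "op").extract[String])
    case other     => sys.error(s"Unknown method: $other")
  }
}
```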

Error when calling a function in a SQL query in a package without declaring the function in the package specification

怎甘沉沦 submitted on 2019-12-13 08:41:15
Question: I have created a function which returns a NUMBER type in a package, but did not declare this function in the package specification. I am calling this function from a SQL query in another function within the same package body, and I am getting an error. When I declare the function in the package specification, it works fine. I want to know the reason behind this. Can anybody please explain it?

Answer 1: Nothing to do with forward declaration at all. This deals with the fact that you are using a SQL query to call the function.

Custom Sorting Function in SQL Server

淺唱寂寞╮ submitted on 2019-12-13 07:00:44
Question: I have a column in SQL Server 2005 that stores a version number as a string, which I would like to sort by. I have been unable to find out how to sort this column, although I am guessing it would take some kind of custom function or compare algorithm. Can anyone point me in the right direction of where to start? I may be googling the wrong stuff. Cheers, Tris

Answer 1: I'd use separate int columns (e.g. MajorCol + MinorCol if you are tracking major + minor versions) and run something like order by
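The answer trails off after suggesting a schema change. If the strings must stay as they are and the sort can happen in application code instead, the "compare algorithm" the question asks about is just a per-segment numeric comparison - a sketch in Scala (the helper name is made up):

```scala
import scala.math.Ordering.Implicits._
import scala.util.Try

// Turn "1.10.2" into Seq(1, 10, 2) so each segment compares numerically.
def versionKey(v: String): Seq[Int] =
  v.split('.').toSeq.map(s => Try(s.trim.toInt).getOrElse(0))

val versions = Seq("1.10", "1.2", "2.0", "1.9.1")
println(versions.sortBy(versionKey))  // List(1.2, 1.9.1, 1.10, 2.0)
```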