user-defined-functions

How to reference a dataframe when in an UDF on another dataframe?

泪湿孤枕 submitted on 2019-12-21 18:10:07
Question: How do you reference a PySpark DataFrame while executing a UDF on another DataFrame? Here's a dummy example. I am creating two DataFrames, scores and lastnames, and each contains a column that is the same across the two DataFrames. In the UDF applied to scores, I want to filter on lastnames and return a string found in lastname.

    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.sql import SQLContext
    from pyspark.sql.types import *
    sc = SparkContext(

Merge two spark sql columns of type Array[string] into a new Array[string] column

馋奶兔 submitted on 2019-12-21 14:58:54
Question: I have two columns in a Spark SQL DataFrame, and each entry in either column is an array of strings.

    val ngramDataFrame = Seq(
      (Seq("curious", "bought", "20"), Seq("iwa", "was", "asj"))
    ).toDF("filtered_words", "ngrams_array")

I want to merge the arrays in each row to make a single array in a new column. My code is as follows:

    def concat_array(firstarray: Array[String], secondarray: Array[String]) : Array[String] = {
      (firstarray ++ secondarray).toArray
    }
    val concatUDF = udf(concat_array _)

Can an inline table-valued UDF outperform the equivalent scalar UDF in a SELECT column list?

我与影子孤独终老i submitted on 2019-12-21 05:45:07
Question: This question grew out of "SQLServer: Why avoid Table-Valued User Defined Functions?". I began asking questions in some of the comments, and the replies to my comments moved off topic. So that you don't have to read the entire discussion: I had never heard it said that user-defined functions (UDFs) were slow or to be avoided. Some links were posted in the question referenced above to illustrate that they were slow. I still didn't get it, and asked for an example. An example was posted, and the

Using a table-value function inside a view in SQL Server

拈花ヽ惹草 submitted on 2019-12-21 05:21:57
Question: I have a table-valued function that works correctly if I try the following query:

    SELECT * FROM dbo.GetScheduleForEmployee() AS schedule

However, if I try to create a view with that query, I get a "too few parameters" error. Is there a limitation with table-valued functions and views?

Answer 1: This works for me:

    CREATE FUNCTION dbo.GetScheduleForEmployee()
    RETURNS TABLE
    AS
    RETURN (
        SELECT 1 AS id
        UNION ALL
        SELECT 2
    )
    GO
    CREATE VIEW myview AS
    SELECT * FROM GetScheduleForEmployee() AS schedule
    GO

User-Defined Functions SQL Server 2005 flagged incorrectly as non-deterministic?

 ̄綄美尐妖づ submitted on 2019-12-21 05:21:50
Question: Related to this question, I decided to check the UDFs in my data warehouse (which should largely have been deterministic), and I found several that are flagged as non-deterministic but should be deterministic. For instance, this function:

    CREATE FUNCTION [udf_YearFromDataDtID] ( @DATA_DT_ID int )
    RETURNS int
    AS
    BEGIN
        RETURN @DATA_DT_ID / 10000
    END

shows up in this query:

    SELECT ROUTINE_NAME
    FROM INFORMATION_SCHEMA.ROUTINES
    WHERE IS_DETERMINISTIC = 'NO' AND ROUTINE_TYPE = 'FUNCTION'
    ORDER BY ROUTINE_NAME

Why is this?

Answer 1: Yikes - apparently, it

Convert two columns into key-value json object?

安稳与你 submitted on 2019-12-21 04:19:23
Question: Using FOR JSON AUTO or FOR JSON PATH on the following record set (which represents a product's attributes):

    attribute | value
    -----------------
    color     | red
    size      | small

will produce:

    [{"attribute":"color","value":"red"},{"attribute":"size","value":"small"}]

Is there any way to produce the following instead?

    {"color":"red","size":"small"}

Note that every product has different attributes, so this record set is different for every product. PIVOTing is not an option as it needs

Getting the result columns of table valued functions in SQL Server 2008 R2

倾然丶 夕夏残阳落幕 submitted on 2019-12-21 04:12:30
Question: For a constants generator, I would like to get the metadata of the result columns for all my table-valued functions (that is, the names of the columns returned by each table-valued function). How can I get them? Do I have to parse the function's source code, or is there an interface providing this information? Thanks for your help, Chris

I use the following query to get the TVFs:

    SELECT udf.name AS Name, SCHEMA_NAME(udf.schema_id) AS [Schema]
    FROM master.sys.databases AS dtb, sys.all_objects AS udf
    WHERE

How to pass parameters to Table Valued Function

寵の児 submitted on 2019-12-20 16:48:07
Question: I want to do something like

    select * from tvfHello(@param) where @param in (Select ID from Users)

Answer 1: You need to use CROSS APPLY to achieve this:

    select f.* from users u cross apply dbo.tvfHello(u.ID) f

Answer 2: The following works in the AdventureWorks database:

    CREATE FUNCTION dbo.EmployeeByID(@employeeID int)
    RETURNS TABLE
    AS
    RETURN (
        SELECT * FROM HumanResources.Employee WHERE EmployeeID = @employeeID
    )
    GO
    DECLARE @employeeId int
    set @employeeId=10
    select * from EmployeeById(@employeeId)

Bizarre performance issue: Common Table Expressions in inline User-Defined Function

亡梦爱人 submitted on 2019-12-20 10:34:53
Question: Here's a brain-twister for the SQL guys: can anyone think of a reason why the first of these functions performs fine, and the second one runs dog-slow?

Function A - Typically finishes in ~5 ms

    CREATE FUNCTION dbo.GoodFunction ( @IDs UniqueIntTable READONLY )
    RETURNS TABLE
    AS
    RETURN
    SELECT p.ID, p.Node, p.Name, p.Level
    FROM (
        SELECT DISTINCT a.Ancestor AS Node
        FROM Hierarchy h
        CROSS APPLY dbo.GetAncestors(h.Node.GetAncestor(1)) a
        WHERE h.ID IN (SELECT Value FROM @IDs)
    ) np
    INNER JOIN

Difference between operator and function in C++?

北城以北 submitted on 2019-12-20 09:38:00
Question: I could use some help understanding the following in C++, particularly the difference between an operator and a function:

What is an operator?
What is a function?
What is the difference between them?
Is a user-defined operator+() a function or an operator?
Can an operator operate on operands at compile time? Do they always operate at compile time (like sizeof() in C++)?

Answer 1: An operator is a symbol like +, -, += and so forth (see 13.5). They don't carry a meaning. During semantic analysis,
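
The questions about operator+() and compile-time evaluation can be illustrated with a small sketch. It is not taken from the original answer: the type Vec2 and the helpers sum_infix and sum_call are hypothetical names introduced only for illustration. It assumes a C++11 or later compiler and shows that a user-defined operator+ is an ordinary function that can be called with either infix or function-call syntax, that sizeof yields a compile-time constant, and that a constexpr operator may be evaluated at compile time, although the same operator can also run at run time.

```cpp
// Hypothetical type for illustration; not part of the original question.
struct Vec2 {
    int x;
    int y;
};

// A user-defined operator is declared like any other function;
// its name just happens to be `operator+`.
constexpr Vec2 operator+(Vec2 a, Vec2 b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Infix syntax: the compiler resolves `a + b` to a call of operator+.
constexpr Vec2 sum_infix(Vec2 a, Vec2 b) { return a + b; }

// Function-call syntax: the very same function, named explicitly.
constexpr Vec2 sum_call(Vec2 a, Vec2 b) { return operator+(a, b); }

// sizeof is a built-in operator whose result is a compile-time constant,
// so it can be used inside static_assert.
static_assert(sizeof(Vec2) >= 2 * sizeof(int), "Vec2 holds two ints");

// Because operator+ is constexpr, this sum is computed during compilation.
constexpr Vec2 kSum = Vec2{1, 2} + Vec2{3, 4};
static_assert(kSum.x == 4 && kSum.y == 6, "evaluated at compile time");
```

With arguments that are only known at run time, sum_infix and sum_call execute as normal function calls, so an operator is not inherently a compile-time construct.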