user-defined-functions | 易学教程

SQL massive performance difference using SELECT TOP x even when x is much higher than selected rows

Question: I'm selecting some rows from a table-valued function but have found an inexplicable, massive performance difference when I put SELECT TOP in the query. The query SELECT col1, col2, col3 etc FROM dbo.some_table_function WHERE col1 = @parameter --ORDER BY col1 takes upwards of 5 or 6 minutes to complete. However, SELECT TOP 6000 col1, col2, col3 etc FROM dbo.some_table_function WHERE col1 = @parameter --ORDER BY col1 completes in about 4 or 5 seconds. This wouldn't surprise me if the returned set of data

Applying function to Spark Dataframe Column

Question: Coming from R, I am used to easily doing operations on columns. Is there an easy way to take this function that I've written in Scala: def round_tenths_place( un_rounded:Double ) : Double = { val rounded = BigDecimal(un_rounded).setScale(1, BigDecimal.RoundingMode.HALF_UP).toDouble return rounded } and apply it to one column of a DataFrame, kind of what I hoped this would do: bid_results.withColumn("bid_price_bucket", round_tenths_place(bid_results("bid_price")) ) I haven't found any easy
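
One possible approach to the question above, sketched in PySpark rather than the asker's Scala: a plain function cannot take a column directly, so it is wrapped in a UDF first. The helper name `add_bid_price_bucket` is hypothetical, and the Spark imports sit inside it so the rounding logic can be checked without a Spark installation.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_tenths_place(un_rounded):
    """Round to one decimal place with HALF_UP; None passes through."""
    if un_rounded is None:
        return None
    # Go through str() so the decimal sees 1.25, not its binary expansion.
    rounded = Decimal(str(un_rounded)).quantize(
        Decimal("0.1"), rounding=ROUND_HALF_UP
    )
    return float(rounded)


def add_bid_price_bucket(bid_results):
    """Hypothetical helper: add the rounded column to a Spark DataFrame."""
    # Imported here so round_tenths_place works without pyspark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    round_udf = udf(round_tenths_place, DoubleType())
    return bid_results.withColumn(
        "bid_price_bucket", round_udf(col("bid_price"))
    )
```

The column names `bid_price` and `bid_price_bucket` come from the question; everything else is an illustrative assumption.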

Remove AddIn path from UDF in Excel formula

Question: My add-in was an xla; now I use Excel-DNA, so it has become an xll. When I open a spreadsheet built with the previous version of my add-in, the UDF shows as myUDF with the path of the xla, e.g. "C:\Program Files\Installation folder\MyUDFs.xla!MyUDF". When I click Edit Links and change the source to "C:...\MyUDFs.xll", I get a pop-up which says "Excel cannot update one or more links in this workbook. To update the links, open all the link source files (click Edit Links on the Data tab). To be sure all calculations are

Table Valued Function where did my query plan go?

Question: I've just wrapped a complex SQL statement in a table-valued function on SQL Server 2000. When I look at the query plan for SELECT * FROM dbo.NewFunc, it just gives me a Table Scan of the table I have created. I'm guessing that this is because the table is created in tempdb and I am just selecting from it. So the query is simply: SELECT * FROM table in tempdb. My questions are: Is the UDF using the same plan as the complex SQL statement? How can I tune indexes for this UDF? Can I see the true

User defined function to be applied to Window in PySpark?

Question: I am trying to apply a user-defined function to a Window in PySpark. I have read that a UDAF might be the way to go, but I was not able to find anything concrete. To give an example (taken from here: Xinh's Tech Blog and modified for PySpark): from pyspark import SparkConf from pyspark.sql import SparkSession from pyspark.sql.window import Window from pyspark.sql.functions import avg spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate() a = spark.createDataFrame([

Can T-SQL function return user-defined table type? [duplicate]

Question: This question already has answers here: SQL Server 2008: Can a multi-statement UDF return a UDT? [duplicate] (1 answer); SQL Server 2008 - How do i return a User-Defined Table Type from a Table-Valued Function? (4 answers). Closed 4 years ago. I have my own type: CREATE TYPE MyType AS TABLE ( foo INT ) and a function receiving it as a parameter: CREATE FUNCTION Test ( @in MyType READONLY ) RETURNS @return MyType AS ... Can it return MyType, or only a TABLE repeating MyType's structure: CREATE

Using UDF ignores condition in when

Question: Suppose you had the following PySpark DataFrame: data= [('foo',), ('123',), (None,), ('bar',)] df = sqlCtx.createDataFrame(data, ["col"]) df.show() #+----+ #| col| #+----+ #| foo| #| 123| #|null| #| bar| #+----+ The next two code blocks should do the same thing, that is, return the uppercase of the column if it is not null. However, the second method (using a udf) produces an error. Method 1: Using pyspark.sql.functions.upper() import pyspark.sql.functions as f df.withColumn( 'upper', f
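
The excerpt is cut off before the error appears. A commonly reported cause, assumed here rather than confirmed by the source, is that Spark does not guarantee a Python UDF is skipped for rows that the `when` condition rejects, so the function can still receive `None`. A minimal sketch of a null-safe function follows; `safe_upper` and `add_upper_column` are hypothetical names.

```python
def safe_upper(value):
    """Uppercase a string; return None instead of failing on null input."""
    return value.upper() if value is not None else None


def add_upper_column(df):
    """Hypothetical helper: apply safe_upper to the 'col' column via a UDF."""
    # Imported lazily so safe_upper can be checked without pyspark installed.
    import pyspark.sql.functions as f
    from pyspark.sql.types import StringType

    upper_udf = f.udf(safe_upper, StringType())
    return df.withColumn("upper", upper_udf(f.col("col")))
```

Handling the null inside the function means the result no longer depends on the order in which Spark evaluates the condition and the UDF.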

Stack Overflow while processing several columns with a UDF

Question: I have a DataFrame with many columns of str type, and I want to apply a function to all those columns without renaming them or adding more columns. I tried using a for-in loop executing withColumn (see example below), but when I run the code it usually fails with a stack overflow (it rarely works). This DataFrame is not big at all; it has just ~15000 records. # df is a DataFrame def lowerCase(string): return string.strip().lower() lowerCaseUDF = udf(lowerCase, StringType()) for
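
A hedged sketch of one alternative to the loop described above: build every transformed column first and apply them in a single `select`, which avoids stacking one projection per `withColumn` call. Whether this resolves the asker's error is an assumption, and `lower_case`, `string_columns`, and `lower_all_strings` are hypothetical names.

```python
def lower_case(string):
    """Strip and lowercase a value; None passes through unchanged."""
    return string.strip().lower() if string is not None else None


def string_columns(dtypes):
    """Pick column names typed 'string' from (name, type) pairs like df.dtypes."""
    return [name for name, dtype in dtypes if dtype == "string"]


def lower_all_strings(df):
    """Hypothetical helper: lowercase every string column in one projection."""
    # Imported lazily so the pure functions above work without pyspark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    lower_udf = udf(lower_case, StringType())
    targets = set(string_columns(df.dtypes))
    # A single select keeps the original column names and order.
    return df.select(
        [lower_udf(col(c)).alias(c) if c in targets else col(c)
         for c in df.columns]
    )
```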

Hive getting top n records in group by query

Question: I have the following table in Hive: user-id, user-name, user-address, clicks, impressions, page-id, page-name. I need to find the top 5 users [user-id, user-name, user-address] by clicks for each page [page-id, page-name]. I understand that we first need to group by [page-id, page-name], then order by [clicks, impressions] desc within each group, and then emit only the top 5 users [user-id, user-name, user-address] for each page, but I am finding it difficult to construct the query. How can we do this using
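
The steps described above (group by page, order within each group, keep the first five) can be sketched in plain Python to make the intended result concrete. This is not Hive syntax; in Hive the same shape is usually written with a ROW_NUMBER() window function partitioned by page. The underscore field names are adapted from the question, and `top_users_per_page` is a hypothetical name.

```python
from collections import defaultdict


def top_users_per_page(rows, n=5):
    """Return the top-n users per (page_id, page_name), ranked by clicks
    and then impressions, both descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["page_id"], row["page_name"])].append(row)

    result = {}
    for page, members in groups.items():
        ranked = sorted(
            members,
            key=lambda r: (r["clicks"], r["impressions"]),
            reverse=True,
        )
        result[page] = [
            (r["user_id"], r["user_name"], r["user_address"])
            for r in ranked[:n]
        ]
    return result
```

Each input row is assumed to be a dict with one key per column of the table.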

Excel is calculating a formula with a VBA function as an error unless it is re-entered

Question: I've got a simple IF statement set up in a worksheet where the condition is a VBA user-defined function: Function CellIsFormula(ByRef rng) CellIsFormula = rng(1).HasFormula End Function This function seems to work fine: But for some reason that I can't figure out, the cell is evaluating to an error. What's worse, when evaluating the formula, Excel is attributing the error to a calculation step that doesn't produce an error: To top it all off, and what really blows my mind, is that if I