user-defined-functions

SQL massive performance difference using SELECT TOP x even when x is much higher than selected rows

自古美人都是妖i submitted on 2019-12-18 12:59:17
Question: I'm selecting some rows from a table-valued function but have found an inexplicable, massive performance difference when I put SELECT TOP in the query.

    SELECT col1, col2, col3 etc
    FROM dbo.some_table_function
    WHERE col1 = @parameter
    --ORDER BY col1

is taking upwards of 5 or 6 minutes to complete. However,

    SELECT TOP 6000 col1, col2, col3 etc
    FROM dbo.some_table_function
    WHERE col1 = @parameter
    --ORDER BY col1

completes in about 4 or 5 seconds. This wouldn't surprise me if the returned set of data

Applying function to Spark Dataframe Column

删除回忆录丶 submitted on 2019-12-18 11:56:36
Question: Coming from R, I am used to easily doing operations on columns. Is there any easy way to take this function that I've written in Scala

    def round_tenths_place( un_rounded:Double ) : Double = {
        val rounded = BigDecimal(un_rounded).setScale(1, BigDecimal.RoundingMode.HALF_UP).toDouble
        return rounded
    }

and apply it to one column of a DataFrame, kind of what I hoped this would do:

    bid_results.withColumn("bid_price_bucket", round_tenths_place(bid_results("bid_price")) )

I haven't found any easy
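For readers following along in Python, the rounding rule in the question can be sketched on its own, outside Spark. This is a minimal illustration, not the asker's code: it assumes that going through the value's string form is an acceptable stand-in for Scala's `BigDecimal(un_rounded)`, and it reuses the name `round_tenths_place` only for continuity.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_tenths_place(un_rounded: float) -> float:
    """Round to one decimal place, with ties going away from zero (HALF_UP)."""
    # Decimal(str(x)) starts from the printed value of the float,
    # so 2.25 is treated as exactly 2.25 (assumption: this is the intent).
    rounded = Decimal(str(un_rounded)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return float(rounded)


# Applying it to plain values; in PySpark the function would first need to be
# wrapped as a column function (for example with pyspark.sql.functions.udf)
# before it can be passed to withColumn. That step is not shown here.
samples = [2.25, 2.24, 0.05, -2.25]
rounded_samples = [round_tenths_place(v) for v in samples]
```

The point of the sketch is the distinction the question runs into: a function over plain numbers is not the same thing as a function over a column, so it has to be lifted before a DataFrame method can use it.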

Remove AddIn path from UDF in Excel formula

风格不统一 submitted on 2019-12-18 08:59:05
Question: My add-in was an .xla; now I use Excel-DNA, so it has become an .xll. When I open a spreadsheet built with the previous version of my add-in, the UDF is shown as myUDF with the path of the .xla, e.g. "C:\Program Files\Installation folder\MyUDFs.xla!MyUDF". When I click Edit Links and change the source to "C:...\MyUDFs.xll", I get a pop-up which says "Excel cannot update one or more links in this workbook. To update the links, open all the link source files (click Edit Links on the Data tab). To be sure all calculations are

Table Valued Function where did my query plan go?

随声附和 submitted on 2019-12-18 04:25:29
Question: I've just wrapped a complex SQL statement in a table-valued function on SQL Server 2000. When looking at the query plan for SELECT * FROM dbo.NewFunc, it just gives me a Table Scan of the table I have created. I'm guessing that this is because the table is created in tempdb and I am just selecting from it, so the query is simply: SELECT * FROM table in tempdb. My questions are: Is the UDF using the same plan as the complex SQL statement? How can I tune indexes for this UDF? Can I see the true

User defined function to be applied to Window in PySpark?

杀马特。学长 韩版系。学妹 submitted on 2019-12-18 04:17:10
Question: I am trying to apply a user-defined function to a Window in PySpark. I have read that a UDAF might be the way to go, but I was not able to find anything concrete. To give an example (taken from here: Xinh's Tech Blog and modified for PySpark):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    from pyspark.sql.window import Window
    from pyspark.sql.functions import avg
    spark = SparkSession.builder.master("local").config(conf=SparkConf()).getOrCreate()
    a = spark.createDataFrame([

Can T-SQL function return user-defined table type? [duplicate]

為{幸葍}努か submitted on 2019-12-18 03:51:30
Question: This question already has answers here: SQL Server 2008: Can a multi-statement UDF return a UDT? [duplicate] (1 answer); SQL Server 2008 - How do i return a User-Defined Table Type from a Table-Valued Function? (4 answers). Closed 4 years ago. I have my own type:

    CREATE TYPE MyType AS TABLE ( foo INT )

and a function receiving it as a parameter:

    CREATE FUNCTION Test ( @in MyType READONLY ) RETURNS @return MyType AS ...

Can it return MyType, or only TABLE repeating MyType's structure: CREATE

Using UDF ignores condition in when

淺唱寂寞╮ submitted on 2019-12-17 21:11:15
Question: Suppose you had the following pyspark DataFrame:

    data = [('foo',), ('123',), (None,), ('bar',)]
    df = sqlCtx.createDataFrame(data, ["col"])
    df.show()
    #+----+
    #| col|
    #+----+
    #| foo|
    #| 123|
    #|null|
    #| bar|
    #+----+

The next two code blocks should do the same thing, that is, return the uppercase of the column if it is not null. However, the second method (using a udf) produces an error.

Method 1: Using pyspark.sql.functions.upper()

    import pyspark.sql.functions as f
    df.withColumn( 'upper', f
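The excerpt is cut off before the UDF version, so the following is only a sketch of one plausible failure mode, written in plain Python rather than Spark. It assumes the UDF body calls `.upper()` directly on its argument and that the function can still be invoked on the null row even when a surrounding condition is meant to exclude it. The helper names `upper_unsafe` and `upper_null_safe` are hypothetical.

```python
from typing import Optional


def upper_unsafe(s):
    # Fails with AttributeError if s is None, because None has no .upper().
    return s.upper()


def upper_null_safe(s: Optional[str]) -> Optional[str]:
    # Handles the null case inside the function, so correctness does not
    # depend on any outer condition being checked first.
    return s.upper() if s is not None else None


# Same values as the DataFrame column in the question.
data = ["foo", "123", None, "bar"]
result = [upper_null_safe(v) for v in data]
```

Under these assumptions, putting the null check inside the function is the more defensive design, since it works no matter when or how often the function is called.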

Stack Overflow while processing several columns with a UDF

一笑奈何 submitted on 2019-12-17 19:25:27
Question: I have a DataFrame with many columns of str type, and I want to apply a function to all of those columns without renaming them or adding more columns. I tried using a for-in loop executing withColumn (see example below), but normally when I run the code it throws a Stack Overflow (it rarely works). This DataFrame is not big at all; it has just ~15000 records.

    # df is a DataFrame
    def lowerCase(string):
        return string.strip().lower()
    lowerCaseUDF = udf(lowerCase, StringType())
    for

Hive getting top n records in group by query

只谈情不闲聊 submitted on 2019-12-17 17:32:11
Question: I have the following table in Hive: user-id, user-name, user-address, clicks, impressions, page-id, page-name. I need to find the top 5 users [user-id, user-name, user-address] by clicks for each page [page-id, page-name]. I understand that we need to first group by [page-id, page-name], then within each group order by [clicks, impressions] desc, and then emit only the top 5 users [user-id, user-name, user-address] for each page, but I am finding it difficult to construct the query. How can we do this using
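The steps described in the question (group by page, order each group by clicks and impressions descending, keep the first n users) can be sketched in plain Python to make the intended result concrete. This is not a Hive query. The sample rows and the function name `top_users_per_page` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows in the column order given in the question:
# (user_id, user_name, user_address, clicks, impressions, page_id, page_name)
rows = [
    (1, "ann", "a1", 10, 100, "p1", "Home"),
    (2, "bob", "a2", 30, 50, "p1", "Home"),
    (3, "cy", "a3", 30, 70, "p1", "Home"),
    (4, "dee", "a4", 5, 10, "p2", "About"),
]


def top_users_per_page(rows, n=5):
    """Return {(page_id, page_name): [(user_id, user_name, user_address), ...]}."""
    groups = defaultdict(list)
    # Step 1: group rows by page.
    for user_id, user_name, user_address, clicks, impressions, page_id, page_name in rows:
        groups[(page_id, page_name)].append(
            (clicks, impressions, user_id, user_name, user_address)
        )
    result = {}
    for page, users in groups.items():
        # Step 2: order by clicks, then impressions, both descending.
        ranked = sorted(users, key=lambda u: (u[0], u[1]), reverse=True)
        # Step 3: keep only the first n users and drop the sort keys.
        result[page] = [(uid, name, addr) for _, _, uid, name, addr in ranked[:n]]
    return result


top2 = top_users_per_page(rows, n=2)
```

In SQL engines that support window functions, the same shape is commonly expressed by numbering rows within each page partition and filtering on that number. Whether that fits the asker's Hive version is not stated in the excerpt.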

Excel is calculating a formula with a VBA function as an error unless it is re-entered

浪尽此生 submitted on 2019-12-17 15:58:09
Question: I've got a simple IF statement set up in a worksheet where the IF condition is a VBA user-defined function:

    Function CellIsFormula(ByRef rng)
        CellIsFormula = rng(1).HasFormula
    End Function

This function seems to work fine. But for some reason that I can't figure out, the cell is evaluating to an error. What's worse, when evaluating the formula, Excel attributes the error to a calculation step that doesn't produce an error. To top it all off, and what really blows my mind, is that if I