window-functions

PostgreSQL window function: partition by comparison

Submitted by 有些话、适合烂在心里 on 2019-12-17 18:44:24
Question: I'm trying to find a way to compare against the current row in the PARTITION BY clause of a window function in a PostgreSQL query. Imagine I have the short list of 5 elements in the following query (in the real case, I have thousands or even millions of rows). For each row, I am trying to get the id of the next different element (event column) and the id of the previous different element.

    WITH events AS(
        SELECT 1 as id, 12 as event, '2014-03-19 08:00:00'::timestamp as date
        UNION
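A runnable sketch of the output the question is after, using SQLite as a stand-in for PostgreSQL; the table contents beyond the first row are invented for illustration. Correlated subqueries express "next/previous different element" directly (the usual window-function answer is a gaps-and-islands construction, omitted here for brevity):

```python
import sqlite3

# Hypothetical reconstruction of the question's events table. For each row,
# a correlated subquery finds the id of the nearest later (and earlier) row
# whose event value differs from the current row's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, event INTEGER, date TEXT);
INSERT INTO events VALUES
  (1, 12, '2014-03-19 08:00:00'),
  (2, 12, '2014-03-19 08:30:00'),
  (3, 13, '2014-03-19 09:00:00'),
  (4, 13, '2014-03-19 09:30:00'),
  (5, 12, '2014-03-19 10:00:00');
""")
rows = conn.execute("""
SELECT e.id,
       (SELECT e2.id FROM events e2
         WHERE e2.date > e.date AND e2.event <> e.event
         ORDER BY e2.date LIMIT 1) AS next_diff,
       (SELECT e2.id FROM events e2
         WHERE e2.date < e.date AND e2.event <> e.event
         ORDER BY e2.date DESC LIMIT 1) AS prev_diff
  FROM events e ORDER BY e.date
""").fetchall()
```

For row 1 (event 12) the next different element is row 3 (event 13) and there is no previous one, so `rows[0]` is `(1, 3, None)`.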

Does Spark know the partitioning key of a DataFrame?

Submitted by 匆匆过客 on 2019-12-17 17:46:10
Question: I want to know whether Spark knows the partitioning key of a parquet file and uses this information to avoid shuffles. Context: Spark 2.0.1 running a local SparkSession. I have a CSV dataset that I am saving as a parquet file on my disk like so:

    val df0 = spark
      .read
      .format("csv")
      .option("header", true)
      .option("delimiter", ";")
      .option("inferSchema", false)
      .load("SomeFile.csv")

    val df = df0.repartition(partitionExprs = col("numerocarte"), numPartitions = 42)

    df.write
      .mode(SaveMode
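A toy sketch of what `repartition(col("numerocarte"), 42)` does conceptually: hash partitioning routes every row to `hash(key) % numPartitions`, so all rows sharing a key land in the same partition and a later aggregation on that key needs no shuffle. (The hash function below is illustrative, not Spark's actual one, and the point of the question stands: a plain parquet write does not record this partitioner, so Spark cannot be assumed to know it on re-read.)

```python
import zlib

NUM_PARTITIONS = 42

def partition_for(key: str) -> int:
    # Deterministic stand-in for a HashPartitioner: same key -> same partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Hypothetical rows keyed by the question's "numerocarte" column.
rows = [("card-1", 10.0), ("card-2", 5.0), ("card-1", 7.5)]
partitions = [partition_for(key) for key, _ in rows]
```

Both `card-1` rows map to the same partition index, which is exactly the co-location property a shuffle would otherwise have to establish.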

Why do I need to apply a window function to samples when building a power spectrum of an audio signal?

Submitted by 旧时模样 on 2019-12-17 17:41:43
Question: I have found the following guidelines several times for getting the power spectrum of an audio signal:

    - collect N samples, where N is a power of 2
    - apply a suitable window function to the samples, e.g. Hanning
    - pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts
    - calculate the squared magnitude of your FFT output bins (re * re + im * im)
    - (optional) calculate 10 *
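The steps above can be demonstrated in miniature, and the demo also answers the "why window?" part: a tone that does not fall exactly on a DFT bin (here 5.5 cycles in 64 samples) leaks energy into every bin of a rectangular-window spectrum, while a Hann window keeps the leakage concentrated near the true frequency. This is a pure-Python naive DFT, not an FFT, purely for illustration:

```python
import math

N = 64
# A sine at 5.5 cycles per frame: deliberately *between* DFT bins.
x = [math.sin(2 * math.pi * 5.5 * n / N) for n in range(N)]
# Hann (Hanning) window.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def power_spectrum(samples):
    # Naive O(N^2) DFT; per-bin squared magnitude (re * re + im * im).
    out = []
    for k in range(N // 2):
        re = sum(s * math.cos(2 * math.pi * k * n / N) for n, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * n / N) for n, s in enumerate(samples))
        out.append(re * re + im * im)
    return out

p_rect = power_spectrum(x)
p_hann = power_spectrum([s * w for s, w in zip(x, hann)])

# Energy leaked far away from the tone (bins 15 and up).
leak_rect = sum(p_rect[15:])
leak_hann = sum(p_hann[15:])
```

`leak_hann` comes out orders of magnitude below `leak_rect`, which is the practical reason step 2 exists.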

Spark - Window with recursion? - Conditionally propagating values across rows

Submitted by 余生颓废 on 2019-12-17 16:54:32
Question: I have the following dataframe showing the revenue of purchases.

    +-------+--------+-------+
    |user_id|visit_id|revenue|
    +-------+--------+-------+
    |      1|       1|      0|
    |      1|       2|      0|
    |      1|       3|      0|
    |      1|       4|    100|
    |      1|       5|      0|
    |      1|       6|      0|
    |      1|       7|    200|
    |      1|       8|      0|
    |      1|       9|     10|
    +-------+--------+-------+

Ultimately I want a new column purch_revenue to show the revenue generated by the purchase in every row. As a workaround, I have also tried to introduce a purchase identifier purch_id which is incremented each time a
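One non-recursive way to get the purch_id workaround to propagate values, sketched here in plain SQL on SQLite (requires SQLite >= 3.25 for window functions; the same expressions port to Spark SQL): a running count of *previous* purchases groups each purchase with the zero-revenue visits leading up to it, and a MAX over that group spreads the purchase revenue across its rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INT, visit_id INT, revenue INT)")
conn.executemany("INSERT INTO visits VALUES (1, ?, ?)",
                 [(1, 0), (2, 0), (3, 0), (4, 100),
                  (5, 0), (6, 0), (7, 200), (8, 0), (9, 10)])
rows = conn.execute("""
WITH tagged AS (
  SELECT user_id, visit_id, revenue,
         -- purchases seen strictly before this row -> group id
         COALESCE(SUM(CASE WHEN revenue > 0 THEN 1 ELSE 0 END) OVER (
           PARTITION BY user_id ORDER BY visit_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS purch_id
  FROM visits)
SELECT visit_id,
       MAX(revenue) OVER (PARTITION BY user_id, purch_id) AS purch_revenue
FROM tagged ORDER BY visit_id
""").fetchall()
```

Visits 1-4 share group 0 and all get 100, visits 5-7 get 200, and visits 8-9 get 10.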

How do I Handle Ties When Ranking Results in MySQL?

Submitted by 对着背影说爱祢 on 2019-12-17 16:38:08
Question: How does one handle ties when ranking results in a MySQL query? I've simplified the table names and columns in this example, but it should illustrate my problem:

    SET @rank=0;
    SELECT student_names.students,
           @rank := @rank + 1 AS rank,
           scores.grades
    FROM student_names
    LEFT JOIN scores ON student_names.students = scores.students
    ORDER BY scores.grades DESC

So imagine the above query produces:

    Students  Rank  Grades
    ======================
    Al        1     90
    Amy       2     90
    George    3     78
    Bob       4     73
    Mary      5     NULL
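The standard-SQL answer to ties is RANK() (ties share a rank, leaving a gap after them) versus DENSE_RANK() (no gap). MySQL 8 supports both; the demo below uses SQLite (>= 3.25) purely because it ships with Python, and older MySQL has to fall back on the user-variable trick shown in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, grade INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("Al", 90), ("Amy", 90), ("George", 78),
                  ("Bob", 73), ("Mary", None)])
rows = conn.execute("""
SELECT student,
       RANK()       OVER (ORDER BY grade DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY grade DESC) AS dense_rnk
FROM scores
""").fetchall()
```

Al and Amy both rank 1; with RANK() George is then 3 (gap), with DENSE_RANK() he is 2.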

Applying a Window function to calculate differences in pySpark

Submitted by 蓝咒 on 2019-12-17 15:42:44
Question: I am using pySpark, and have set up my dataframe with two columns representing a daily asset price as follows:

    ind = sc.parallelize(range(1,5))
    prices = sc.parallelize([33.3,31.1,51.2,21.3])
    data = ind.zip(prices)
    df = sqlCtx.createDataFrame(data,["day","price"])

Upon applying df.show() I get:

    +---+-----+
    |day|price|
    +---+-----+
    |  1| 33.3|
    |  2| 31.1|
    |  3| 51.2|
    |  4| 21.3|
    +---+-----+

Which is fine and all. I would like to have another column that contains the day-to-day returns of the price
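The lag-based return computation reads naturally in SQL; the SQLite (>= 3.25) demo below shows the window logic, and in PySpark the equivalent is `F.col("price") / F.lag("price").over(Window.orderBy("day")) - 1`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INT, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)",
                 [(1, 33.3), (2, 31.1), (3, 51.2), (4, 21.3)])
rows = conn.execute("""
SELECT day,
       -- day-to-day return: today's price over yesterday's, minus 1
       price / LAG(price) OVER (ORDER BY day) - 1 AS ret
FROM prices ORDER BY day
""").fetchall()
```

The first day has no predecessor, so its return is NULL; day 3's return is 51.2/31.1 - 1, roughly +64.6%.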

How to use a ring data structure in window functions

Submitted by 半城伤御伤魂 on 2019-12-17 14:01:28
Question: I have data that is arranged in a ring structure (or circular buffer); that is, it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring: I'd like to create a window query that can combine the lag and lead items into a three-point array, but I can't figure it out. For example, at part 1 of a 5-part ring the lag/lead sequence is 5-1-2, and at part 4 it is 3-4-5. Here is an example table of two rings with different numbers of parts
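Modular arithmetic gives the wrap-around lag/lead directly, which can also be used in SQL to patch the NULLs that plain lag()/lead() produce at the ring boundary. For part p of an n-part ring numbered 1..n:

```python
def ring_triple(p: int, n: int) -> list:
    """Return [lag, current, lead] for part p of an n-part ring (parts 1..n)."""
    prev_part = (p - 2) % n + 1  # wraps part 1 back to part n
    next_part = p % n + 1        # wraps part n forward to part 1
    return [prev_part, p, next_part]
```

For a 5-part ring this reproduces the question's examples: `ring_triple(1, 5)` is `[5, 1, 2]` and `ring_triple(4, 5)` is `[3, 4, 5]`.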

Window Functions or Common Table Expressions: count previous rows within range

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-17 13:26:47
Question: I'd like to use a window function to determine, for each row, the total number of preceding records meeting certain criteria. A specific example:

    clone=# \d test
            Table "pg_temp_2.test"
     Column |            Type             | Modifiers
    --------+-----------------------------+-----------
     id     | bigint                      |
     date   | timestamp without time zone |

I'd like to know, for each date, the count of rows within '1 hour previous' to that date. Can I do this with a window function, or do I need to investigate CTEs? I really want to be
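A portable way to count, per row, the rows dated within the previous hour is a correlated subquery, shown below with SQLite and epoch-second timestamps (invented sample data). PostgreSQL 11+ can also express this as a true window frame using RANGE with an interval offset, e.g. `COUNT(*) OVER (ORDER BY date RANGE '1 hour' PRECEDING)` minus the current row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, ts INTEGER)")  # ts = unix epoch
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(1, 0), (2, 1800), (3, 3000), (4, 7000)])
rows = conn.execute("""
SELECT t.id,
       (SELECT COUNT(*) FROM test t2
         WHERE t2.ts <  t.ts           -- strictly before this row
           AND t2.ts >= t.ts - 3600)   -- within the previous hour
       AS prev_hour_count
FROM test t ORDER BY t.ts
""").fetchall()
```

Row 3 (ts 3000) counts both earlier rows since they fall in [−600, 3000), while row 4 (ts 7000) counts none.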

Using windowing functions in Spark

Submitted by 不羁岁月 on 2019-12-17 09:57:31
Question: I am trying to use rowNumber in Spark data frames. My queries work as expected in the Spark shell, but when I write them in Eclipse and compile a JAR, I face an error:

    16/03/23 05:52:43 ERROR ApplicationMaster: User class threw exception:
    org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'.
    Note that, using window functions currently requires a HiveContext;
    org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'. Note
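The error message names its own fix: in Spark 1.x, window functions resolve only against a HiveContext (which spark-shell provides by default as sqlContext, hence the difference in behaviour). A configuration sketch in PySpark 1.x terms, not runnable outside a Spark deployment:

```python
# Spark 1.x configuration sketch: window functions such as row_number()
# require a HiveContext instead of the plain SQLContext.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="window-demo")
sqlContext = HiveContext(sc)  # rather than SQLContext(sc)
# DataFrame queries using row_number().over(...) now resolve.
```

In Spark 2.x this distinction disappeared: SparkSession supports window functions without Hive support.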