window-functions

PostgreSQL window function: partition by comparison

Submitted by 有些话、适合烂在心里 on 2019-12-17 18:44:24
Question: I'm trying to find a way to compare against the current row in the PARTITION BY clause of a window function in a PostgreSQL query. Imagine I have the short list of 5 elements in the following query (in the real case, I have thousands or even millions of rows). For each row, I am trying to get the id of the next different element (event column) and the id of the previous different element.

    WITH events AS(
        SELECT 1 as id, 12 as event, '2014-03-19 08:00:00'::timestamp as date
        UNION
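A runnable sketch of the output the question is after, using SQLite as a stand-in for PostgreSQL; the table contents beyond the first row are invented for illustration. Correlated subqueries express "next/previous different element" directly (the usual window-function answer is a gaps-and-islands construction, omitted here for brevity):

```python
import sqlite3

# Hypothetical reconstruction of the question's events table. For each row,
# a correlated subquery finds the id of the nearest later (and earlier) row
# whose event value differs from the current row's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, event INTEGER, date TEXT);
INSERT INTO events VALUES
  (1, 12, '2014-03-19 08:00:00'),
  (2, 12, '2014-03-19 08:30:00'),
  (3, 13, '2014-03-19 09:00:00'),
  (4, 13, '2014-03-19 09:30:00'),
  (5, 12, '2014-03-19 10:00:00');
""")
rows = conn.execute("""
SELECT e.id,
       (SELECT e2.id FROM events e2
         WHERE e2.date > e.date AND e2.event <> e.event
         ORDER BY e2.date LIMIT 1) AS next_diff,
       (SELECT e2.id FROM events e2
         WHERE e2.date < e.date AND e2.event <> e.event
         ORDER BY e2.date DESC LIMIT 1) AS prev_diff
  FROM events e ORDER BY e.date
""").fetchall()
```

For row 1 (event 12) the next different element is row 3 (event 13) and there is no previous one, so `rows[0]` is `(1, 3, None)`.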

Does Spark know the partitioning key of a DataFrame?

Submitted by 匆匆过客 on 2019-12-17 17:46:10
Question: I want to know whether Spark knows the partitioning key of a parquet file and uses this information to avoid shuffles. Context: Spark 2.0.1 running a local SparkSession. I have a CSV dataset that I am saving as a parquet file on my disk like so:

    val df0 = spark
      .read
      .format("csv")
      .option("header", true)
      .option("delimiter", ";")
      .option("inferSchema", false)
      .load("SomeFile.csv")

    val df = df0.repartition(partitionExprs = col("numerocarte"), numPartitions = 42)

    df.write
      .mode(SaveMode
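A toy sketch of what `repartition(col("numerocarte"), 42)` does conceptually: hash partitioning routes every row to `hash(key) % numPartitions`, so all rows sharing a key land in the same partition and a later aggregation on that key needs no shuffle. (The hash function below is illustrative, not Spark's actual one, and the point of the question stands: a plain parquet write does not record this partitioner, so Spark cannot be assumed to know it on re-read.)

```python
import zlib

NUM_PARTITIONS = 42

def partition_for(key: str) -> int:
    # Deterministic stand-in for a HashPartitioner: same key -> same partition.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Hypothetical rows keyed by the question's "numerocarte" column.
rows = [("card-1", 10.0), ("card-2", 5.0), ("card-1", 7.5)]
partitions = [partition_for(key) for key, _ in rows]
```

Both `card-1` rows map to the same partition index, which is exactly the co-location property a shuffle would otherwise have to establish.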

Why do I need to apply a window function to samples when building a power spectrum of an audio signal?

Submitted by 旧时模样 on 2019-12-17 17:41:43
Question: I have found the following guidelines several times for getting the power spectrum of an audio signal:

    - collect N samples, where N is a power of 2
    - apply a suitable window function to the samples, e.g. Hanning
    - pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts
    - calculate the squared magnitude of your FFT output bins (re * re + im * im)
    - (optional) calculate 10 *
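The steps above can be demonstrated in miniature, and the demo also answers the "why window?" part: a tone that does not fall exactly on a DFT bin (here 5.5 cycles in 64 samples) leaks energy into every bin of a rectangular-window spectrum, while a Hann window keeps the leakage concentrated near the true frequency. This is a pure-Python naive DFT, not an FFT, purely for illustration:

```python
import math

N = 64
# A sine at 5.5 cycles per frame: deliberately *between* DFT bins.
x = [math.sin(2 * math.pi * 5.5 * n / N) for n in range(N)]
# Hann (Hanning) window.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def power_spectrum(samples):
    # Naive O(N^2) DFT; per-bin squared magnitude (re * re + im * im).
    out = []
    for k in range(N // 2):
        re = sum(s * math.cos(2 * math.pi * k * n / N) for n, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * n / N) for n, s in enumerate(samples))
        out.append(re * re + im * im)
    return out

p_rect = power_spectrum(x)
p_hann = power_spectrum([s * w for s, w in zip(x, hann)])

# Energy leaked far away from the tone (bins 15 and up).
leak_rect = sum(p_rect[15:])
leak_hann = sum(p_hann[15:])
```

`leak_hann` comes out orders of magnitude below `leak_rect`, which is the practical reason step 2 exists.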

Spark - Window with recursion? - Conditionally propagating values across rows

Submitted by 余生颓废 on 2019-12-17 16:54:32
Question: I have the following dataframe showing the revenue of purchases.

    +-------+--------+-------+
    |user_id|visit_id|revenue|
    +-------+--------+-------+
    |      1|       1|      0|
    |      1|       2|      0|
    |      1|       3|      0|
    |      1|       4|    100|
    |      1|       5|      0|
    |      1|       6|      0|
    |      1|       7|    200|
    |      1|       8|      0|
    |      1|       9|     10|
    +-------+--------+-------+

Ultimately I want a new column purch_revenue to show the revenue generated by the purchase in every row. As a workaround, I have also tried to introduce a purchase identifier purch_id which is incremented each time a
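One non-recursive way to get the purch_id workaround to propagate values, sketched here in plain SQL on SQLite (requires SQLite >= 3.25 for window functions; the same expressions port to Spark SQL): a running count of *previous* purchases groups each purchase with the zero-revenue visits leading up to it, and a MAX over that group spreads the purchase revenue across its rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INT, visit_id INT, revenue INT)")
conn.executemany("INSERT INTO visits VALUES (1, ?, ?)",
                 [(1, 0), (2, 0), (3, 0), (4, 100),
                  (5, 0), (6, 0), (7, 200), (8, 0), (9, 10)])
rows = conn.execute("""
WITH tagged AS (
  SELECT user_id, visit_id, revenue,
         -- purchases seen strictly before this row -> group id
         COALESCE(SUM(CASE WHEN revenue > 0 THEN 1 ELSE 0 END) OVER (
           PARTITION BY user_id ORDER BY visit_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS purch_id
  FROM visits)
SELECT visit_id,
       MAX(revenue) OVER (PARTITION BY user_id, purch_id) AS purch_revenue
FROM tagged ORDER BY visit_id
""").fetchall()
```

Visits 1-4 share group 0 and all get 100, visits 5-7 get 200, and visits 8-9 get 10.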

How do I Handle Ties When Ranking Results in MySQL?

Submitted by 对着背影说爱祢 on 2019-12-17 16:38:08
Question: How does one handle ties when ranking results in a MySQL query? I've simplified the table names and columns in this example, but it should illustrate my problem:

    SET @rank=0;
    SELECT student_names.students,
           @rank := @rank + 1 AS rank,
           scores.grades
    FROM student_names
    LEFT JOIN scores ON student_names.students = scores.students
    ORDER BY scores.grades DESC

So imagine the above query produces:

    Students  Rank  Grades
    ======================
    Al        1     90
    Amy       2     90
    George    3     78
    Bob       4     73
    Mary      5     NULL
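The standard-SQL answer to ties is RANK() (ties share a rank, leaving a gap after them) versus DENSE_RANK() (no gap). MySQL 8 supports both; the demo below uses SQLite (>= 3.25) purely because it ships with Python, and older MySQL has to fall back on the user-variable trick shown in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, grade INT)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("Al", 90), ("Amy", 90), ("George", 78),
                  ("Bob", 73), ("Mary", None)])
rows = conn.execute("""
SELECT student,
       RANK()       OVER (ORDER BY grade DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY grade DESC) AS dense_rnk
FROM scores
""").fetchall()
```

Al and Amy both rank 1; with RANK() George is then 3 (gap), with DENSE_RANK() he is 2.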

Applying a Window function to calculate differences in pySpark

Submitted by 蓝咒 on 2019-12-17 15:42:44
Question: I am using pySpark, and have set up my dataframe with two columns representing a daily asset price as follows:

    ind = sc.parallelize(range(1,5))
    prices = sc.parallelize([33.3,31.1,51.2,21.3])
    data = ind.zip(prices)
    df = sqlCtx.createDataFrame(data,["day","price"])

Upon applying df.show() I get:

    +---+-----+
    |day|price|
    +---+-----+
    |  1| 33.3|
    |  2| 31.1|
    |  3| 51.2|
    |  4| 21.3|
    +---+-----+

Which is fine and all. I would like to have another column that contains the day-to-day returns of the price
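The lag-based return computation reads naturally in SQL; the SQLite (>= 3.25) demo below shows the window logic, and in PySpark the equivalent is `F.col("price") / F.lag("price").over(Window.orderBy("day")) - 1`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INT, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)",
                 [(1, 33.3), (2, 31.1), (3, 51.2), (4, 21.3)])
rows = conn.execute("""
SELECT day,
       -- day-to-day return: today's price over yesterday's, minus 1
       price / LAG(price) OVER (ORDER BY day) - 1 AS ret
FROM prices ORDER BY day
""").fetchall()
```

The first day has no predecessor, so its return is NULL; day 3's return is 51.2/31.1 - 1, roughly +64.6%.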

How to use a ring data structure in window functions

Submitted by 半城伤御伤魂 on 2019-12-17 14:01:28
Question: I have data that is arranged in a ring structure (or circular buffer); that is, it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring: I'd like to create a window query that can combine the lag and lead items into a three-point array, but I can't figure it out. For example, at part 1 of a 5-part ring the lag/lead sequence is 5-1-2, and at part 4 it is 3-4-5. Here is an example table of two rings with different numbers of parts
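Modular arithmetic gives the wrap-around lag/lead directly, which can also be used in SQL to patch the NULLs that plain lag()/lead() produce at the ring boundary. For part p of an n-part ring numbered 1..n:

```python
def ring_triple(p: int, n: int) -> list:
    """Return [lag, current, lead] for part p of an n-part ring (parts 1..n)."""
    prev_part = (p - 2) % n + 1  # wraps part 1 back to part n
    next_part = p % n + 1        # wraps part n forward to part 1
    return [prev_part, p, next_part]
```

For a 5-part ring this reproduces the question's examples: `ring_triple(1, 5)` is `[5, 1, 2]` and `ring_triple(4, 5)` is `[3, 4, 5]`.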

Window Functions or Common Table Expressions: count previous rows within range

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-17 13:26:47
Question: I'd like to use a window function to determine, for each row, the total number of preceding records meeting certain criteria. A specific example:

    clone=# \d test
            Table "pg_temp_2.test"
     Column |            Type             | Modifiers
    --------+-----------------------------+-----------
     id     | bigint                      |
     date   | timestamp without time zone |

I'd like to know, for each date, the count of rows within '1 hour previous' to that date. Can I do this with a window function, or do I need to investigate CTEs? I really want to be
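A portable way to count, per row, the rows dated within the previous hour is a correlated subquery, shown below with SQLite and epoch-second timestamps (invented sample data). PostgreSQL 11+ can also express this as a true window frame using RANGE with an interval offset, e.g. `COUNT(*) OVER (ORDER BY date RANGE '1 hour' PRECEDING)` minus the current row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, ts INTEGER)")  # ts = unix epoch
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [(1, 0), (2, 1800), (3, 3000), (4, 7000)])
rows = conn.execute("""
SELECT t.id,
       (SELECT COUNT(*) FROM test t2
         WHERE t2.ts <  t.ts           -- strictly before this row
           AND t2.ts >= t.ts - 3600)   -- within the previous hour
       AS prev_hour_count
FROM test t ORDER BY t.ts
""").fetchall()
```

Row 3 (ts 3000) counts both earlier rows since they fall in [−600, 3000), while row 4 (ts 7000) counts none.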

Using windowing functions in Spark

Submitted by 不羁岁月 on 2019-12-17 09:57:31
Question: I am trying to use rowNumber in Spark data frames. My queries work as expected in the Spark shell, but when I write them in Eclipse and compile a JAR, I face an error:

    16/03/23 05:52:43 ERROR ApplicationMaster: User class threw exception:
    org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'.
    Note that, using window functions currently requires a HiveContext;
    org.apache.spark.sql.AnalysisException: Could not resolve window function 'row_number'. Note
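The error message names its own fix: in Spark 1.x, window functions resolve only against a HiveContext (which spark-shell provides by default as sqlContext, hence the difference in behaviour). A configuration sketch in PySpark 1.x terms, not runnable outside a Spark deployment:

```python
# Spark 1.x configuration sketch: window functions such as row_number()
# require a HiveContext instead of the plain SQLContext.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="window-demo")
sqlContext = HiveContext(sc)  # rather than SQLContext(sc)
# DataFrame queries using row_number().over(...) now resolve.
```

In Spark 2.x this distinction disappeared: SparkSession supports window functions without Hive support.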