window-functions

SparkSQL - Lag function?

大城市里の小女人 submitted on 2019-11-28 04:48:46
Question: I see in this Databricks post that SparkSQL supports window functions; in particular, I'm trying to use the lag() window function. I have rows of credit card transactions, which I've sorted. Now I want to iterate over the rows and, for each row, display the amount of the transaction and the difference between the current row's amount and the preceding row's amount. Following the Databricks post, I've come up with this query, but it's throwing an exception at me and I can't quite
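A minimal sketch of what lag() computes for this kind of "difference from the previous row" report. It runs against SQLite through Python's sqlite3 module rather than SparkSQL, and assumes the bundled SQLite is 3.25 or newer (the release that added window functions). The table name `transactions`, its columns and the sample amounts are hypothetical; the question's own query is not shown in the excerpt.

```python
import sqlite3

# Hypothetical data: the question's real table and columns are not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 100), (2, 250), (3, 75)],
)

# LAG(amount) yields the previous row's amount in the window ordering.
# The first row has no predecessor, so its difference is NULL (None in Python).
rows = conn.execute(
    """
    SELECT id, amount, amount - LAG(amount) OVER (ORDER BY id) AS diff
    FROM transactions
    ORDER BY id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

The key point is that lag() needs an OVER clause with an ORDER BY, because "previous row" is only defined relative to an ordering.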

Why do I need to apply a window function to samples when building a power spectrum of an audio signal?

霸气de小男生 submitted on 2019-11-28 04:44:39
I have come across the following guidelines for getting the power spectrum of an audio signal several times:

- collect N samples, where N is a power of 2
- apply a suitable window function to the samples, e.g. Hanning
- pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts
- calculate the squared magnitude of your FFT output bins (re * re + im * im)
- (optional) calculate 10 * log10 of each magnitude-squared output bin to get a magnitude value in dB

Now that you have your power
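The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a production routine: the recursive radix-2 `fft` helper, the periodic form of the Hann window, the tiny floor added before the logarithm, and the 64-sample test tone are all choices made for this example.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (assumed variant for this sketch)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum_db(samples):
    """Window the samples, run the FFT, and return the power per bin in dB."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("number of samples must be a power of 2")
    windowed = [s * w for s, w in zip(samples, hann(n))]
    # Real input: bins 0..n/2 carry all the information.
    bins = fft(windowed)[: n // 2 + 1]
    power = [b.real * b.real + b.imag * b.imag for b in bins]
    # Small floor (an assumption) avoids log10(0) for empty bins.
    return [10 * math.log10(p + 1e-20) for p in power]


N = 64
tone = [math.sin(2 * math.pi * 8 * i / N) for i in range(N)]  # 8 cycles per frame
spectrum = power_spectrum_db(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin)
```

With a real-to-complex FFT from a numerical library the `fft` helper would be replaced by that routine; the windowing and magnitude steps stay the same.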

Spark - Window with recursion? - Conditionally propagating values across rows

*爱你&永不变心* submitted on 2019-11-28 01:46:20
I have the following dataframe showing the revenue of purchases.

+-------+--------+-------+
|user_id|visit_id|revenue|
+-------+--------+-------+
|      1|       1|      0|
|      1|       2|      0|
|      1|       3|      0|
|      1|       4|    100|
|      1|       5|      0|
|      1|       6|      0|
|      1|       7|    200|
|      1|       8|      0|
|      1|       9|     10|
+-------+--------+-------+

Ultimately I want the new column purch_revenue to show the revenue generated by the purchase in every row. As a workaround, I have also tried to introduce a purchase identifier purch_id, which is incremented each time a purchase is made. So this is listed just as a reference.

+-------+--------+-------+-------------+------
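One way to read the purch_id workaround is as a running count of purchase rows. The sketch below works under that assumption in plain Python on the revenue column shown above; because the excerpt is cut off before the expected output, the exact numbering (for example, whether the purchase row itself already carries the incremented id) is a guess.

```python
from itertools import accumulate

# visit_id -> revenue for user_id 1, copied from the table above.
revenue = [0, 0, 0, 100, 0, 0, 200, 0, 10]

# Assumption: a row counts as a purchase when revenue > 0.
is_purchase = [1 if r > 0 else 0 for r in revenue]

# Running count of purchases up to and including each row; in a window
# function this corresponds to a cumulative sum ordered by visit_id
# within each user_id partition.
purch_id = list(accumulate(is_purchase))

print(purch_id)
```

Rows that share a purch_id can then be treated as one group, which is what makes the identifier useful for propagating a value across rows.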

SparkR window function

拜拜、爱过 submitted on 2019-11-28 00:34:54
I found from JIRA that the 1.6 release of SparkR has implemented window functions including lag and rank, but the over function is not implemented yet. How can I use a window function like lag without over in SparkR (not the SparkSQL way)? Can someone provide an example?

Spark 2.0.0+

SparkR provides DSL wrappers with over, window.partitionBy / partitionBy, window.orderBy / orderBy and rowsBetween / rangeBetween functions.

Spark <= 1.6

Unfortunately it is not possible in 1.6.0. While some window functions, including lag, have been implemented, SparkR doesn't support window definitions yet

How do I Handle Ties When Ranking Results in MySQL?

妖精的绣舞 submitted on 2019-11-28 00:07:38
How does one handle ties when ranking results in a MySQL query? I've simplified the table names and columns in this example, but it should illustrate my problem:

    SET @rank=0;
    SELECT student_names.students, @rank := @rank +1 AS rank, scores.grades
    FROM student_names
    LEFT JOIN scores ON student_names.students = scores.students
    ORDER BY scores.grades DESC

So imagine the above query produces:

    Students  Rank  Grades
    =======================
    Al        1     90
    Amy       2     90
    George    3     78
    Bob       4     73
    Mary      5     NULL
    William   6     NULL

Even though Al and Amy have the same grade, one is ranked higher than the other. Amy got
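As an illustration of tie-aware ranking, the sketch below uses the RANK() and DENSE_RANK() window functions instead of a user variable. It runs on SQLite (3.25 or newer assumed) through Python's sqlite3 module, not on MySQL, and collapses the two tables from the question into a single hypothetical `scores(student, grade)` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for student_names joined to scores.
conn.execute("CREATE TABLE scores (student TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Al", 90), ("Amy", 90), ("George", 78), ("Bob", 73),
     ("Mary", None), ("William", None)],
)

# RANK() gives tied rows the same rank and skips the following rank(s);
# DENSE_RANK() gives tied rows the same rank without leaving gaps.
ranked = conn.execute(
    """
    SELECT student,
           RANK() OVER (ORDER BY grade DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY grade DESC) AS dense_rnk
    FROM scores
    ORDER BY grade DESC, student
    """
).fetchall()
conn.close()

for row in ranked:
    print(row)
```

Which of the two functions fits depends on whether the rank after a tie should skip ahead (1, 1, 3) or continue (1, 1, 2). Note that the two NULL grades also tie with each other here.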

How to aggregate over rolling time window with groups in Spark

烂漫一生 submitted on 2019-11-27 21:38:33
I have some data that I want to group by a certain column, then aggregate a series of fields based on a rolling time window from the group. Here is some example data:

    df = spark.createDataFrame([
        Row(date='2016-01-01', group_by='group1', get_avg=5, get_first=1),
        Row(date='2016-01-10', group_by='group1', get_avg=5, get_first=2),
        Row(date='2016-02-01', group_by='group2', get_avg=10, get_first=3),
        Row(date='2016-02-28', group_by='group2', get_avg=20, get_first=3),
        Row(date='2016-02-29', group_by='group2', get_avg=30, get_first=3),
        Row(date='2016-04-02', group_by='group2', get_avg=8, get_first=4)])
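To make the idea of a grouped rolling window concrete, here is a plain-Python sketch over the example rows above. The 30-day trailing window, the choice to average only get_avg, and the helper name `rolling_avg` are assumptions for illustration; the excerpt does not say which window length or aggregates the question asks for.

```python
from datetime import date, timedelta

# (date, group_by, get_avg) taken from the example data above.
rows = [
    (date(2016, 1, 1), "group1", 5),
    (date(2016, 1, 10), "group1", 5),
    (date(2016, 2, 1), "group2", 10),
    (date(2016, 2, 28), "group2", 20),
    (date(2016, 2, 29), "group2", 30),
    (date(2016, 4, 2), "group2", 8),
]

WINDOW = timedelta(days=30)  # assumed window length, not from the question


def rolling_avg(rows, window):
    """Average get_avg over rows of the same group dated within `window` before each row."""
    result = []
    for current_date, group, _ in rows:
        in_window = [
            value
            for other_date, other_group, value in rows
            if other_group == group
            and timedelta(0) <= current_date - other_date <= window
        ]
        result.append((current_date.isoformat(), group, sum(in_window) / len(in_window)))
    return result


averages = rolling_avg(rows, WINDOW)
for row in averages:
    print(row)
```

This is roughly what a range-based window partitioned by group_by and ordered by the date would compute; the quadratic loop is only for clarity.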

Is it possible to use user defined aggregates (clr) with window functions (over)?

戏子无情 submitted on 2019-11-27 16:16:12
Is it possible to use user defined aggregates (clr) with window functions (over)? Can't find the answer in the documentation: http://technet.microsoft.com/en-us/library/ms190678.aspx

You're right that it's tricky to find anything in the documentation. But searching the Connect website, I managed to find this gem:

"Today, you can use CLR aggregates with OVER clause and PARTITION BY just like regular aggregate functions. Once we have support for window functions..."

Which was a response from Microsoft. However, searching on the Connect site was what I did whilst I was waiting for my aged machine

How to use a ring data structure in window functions

守給你的承諾、 submitted on 2019-11-27 15:36:36
I have data that is arranged in a ring structure (or circular buffer); that is, it can be expressed as sequences that cycle: ...-1-2-3-4-5-1-2-3-.... See this picture to get an idea of a 5-part ring:

I'd like to create a window query that can combine the lag and lead items into a three-point array, but I can't figure it out. For example, at part 1 of a 5-part ring the lag/lead sequence is 5-1-2, and at part 4 it is 3-4-5. Here is an example table of two rings with different numbers of parts (always more than three per ring):

    create table rp (ring int, part int);
    insert into rp(ring, part) values(1
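The wrap-around that plain lag and lead lack can be expressed with modular arithmetic. The sketch below shows that arithmetic in Python, assuming parts are numbered contiguously from 1 to the ring size; the helper name `ring_neighbours` is made up for this example.

```python
def ring_neighbours(part, size):
    """Return [lag, part, lead] for a 1-based part number in a ring of `size` parts."""
    lag = (part - 2) % size + 1   # part 1 wraps back to `size`
    lead = part % size + 1        # part `size` wraps forward to 1
    return [lag, part, lead]


triples = [ring_neighbours(p, 5) for p in range(1, 6)]
for t in triples:
    print(t)
```

In a window query the same expressions could be applied with the ring size taken from a count over the ring's partition, which would avoid special-casing the first and last part.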

MySql using correct syntax for the over clause

。_饼干妹妹 submitted on 2019-11-27 15:31:55
What is the correct syntax to get the OVER clause to work in MySQL? I would like to see the total number of SMS messages sent by each user without grouping with the GROUP BY clause.

    SELECT username, count(sentSmsId) OVER (userId)
    FROM sentSmsTable, userTable
    WHERE userId = sentUserId;

There is no OVER clause in MySQL that I know of, but here is a link that might help you achieve the same result: http://explainextended.com/2009/03/10/analytic-functions-first_value-last_value-lead-lag/

Hope this helps.

MySQL 8 has window functions! Therefore, you can write your query like this:

    SELECT
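For reference, a per-user count without GROUP BY is normally written with PARTITION BY inside the OVER clause. The sketch below demonstrates that form on SQLite (3.25 or newer assumed) through Python's sqlite3 module rather than on MySQL. The table and column names follow the question, the sample rows are hypothetical, and the query is not a reconstruction of the truncated answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userTable (userId INTEGER, username TEXT)")
conn.execute("CREATE TABLE sentSmsTable (sentSmsId INTEGER, sentUserId INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO userTable VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany("INSERT INTO sentSmsTable VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# COUNT(...) OVER (PARTITION BY userId) keeps one output row per SMS
# while attaching the per-user total to each of those rows.
sms_counts = conn.execute(
    """
    SELECT username, COUNT(sentSmsId) OVER (PARTITION BY userId) AS total_sms
    FROM sentSmsTable
    JOIN userTable ON userId = sentUserId
    ORDER BY sentSmsId
    """
).fetchall()
conn.close()

print(sms_counts)
```

The difference from the query in the question is the PARTITION BY keyword: OVER (userId) on its own is not a valid window definition.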

PostgreSQL equivalent for TOP n WITH TIES: LIMIT “with ties”?

≯℡__Kan透↙ submitted on 2019-11-27 14:27:51
I'm looking for something similar to this in SQL Server:

    SELECT TOP n WITH TIES FROM tablename

I know about LIMIT in PostgreSQL, but does an equivalent of the above exist? I'm just curious, as it would save me an extra query each time. Suppose I have a table Numbers with attribute nums: {10, 9, 8, 8, 2}. I want to do something like:

    SELECT nums FROM Numbers ORDER BY nums DESC LIMIT *with ties* 3

It should return {10, 9, 8, 8} because it takes the top 3 plus the extra 8, since it ties the other one.

Erwin Brandstetter

There is no WITH TIES clause in PostgreSQL like there is in SQL Server. In
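One common way to emulate "top n with ties" is to rank the rows and filter on the rank. The sketch below shows this with RANK() on SQLite (3.25 or newer assumed) through Python's sqlite3 module, using the Numbers example from the question. It is an assumption that this matches the approach in the truncated answer. Separately, PostgreSQL 13 and later accept FETCH FIRST n ROWS WITH TIES, which postdates this excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (nums INTEGER)")
conn.executemany("INSERT INTO Numbers VALUES (?)", [(10,), (9,), (8,), (8,), (2,)])

# RANK() assigns the same rank to equal values, so filtering on rnk <= 3
# keeps every row tied with the third-placed value.
top_with_ties = [
    row[0]
    for row in conn.execute(
        """
        SELECT nums
        FROM (SELECT nums, RANK() OVER (ORDER BY nums DESC) AS rnk FROM Numbers) AS ranked
        WHERE rnk <= 3
        ORDER BY nums DESC
        """
    )
]
conn.close()

print(top_with_ties)
```

Using RANK() rather than ROW_NUMBER() is what lets the second 8 through: both 8s receive rank 3.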