group-by

Return top N largest values per group using pandas

Question: I am trying to find the maximum values in a number of series, each grouped by the name of the column it was extracted from. I have a dataframe like this:

```
MASTER    SLAVE    Value
Master_1  Slave_1  657879
Master_1  Slave_2   34343
Master_1  Slave_3  453313
Master_2  Slave_1   56667
Master_2  Slave_2    6879
Master_2  Slave_3   12333
Master_2  Slave_4     789
Master_2  Slave_5   22235
Master_3  Slave_1   65765
Master_3  Slave_2   23431
Master_3  Slave_3     445
Master_3  Slave_4     567
```

For each master, I need the two largest values (the top two slaves by value).
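A minimal pandas sketch of the top-N-per-group pattern, using the sample data above:

```python
import pandas as pd

df = pd.DataFrame({
    "MASTER": ["Master_1", "Master_1", "Master_1", "Master_2", "Master_2",
               "Master_2", "Master_2", "Master_2", "Master_3", "Master_3",
               "Master_3", "Master_3"],
    "SLAVE": ["Slave_1", "Slave_2", "Slave_3", "Slave_1", "Slave_2",
              "Slave_3", "Slave_4", "Slave_5", "Slave_1", "Slave_2",
              "Slave_3", "Slave_4"],
    "Value": [657879, 34343, 453313, 56667, 6879, 12333, 789, 22235,
              65765, 23431, 445, 567],
})

# Sort by Value descending, then keep the first two rows of each MASTER group.
top2 = df.sort_values("Value", ascending=False).groupby("MASTER").head(2)
print(top2.sort_values(["MASTER", "Value"], ascending=[True, False]))
```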

BigQuery - equivalent of GROUP EACH in standard SQL

Question: Is there an equivalent of GROUP EACH / JOIN EACH in standard SQL? I'm exceeding my resource limits.

Answer 1: No, there is no such equivalent in standard SQL. EACH was a hint telling the BigQuery engine (Legacy SQL) to process the respective command more optimally, and standard SQL already covers this without any hinting. Your option is to tune/optimize your query.

Source: https://stackoverflow.com/questions/50769877/bigquery-equivalent-of-group-each-in-standard-sql

need database hourly data before two months

Question: I have a table with cpuload, freememory, diskspace, CPU utilization and hostname fields. I run a cron job that collects the data every 10 minutes on all hosts (e.g. 4 hosts). I now have one year of data in the database and want to convert it into hourly averages, but only for data older than two months; the last two months of data should not be disturbed. My data looks like this:

```
hostname              | cpuload | freedisk | freemem | timestamp
localhost.localdomain | 0.15    | 136052   | 383660  | 2017-08-01 00
```
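The question is about doing this in the database, but the aggregation itself can be sketched in pandas; a minimal sketch on toy data (all values hypothetical):

```python
import pandas as pd

# Toy data standing in for the real table (hypothetical values).
df = pd.DataFrame({
    "hostname": ["localhost.localdomain"] * 4,
    "cpuload": [0.15, 0.20, 0.10, 0.25],
    "freedisk": [136052, 136000, 135900, 135800],
    "freemem": [383660, 383500, 383400, 383300],
    "timestamp": pd.to_datetime([
        "2017-08-01 00:00", "2017-08-01 00:10",
        "2017-08-01 00:20", "2017-08-01 01:00"]),
})

# Only rows older than two months should be averaged; newer rows stay as-is.
cutoff = pd.Timestamp.now() - pd.DateOffset(months=2)
old = df[df["timestamp"] < cutoff]

# Collapse the 10-minute samples into hourly means, per host.
hourly = (old.set_index("timestamp")
             .groupby("hostname")
             .resample("1H")[["cpuload", "freedisk", "freemem"]]
             .mean()
             .reset_index())
print(hourly)
```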

How to grep a group based on string in another column that doesn't occur in each observation using R?

Question: I have to simplify a previous question that failed. I want to extract whole groups, identified by 'id', that contain a string ('inter' or 'high') in another column called 'strmatch'. The string doesn't occur in every observation of the group, but if it occurs anywhere in the group, I want to assign the whole group to a respective data frame. The data frame:

```r
df <- data.frame(id = c("a", "a", "b", "b", "c", "c", "d", "d"),
                 std = c("y", "y", "n", "n", "y", "y", "n", "n"),
                 strmatch = c("alpha", "TMB-inter", "beta", "TMB-high", "gamma",
```
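The question asks for R; purely as a sketch of the group-level filtering logic, here is the equivalent in pandas, with hypothetical values standing in for the truncated end of strmatch:

```python
import pandas as pd

# Frame in the shape of the question's data; the last strmatch values
# are invented, since the original is cut off.
df = pd.DataFrame({
    "id": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "std": ["y", "y", "n", "n", "y", "y", "n", "n"],
    "strmatch": ["alpha", "TMB-inter", "beta", "TMB-high",
                 "gamma", "delta", "epsilon", "zeta"],
})

# Keep every group in which 'inter' or 'high' occurs in at least one row.
matched = df.groupby("id").filter(
    lambda g: g["strmatch"].str.contains("inter|high").any()
)
```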

How to pivot a dataframe

Question: What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if the askers don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... but I'm going to give it a go. The problem with existing questions and answers is that the question is often focused on a nuance that the OP has trouble
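To make the terminology concrete, here is a minimal long-to-wide pivot in pandas on toy data (purely illustrative):

```python
import pandas as pd

# A toy long-format frame (hypothetical data).
long_df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "metric": ["sales", "visits", "sales", "visits"],
    "value": [100, 9, 120, 11],
})

# Long -> wide: one row per date, one column per metric.
wide_df = long_df.pivot(index="date", columns="metric", values="value")
print(wide_df)
```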

Strongly Typing a LINQ Query using multiple keys of complex objects

Question: I can't figure out how to define the object used in my LINQ grouping query so that it is strongly typed. I've built a grouping query that uses two complex objects as keys. The query works, but I would like to be able to declare the return object's type. I have a complex type...

```vb
Public Class Student
    Public Name As IndividualsName
    Public EnrolledSchool As School
    Public BirthHospital As BirthPlace
    Public Grade As Integer

    Public Sub New(ByVal name As IndividualsName, ByVal enrolledSchool As
```
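The question is VB/LINQ-specific, but the underlying pattern, grouping records under a composite key built from two complex values while keeping everything strongly typed, can be sketched in Python with typed named tuples (School, BirthPlace and all fields here are hypothetical stand-ins for the question's types):

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical stand-ins for the question's complex key types.
class School(NamedTuple):
    name: str
    district: str

class BirthPlace(NamedTuple):
    hospital: str
    city: str

class Student(NamedTuple):
    name: str
    school: School
    birth: BirthPlace
    grade: int

class GroupKey(NamedTuple):
    # The composite, strongly typed grouping key.
    school: School
    birth: BirthPlace

def group_students(students: list[Student]) -> dict[GroupKey, list[Student]]:
    groups: dict[GroupKey, list[Student]] = defaultdict(list)
    for s in students:
        # Named tuples compare by value, so they work as dict keys,
        # analogous to anonymous-type keys in a LINQ Group By.
        groups[GroupKey(s.school, s.birth)].append(s)
    return dict(groups)
```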

Pandas - convert cumulative value to actual value

Question: Let's say my dataframe looks something like this:

```
date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048
```
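The question is truncated before it says which column holds the running total; assuming for illustration that count is cumulative, a grouped diff() converts it to per-day values:

```python
import pandas as pd
from io import StringIO

# The first sample rows from the question.
csv = """date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
"""
df = pd.read_csv(StringIO(csv), parse_dates=["date"]).sort_values("date")

# diff() turns a running total into per-day deltas; the first row of each
# group has no predecessor, so keep its original value via fillna.
delta = df.groupby(["site", "country_code", "ID"])["count"].diff()
df["count_actual"] = delta.fillna(df["count"])
print(df[["date", "count", "count_actual"]])
```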

Counting number of rows grouped by date and hour

Question: I am tracking customer store-entry data in Microsoft SQL Server 2008 R2 that looks something like this:

```
DoorID  DateTimeStamp            EntryType
1       2013-09-02 09:01:16.000  IN
1       2013-09-02 09:04:09.000  IN
1       2013-09-02 10:19:29.000  IN
1       2013-09-02 10:19:30.000  IN
1       2013-09-02 10:19:32.000  OUT
1       2013-09-02 10:26:36.000  IN
1       2013-09-02 10:26:40.000  OUT
```

I don't want to count the OUT rows, just IN. I believe the data needs to be grouped by date and DoorID, and then the hourly totals computed. I would like it to come
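The question targets SQL Server 2008 R2; purely as a sketch of the intended aggregation, here is the same count in pandas, using the sample rows above:

```python
import pandas as pd

# The sample rows from the question.
df = pd.DataFrame({
    "DoorID": [1, 1, 1, 1, 1, 1, 1],
    "DateTimeStamp": pd.to_datetime([
        "2013-09-02 09:01:16", "2013-09-02 09:04:09",
        "2013-09-02 10:19:29", "2013-09-02 10:19:30",
        "2013-09-02 10:19:32", "2013-09-02 10:26:36",
        "2013-09-02 10:26:40"]),
    "EntryType": ["IN", "IN", "IN", "IN", "OUT", "IN", "OUT"],
})

# Keep only the IN rows, then count entries per door, date and hour.
ins = df[df["EntryType"] == "IN"]
counts = (ins.groupby(["DoorID",
                       ins["DateTimeStamp"].dt.date.rename("Date"),
                       ins["DateTimeStamp"].dt.hour.rename("Hour")])
             .size()
             .rename("Entries"))
print(counts)
```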

Pandas count elements in a columns and show in duplicated way

Question: I want to get something like this:

```
A
1
1
2
3
3
4
4
4
4
```

and turn it into:

```
A  B
1  2
1  2
2  1
3  2
3  2
4  4
4  4
4  4
4  4
```

As you can see, the counts are repeated for every occurrence of the key, and the rows stay in their original order. I know how to do this in R using data.table; in pandas I only know how to use groupby to get unique key counts. Does anyone have ideas? Thank you!

Answer 1: You can use this:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 4, 4, 4, 4]})
# transform('count') broadcasts each group's size back onto every row,
# which keeps the original row order.
df['B'] = df.groupby(['A'])['A'].transform('count')
```

SQL: insert rows with summarized values

Question: Please see my first question on this topic: SQL: partition over two columns. I have the following table:

```
No1 | No2 | Amount | Timestamp
A   | B   | 10     | 01.01.2018
C   | D   | 20     | 02.01.2018
B   | A   | 30     | 03.01.2018
D   | C   | 40     | 04.01.2018
```

At the moment I have the following results:

```
No1 | No2 | Sum(Amount) over partition | Timestamp
```
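The question is cut off, but judging from the sample data the partition appears to pair (A, B) with (B, A); under that assumption, a pandas sketch of summing Amount over the unordered pair:

```python
import pandas as pd

df = pd.DataFrame({
    "No1": ["A", "C", "B", "D"],
    "No2": ["B", "D", "A", "C"],
    "Amount": [10, 20, 30, 40],
    "Timestamp": ["01.01.2018", "02.01.2018", "03.01.2018", "04.01.2018"],
})

# Build an order-independent pair key so (A, B) and (B, A) group together.
pair = df[["No1", "No2"]].apply(lambda r: tuple(sorted(r)), axis=1)
df["PairSum"] = df.groupby(pair)["Amount"].transform("sum")
print(df)
```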