group-by | 易学教程

Pandas: How to find percentage of group members type per subgroup?


Question: (Data sample and attempts at the end of the question.) With a dataframe such as this:

        Type  Class  Area   Decision
    0   A     1      North  Yes
    1   B     1      North  Yes
    2   C     2      South  No
    3   A     3      South  No
    4   B     3      South  No
    5   C     1      South  No
    6   A     2      North  Yes
    7   B     3      South  Yes
    8   B     1      North  No
    9   C     1      East   No
    10  C     2      West   Yes

How can I find what percentage of each type [A, B, C, D] belongs to each area [North, South, East, West]?

Desired output:

       North  South  East  West
    A  0.66   0.33   0     0
    B  0.5    0.5    0     0
    C  0      0.5    0.25  0.25

My best attempt so
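One possible way to get a table like the desired output, sketched here as an illustration rather than as the asker's attempt or an accepted answer, is `pandas.crosstab` with `normalize="index"`, which divides each row of counts by its row total. Only the `Type` and `Area` columns from the sample are rebuilt; the variable names `df` and `shares` are assumptions made for this sketch.

```python
import pandas as pd

# Sample data from the question (only the two columns that matter here).
df = pd.DataFrame({
    "Type": list("ABCABCABBCC"),
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
})

# Count Type/Area pairs, then divide each row by its total so that
# every Type row sums to 1.
shares = pd.crosstab(df["Type"], df["Area"], normalize="index")

# Put the columns in the order used in the question; an area that never
# occurred would be filled with 0.
shares = shares.reindex(columns=["North", "South", "East", "West"], fill_value=0.0)

print(shares.round(2))
```

Type D is listed in the question but does not occur in the sample, so it has no row in the result.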

Group by numbers that are in sequence


Question: I have some data like this:

    row  id
    1    1
    2    36
    3    37
    4    38
    5    50
    6    51

I would like to query it to look like this:

    row  id  group
    1    1   1
    2    36  2
    3    37  2
    4    38  2
    5    50  3
    6    51  3
    ...

so that I can GROUP BY where the numbers are consecutively sequential. Also, looping/cursoring is out of the question since I'm working with a pretty large set of data, thanks.

Answer 1:

    create table #temp
    (
        IDUnique int Identity(1,1),
        ID int,
        grp int
    )

    Insert into #temp(ID) Values(1)
    Insert into #temp(ID) Values(36)
    Insert into #temp
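The grouping the question asks for can be illustrated outside SQL. The Python sketch below is not the truncated answer's method; it relies on the observation that, for sorted unique ids, `id - position` stays constant within a run of consecutive numbers. The function name `label_runs` is hypothetical, and sorted, duplicate-free input is an assumption.

```python
from itertools import groupby

# Sample ids from the question, assumed sorted ascending and unique.
ids = [1, 36, 37, 38, 50, 51]


def label_runs(sorted_ids):
    """Return (id, group_number) pairs, one group per consecutive run."""
    labels = []
    # Within a consecutive run, value - position is constant, so it can
    # serve as the grouping key.
    runs = groupby(enumerate(sorted_ids), key=lambda pair: pair[1] - pair[0])
    for group_no, (_, run) in enumerate(runs, start=1):
        labels.extend((value, group_no) for _, value in run)
    return labels


print(label_runs(ids))
# [(1, 1), (36, 2), (37, 2), (38, 2), (50, 3), (51, 3)]
```

The same "value minus row number" key is what a set-based SQL query would group on, which avoids explicit loops or cursors.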


MySQL group-by very slow


Question: I have the following SQL query:

    SELECT CustomerID
    FROM sales
    WHERE `Date` <= '2012-01-01'
    GROUP BY CustomerID

The query is executed over 11,400,000 rows and runs very slowly: it takes over 3 minutes to execute. If I remove the GROUP BY part, it runs in under 1 second. Why is that? The MySQL Server version is '5.0.21-community-nt'. Here is the table schema:

    CREATE TABLE `sales` (
      `ID` int(11) NOT NULL auto_increment,
      `DocNo` int(11) default '0',
      `CustomerID` int(11) default '0',
      `OperatorID` int(11)

group by cities


Question: A good programmer advised me to simplify my table. So far I have made a new table (x: month, y: cities, value: nettotal). It works, but I still don't understand why it can't group the values (nettotal) by cities. It's OK with month, but the values just come in from left to right without leaving any 0 behind. Anyway, I hope you will understand everything from the source. Here are the queries:

    <cfquery name="GET_SALES_TOTAL" datasource="#dsn#">
    SELECT SUM(COALESCE(nettotal,0


python groupby and list interaction


Question: If we run the following code,

    from itertools import groupby

    s = '1223'
    r = groupby(s)
    x = list(r)
    a = [list(g) for k, g in r]
    print(a)
    b = [list(g) for k, g in groupby(s)]
    print(b)

then surprisingly the two output lines are DIFFERENT:

    []
    [['1'], ['2', '2'], ['3']]

But if we remove the line "x = list(r)", then the two lines are the same, as expected. I don't understand why the list() function changes the groupby result.

Answer 1: The result of groupby, as with many objects in the itertools
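Independently of the truncated answer, the behaviour can be reproduced with a small sketch: `itertools.groupby` returns a one-shot iterator, so once something has iterated over it fully (as `list(r)` does), a second pass finds nothing left. The names `first_pass` and `second_pass` are made up for this illustration.

```python
from itertools import groupby

s = "1223"
r = groupby(s)

# First full pass: consume each group while it is still current.
first_pass = [(key, list(group)) for key, group in r]

# Second pass over the same object: the iterator is already exhausted.
second_pass = [(key, list(group)) for key, group in r]

print(first_pass)   # [('1', ['1']), ('2', ['2', '2']), ('3', ['3'])]
print(second_pass)  # []
```

Calling `groupby(s)` again, as the line that builds `b` does, creates a fresh iterator, which is why that line still produces the full grouping.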

R code to assign a sequence based off of multiple variables [duplicate]


Question: This question already has answers here: Recode dates to study day within subject (2 answers). Closed last month.

I have data structured as below:

    ID  Day  Desired Output
    1   1    1
    1   1    1
    1   1    1
    1   2    2
    1   2    2
    1   3    3
    2   4    1
    2   4    1
    2   5    2
    3   6    1
    3   6    1

Is it possible to create a sequence for the desired output without using a loop? The dataset is quite large, so a loop won't work. Is it possible to do this with the dplyr package, or maybe with a combination of cumsum/diff?

Answer 1: An option is to group by 'ID', and then
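The answer above is cut off, so the following is not its method. As a rough Python analogue of the requested output, one loop-free option in pandas is a dense rank of `Day` within each `ID`. The column name `Seq` and the variable `df` are assumptions made for this sketch.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Day": [1, 1, 1, 2, 2, 3, 4, 4, 5, 6, 6],
})

# Dense rank within each ID: equal days share a number, and each new
# distinct day in that ID gets the next integer, starting from 1.
df["Seq"] = df.groupby("ID")["Day"].rank(method="dense").astype(int)

print(df)
```

Because the rank is computed per group, the numbering restarts at 1 whenever `ID` changes, which matches the "Desired Output" column in the question.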

Python array of tuples group by first, store second


Question: I have an array of tuples, something like this:

    query_results = [("foo", "bar"), ("foo", "qux"), ("baz", "foo")]

I would like to achieve something like:

    { "foo": ["bar", "qux"], "baz": ["foo"] }

So I have tried using this:

    from itertools import groupby

    grouped_results = {}
    for key, y in groupby(query_results, lambda x: x[0]):
        grouped_results[key] = [y[1] for u in list(y)]

The issue I have is that although the number of keys is correct, the number of values in each array is dramatically lower than
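As a hedged sketch of two common ways to build the desired dictionary (not taken from the truncated question or any answer to it): a `collections.defaultdict` needs no particular input order, while `itertools.groupby` only merges adjacent items with equal keys, so the input is sorted by key first. The name `grouped_sorted` is made up for this example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

query_results = [("foo", "bar"), ("foo", "qux"), ("baz", "foo")]

# Option 1: append each value to its key's list; input order is irrelevant.
grouped_results = defaultdict(list)
for key, value in query_results:
    grouped_results[key].append(value)

# Option 2: groupby merges only adjacent equal keys, so sort by key first.
first = itemgetter(0)
grouped_sorted = {
    key: [value for _, value in pairs]
    for key, pairs in groupby(sorted(query_results, key=first), key=first)
}

print(dict(grouped_results))  # {'foo': ['bar', 'qux'], 'baz': ['foo']}
print(grouped_sorted)         # {'baz': ['foo'], 'foo': ['bar', 'qux']}
```

Both options read the second element from each tuple in the group rather than indexing the group object itself.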
