group-by

Pandas: How to find percentage of group members type per subgroup?

六眼飞鱼酱① submitted on 2020-02-03 05:48:07
Question: (Data sample and attempts at the end of the question.) With a dataframe such as this:

        Type  Class  Area   Decision
    0   A     1      North  Yes
    1   B     1      North  Yes
    2   C     2      South  No
    3   A     3      South  No
    4   B     3      South  No
    5   C     1      South  No
    6   A     2      North  Yes
    7   B     3      South  Yes
    8   B     1      North  No
    9   C     1      East   No
    10  C     2      West   Yes

How can I find what percentage of each type [A, B, C, D] belongs to each area [North, South, East, West]?

Desired output:

       North  South  East  West
    A  0.66   0.33   0     0
    B  0.5    0.5    0     0
    C  0      0.5    0.25  0.25

My best attempt so
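The desired output is a row-normalised cross-tabulation: count the rows per (Type, Area) pair, then divide each count by the total for that Type. In pandas this is commonly written as `pd.crosstab(df["Type"], df["Area"], normalize="index")`. The sketch below spells out the same arithmetic with the standard library only, so the steps are visible. The function name `area_shares` and the `rows` list are illustrative assumptions, not part of the original question.

```python
from collections import Counter, defaultdict

# (Type, Area) pairs copied from the sample dataframe in the question.
rows = [
    ("A", "North"), ("B", "North"), ("C", "South"), ("A", "South"),
    ("B", "South"), ("C", "South"), ("A", "North"), ("B", "South"),
    ("B", "North"), ("C", "East"), ("C", "West"),
]
areas = ["North", "South", "East", "West"]


def area_shares(pairs, columns):
    """Hypothetical helper: share of each type's rows that fall in each area."""
    counts = defaultdict(Counter)
    for type_, area in pairs:
        counts[type_][area] += 1          # count rows per (type, area)
    shares = {}
    for type_, per_area in sorted(counts.items()):
        total = sum(per_area.values())    # rows of this type across all areas
        # A Counter returns 0 for areas the type never appears in.
        shares[type_] = {a: per_area[a] / total for a in columns}
    return shares


shares = area_shares(rows, areas)
for type_, row in shares.items():
    print(type_, [round(row[a], 2) for a in areas])
```

Type D does not occur in the sample, so it produces no row here. Rounding to two places gives 0.67 for 2/3, where the question's desired output shows the truncated 0.66.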

Group by numbers that are in sequence

隐身守侯 submitted on 2020-02-01 05:34:29
Question: I have some data like this:

    row  id
    1    1
    2    36
    3    37
    4    38
    5    50
    6    51

I would like to query it to look like this:

    row  id  group
    1    1   1
    2    36  2
    3    37  2
    4    38  2
    5    50  3
    6    51  3
    ...

so that I can GROUP BY where the numbers are consecutively sequential. Also, looping/cursoring is out of the question since I'm working with a pretty large set of data. Thanks.

Answer 1:

    create table #temp
    (
        IDUnique int Identity(1,1),
        ID int,
        grp int
    )
    Insert into #temp(ID) Values(1)
    Insert into #temp(ID) Values(36)
    Insert into #temp
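The grouping asked for here is often called "gaps and islands". One common idea is that, for sorted unique integers, the value minus its position stays constant within a consecutive run, so that difference can serve as a grouping key. In SQL engines with window functions this is usually written as `id - ROW_NUMBER() OVER (ORDER BY id)`. The following is a minimal Python sketch of that idea, not a reconstruction of the truncated answer above; the function name `label_consecutive_runs` is an assumption made for illustration.

```python
from itertools import groupby


def label_consecutive_runs(ids):
    """Return (id, group) pairs; ids must be sorted ascending and unique."""
    labelled = []
    # Inside a run of consecutive integers, value - position is constant,
    # so it works as the grouping key for adjacent elements.
    keyed = groupby(enumerate(ids), key=lambda pair: pair[1] - pair[0])
    for group_number, (_, run) in enumerate(keyed, start=1):
        labelled.extend((value, group_number) for _, value in run)
    return labelled


result = label_consecutive_runs([1, 36, 37, 38, 50, 51])
for id_, group in result:
    print(id_, group)
```

The sketch relies on the input being sorted and free of duplicates; repeated ids would break the constant-difference property.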

MySQL group-by very slow

随声附和 submitted on 2020-01-31 04:24:07
Question: I have the following SQL query:

    SELECT CustomerID FROM sales WHERE `Date` <= '2012-01-01' GROUP BY CustomerID

The query is executed over 11400000 rows and runs very slowly: it takes over 3 minutes to execute. If I remove the GROUP BY part, it runs in under 1 second. Why is that? The MySQL Server version is '5.0.21-community-nt'. Here is the table schema:

    CREATE TABLE `sales` (
      `ID` int(11) NOT NULL auto_increment,
      `DocNo` int(11) default '0',
      `CustomerID` int(11) default '0',
      `OperatorID` int(11)

group by cities

梦想的初衷 submitted on 2020-01-30 11:58:06
Question: As a good man and programmer advised me, I should simplify my table. So far I have made a new table (x: month, y: cities, value: nettotal). It works, but I still don't understand why it can't group the values (nettotal) by cities. It's OK with month, but the values are just filled in from left to right, with no 0 left where a value is missing. Anyway, I hope you will understand everything from the source. Here are the queries:

    <cfquery name="GET_SALES_TOTAL" datasource="#dsn#">
    SELECT SUM(COALESCE(nettotal,0

python groupby and list interaction

拜拜、爱过 submitted on 2020-01-30 11:47:47
Question: If we run the following code,

    from itertools import groupby
    s = '1223'
    r = groupby(s)
    x = list(r)
    a = [list(g) for k, g in r]
    print(a)
    b = [list(g) for k, g in groupby(s)]
    print(b)

then surprisingly the two output lines are DIFFERENT:

    []
    [['1'], ['2', '2'], ['3']]

But if we remove the line "x = list(r)", then the two lines are the same, as expected. I don't understand why the list() function changes the groupby result.

Answer 1: The result of groupby, as with many objects in the itertools
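The behaviour in the question follows from `groupby` returning a one-shot iterator: once `list(r)` has walked through it, a second pass over the same object finds nothing left. The short sketch below shows that, together with one way to keep the groups around by materialising each of them during the single pass. The variable names are illustrative only.

```python
from itertools import groupby

s = "1223"

r = groupby(s)
first_pass = [key for key, _ in r]    # consumes the iterator completely
second_pass = [key for key, _ in r]   # same object, already exhausted

# To reuse the groups, turn each one into a list while iterating;
# each group iterator is only valid until groupby advances to the next key.
groups = [(key, list(group)) for key, group in groupby(s)]

print(first_pass)
print(second_pass)
print(groups)
```

Calling `groupby(s)` again, as the second comprehension in the question does, creates a fresh iterator, which is why `b` is populated while `a` is empty.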

R code to assign a sequence based off of multiple variables [duplicate]

对着背影说爱祢 submitted on 2020-01-30 08:09:32
Question: This question already has answers here: Recode dates to study day within subject (2 answers). Closed last month.

I have data structured as below:

    ID  Day  Desired Output
    1   1    1
    1   1    1
    1   1    1
    1   2    2
    1   2    2
    1   3    3
    2   4    1
    2   4    1
    2   5    2
    3   6    1
    3   6    1

Is it possible to create a sequence for the desired output without using a loop? The dataset is quite large, so a loop won't work. Is it possible to do this with the dplyr package, or maybe a combination of cumsum/diff?

Answer 1: An option is to group by 'ID', and then
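The desired column restarts at 1 for every ID and goes up by one each time Day changes within that ID. The thread is about R, and the answer above is truncated, so the following is only a Python sketch of that logic using nested `itertools.groupby` calls. It assumes the rows are already ordered by ID and then Day, and the function name `day_sequence` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (ID, Day) pairs copied from the table in the question.
rows = [
    (1, 1), (1, 1), (1, 1), (1, 2), (1, 2), (1, 3),
    (2, 4), (2, 4), (2, 5),
    (3, 6), (3, 6),
]


def day_sequence(pairs):
    """Number the distinct days within each ID, restarting at 1 per ID."""
    sequence = []
    for _, id_rows in groupby(pairs, key=itemgetter(0)):        # one block per ID
        by_day = groupby(id_rows, key=itemgetter(1))            # runs of equal Day
        for position, (_, day_rows) in enumerate(by_day, start=1):
            sequence.extend(position for _ in day_rows)         # one label per row
    return sequence


result = day_sequence(rows)
print(result)
```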

Python array of tuples group by first, store second

拜拜、爱过 submitted on 2020-01-30 06:23:10
Question: So I have an array of tuples, something like this:

    query_results = [("foo", "bar"), ("foo", "qux"), ("baz", "foo")]

I would like to achieve something like:

    { "foo": ["bar", "qux"], "baz": ["foo"] }

So I have tried using this:

    from itertools import groupby

    grouped_results = {}
    for key, y in groupby(query_results, lambda x: x[0]):
        grouped_results[key] = [y[1] for u in list(y)]

The issue I have is that although the number of keys is correct, the number of values in each array is dramatically lower than
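Two details of the attempt above are worth noting: `itertools.groupby` only merges adjacent items with equal keys, so unsorted input yields repeated keys that overwrite each other in the dict, and the comprehension indexes `y` (the group iterator) where the loop variable `u` was presumably intended. One way to sidestep both is to accumulate into a `collections.defaultdict`. This is a minimal sketch rather than the thread's accepted answer, and the helper name `group_second_by_first` is an assumption.

```python
from collections import defaultdict

query_results = [("foo", "bar"), ("foo", "qux"), ("baz", "foo")]


def group_second_by_first(pairs):
    """Collect the second element of each pair under its first element."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)   # works regardless of input order
    return dict(grouped)


grouped_results = group_second_by_first(query_results)
print(grouped_results)
```

If `groupby` is preferred, sorting the input by the same key first (for example `sorted(query_results, key=lambda x: x[0])`) keeps equal keys adjacent.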
