grouping

Python Pandas sorting after groupby and aggregate

旧街凉风 submitted on 2020-06-10 03:35:32
Question: I am trying to sort data (Pandas) after grouping and aggregating and I am stuck. My data:

    import numpy as np
    import pandas as pd

    data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
            'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
    persons = pd.DataFrame(data, columns=["from_year", "name", "out_days"])
    # Sum out_days for each (from_year, name) pair
    days_off_yearly = persons.groupby(["from_year", "name"]).agg({"out_days": [np.sum]})
    print(days_off_yearly)

After that I have my data sorted:
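
The excerpt is cut off before the desired order is shown, so the following is only a sketch under an assumption: that the goal is to keep the years ascending and, within each year, list names by total out_days in descending order. It uses a plain `sum()` with `as_index=False` rather than the `agg` call from the question, which keeps the columns flat and easy to sort.

```python
import pandas as pd

data = {
    "from_year": [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
    "name": ["John", "John1", "John", "John", "John4", "John", "John1", "John6"],
    "out_days": [11, 8, 10, 15, 11, 6, 10, 4],
}
persons = pd.DataFrame(data, columns=["from_year", "name", "out_days"])

# Aggregate first; as_index=False keeps from_year and name as ordinary columns.
days_off_yearly = persons.groupby(["from_year", "name"], as_index=False)["out_days"].sum()

# Assumed ordering: years ascending, totals descending within each year.
sorted_days = days_off_yearly.sort_values(
    ["from_year", "out_days"], ascending=[True, False]
).reset_index(drop=True)

print(sorted_days)
```

If a different order is wanted, only the `ascending` list needs to change.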

generate id for each group with repeated and missing observations

≯℡__Kan透↙ submitted on 2020-06-09 03:02:50
Question: I have a dataset with individuals observed over several weeks. Some individuals have no observations in some weeks, and some have several observations during the same week. I need to create a weekly ID (id_week in the code) that would be individual-specific. If an individual has two or more observations in one week, id_week should be the same for both observations. If an individual has no observations in a given week, the observation in the next week should be consequent from the last
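
The excerpt does not show the data or the language used, so this is a hedged pandas sketch rather than the asker's code. It assumes two hypothetical columns, `person` and `week`, and reads the requirement as: number each individual's observed weeks 1, 2, 3, ... with no gaps, giving repeated observations in the same week the same number. A dense rank within each individual does that.

```python
import pandas as pd

# Hypothetical data; the column names `person` and `week` are assumptions.
obs = pd.DataFrame({
    "person": ["A", "A", "A", "A", "B", "B", "B"],
    "week":   [1, 1, 3, 4, 2, 5, 5],
})

# Dense rank of week within each person: ties share a number,
# and skipped weeks do not leave gaps in the numbering.
obs["id_week"] = obs.groupby("person")["week"].rank(method="dense").astype(int)

print(obs)
```

Person A has two rows in week 1 and none in week 2, so the rows get id_week 1, 1, 2, 3.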

Pandas assign group numbers for each time bin

安稳与你 submitted on 2020-05-31 03:40:41
Question: I have a pandas dataframe that looks like below.

    Key  Name  Val1  Val2  Timestamp
    101  A     10    1     01-10-2019 00:20:21
    102  A     12    2     01-10-2019 00:20:21
    103  B     10    1     01-10-2019 00:20:26
    104  C     20    2     01-10-2019 14:40:45
    105  B     21    3     02-10-2019 09:04:06
    106  D     24    3     02-10-2019 09:04:12
    107  A     24    3     02-10-2019 09:04:14
    108  E     32    2     02-10-2019 09:04:20
    109  A     10    1     02-10-2019 09:04:22
    110  B     10    1     02-10-2019 10:40:49

Starting from the earliest timestamp, that is, '01-10-2019 00:20:21', I need to create time bins of 10
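
The excerpt ends before the bin unit is given, so the sketch below treats the width as a parameter. `bin_width` is set to 10 seconds purely as an assumption, and the timestamps are assumed to be day-first. Each row's offset from the earliest timestamp is floor-divided by the width, and the resulting bin indices are dense-ranked so that non-empty bins are numbered 1, 2, 3, ...

```python
import pandas as pd

df = pd.DataFrame({
    "Key": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    "Timestamp": [
        "01-10-2019 00:20:21", "01-10-2019 00:20:21", "01-10-2019 00:20:26",
        "01-10-2019 14:40:45", "02-10-2019 09:04:06", "02-10-2019 09:04:12",
        "02-10-2019 09:04:14", "02-10-2019 09:04:20", "02-10-2019 09:04:22",
        "02-10-2019 10:40:49",
    ],
})

# Assumption: the dates are day-month-year.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d-%m-%Y %H:%M:%S")

# Assumption: the unit is not visible in the excerpt; 10 seconds is a placeholder.
bin_width = pd.Timedelta(seconds=10)

# Index of the fixed-width bin each row falls into, counted from the earliest timestamp.
bin_index = (df["Timestamp"] - df["Timestamp"].min()) // bin_width

# Number only the bins that actually contain rows, starting at 1.
df["Group"] = bin_index.rank(method="dense").astype(int)

print(df)
```

Changing `bin_width` (for example to `pd.Timedelta(minutes=10)`) is the only edit needed for a different unit.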

Sectioning different heading levels

人盡茶涼 submitted on 2020-05-23 10:52:34
Question: The goal is to group elements starting with different heading levels into sections nested according to those levels. The problem is similar to "XSLT: moving a grouping html elements into section levels". The difference here is that heading levels are not in strict order. To give a simplified example, I want to transform an input like

    <body>
      <p>0.1</p>
      <p>0.2</p>
      <h2>h2.1</h2>
      <h3>h3.1</h3>
      <p>3.1</p>
      <p>3.2</p>
      <h1>h1.1</h1>
      <p>1.1</p>
      <h3>h3.2</h3>
      <p>3a.1</p>
      <p>3a.2</p>
    </body>

into this desired

Splitting data into chunks and iterating over each chunk in R

霸气de小男生 submitted on 2020-05-17 14:42:58
Question: I have a dataframe structured like this:

    birthwt  tobacco01  pscore  pscoreblocks    blocknumber
    3425     0          0.18    (0.177, 0.187]  1
    3527     1          0.15    (0.158, 0.168]  2
    1638     1          0.34    (0.335, 0.345]  3

Explaining the data: The birthwt column is a continuous variable measuring birth weight in grams. The tobacco01 column contains values of 0 or 1. The pscore column contains probability values between 0 and 1. The pscoreblocks takes the pscore column and breaks it down into 100 equally sized blocks. The block number

Groupby of multiple columns and assigning values to each by considering start and end of each (Pandas)

岁酱吖の submitted on 2020-05-17 07:04:41
Question: I've got a dataframe that looks like this:

    df1
        v  w  x  y
    4   0  1  a  b
    5   0  1  a  a
    _________________
    6   0  2  a  b
    _________________
    2   0  3  a  b
    - - - - - - - - -
    3   1  2  a  b
    _________________
    15  1  3  a  b
    12  1  3  b  b
    _________________
    13  1  1  a  b
    - - - - - - - - -
    15  3  1  a  b
    14  3  1  b  a
    8   3  1  a  b
    9   3  1  a  a

So df1 was grouped (lines) by v and w and merged with another df which contained x and y. I need a new column z which picks the right group out of x and y with the following conditions: in every subgroup 'V'

How do I create a new object from grouping by result

三世轮回 submitted on 2020-05-17 05:58:13
Question: For the example in "Java 8 POJO objects filter pojo based on common multiple key combination and sum on one field": after summing up, I need to create a new object of Sales type holding the totals (the sum result of the group by), something like below:

    { "month" : "Total", "year": "2000", "state" : "State1", "city" : "City1", "sales" : "15" }

So I have created the corresponding constructor in Sales and tried

    list.stream() .collect(groupingBy(Sale::getState, groupingBy(Sale::getCity, summingInt(Sale: