pandas-groupby | 易学教程

pandas - how to organise a dataframe based on date and assign new values to a column

Question: I have a dataframe covering one month, excluding Saturdays and Sundays, logged at one-minute intervals.

                       v1    v2
2017-04-03 09:15:00  35.7  35.4
2017-04-03 09:16:00  28.7  28.5
...                   ...   ...
2017-04-03 16:29:00  81.7  81.5
2017-04-03 16:30:00  82.7  82.6
...                   ...   ...
2017-04-04 09:15:00  24.3  24.2
2017-04-04 09:16:00  25.6  25.5
...                   ...   ...
2017-04-04 16:29:00  67.0  67.2
2017-04-04 16:30:00  70.2  70.6
...                   ...   ...
2017-04-28 09:15:00  31.7  31.4
2017-04-28 09:16:00  31.5  31.0
...                   ...   ...
2017-04-28 16:29:00  33.2  33.5
2017-04

How to group a python data frame by multilevel rows?

Question: I have the following multi-level data frame:

Year     2016                    2017
Quarter     3     4                 1                 2
Month     Sep   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May   Jun
A        0.16  0.95  0.92  0.45  0.30  0.35  0.95  0.88  0.18  0.10
B        0.88  0.67  0.07  0.70  0.74  0.33  0.77  0.21  0.81  0.85
C        0.79  0.56  0.13  0.19  0.94  0.23  0.72  0.62  0.66  0.93

I want to sum over the quarters, so that the final result is as follows:

Year     2016        2017
Quarter     3     4     1     2
A        0.16  2.32  1.60  1.16
B        0.88  1.44  1.85  1.86
C        0.79  0.89  1.89  2.21

I tried the following: df= df
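The quarterly sum described in this question can be sketched as below. This is only one possible approach, assuming the column levels are named `Year`, `Quarter`, and `Month` as shown in the excerpt; the asker's own attempt is truncated and is not reproduced here. Transposing before grouping avoids the `axis=1` argument of `groupby`, which recent pandas versions deprecate.

```python
import pandas as pd

# Rebuild the frame from the excerpt; the level names are assumed from its layout.
columns = pd.MultiIndex.from_tuples(
    [
        (2016, 3, "Sep"), (2016, 4, "Oct"), (2016, 4, "Nov"), (2016, 4, "Dec"),
        (2017, 1, "Jan"), (2017, 1, "Feb"), (2017, 1, "Mar"),
        (2017, 2, "Apr"), (2017, 2, "May"), (2017, 2, "Jun"),
    ],
    names=["Year", "Quarter", "Month"],
)
df = pd.DataFrame(
    [
        [0.16, 0.95, 0.92, 0.45, 0.30, 0.35, 0.95, 0.88, 0.18, 0.10],
        [0.88, 0.67, 0.07, 0.70, 0.74, 0.33, 0.77, 0.21, 0.81, 0.85],
        [0.79, 0.56, 0.13, 0.19, 0.94, 0.23, 0.72, 0.62, 0.66, 0.93],
    ],
    index=["A", "B", "C"],
    columns=columns,
)

# Transpose so the column levels become row levels, group on Year and Quarter,
# sum the months within each group, then transpose back.
quarterly = df.T.groupby(level=["Year", "Quarter"]).sum().T
print(quarterly)
```

The `Month` level is dropped because it is not one of the grouping levels, which leaves one column per (year, quarter) pair.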

Columns and rows concatenation with a common value in another column

Question: In the table below, I want to concatenate the columns Tri_gram_sents and Value together, and then concatenate all rows that have the same number in the column sentence.

Tri_gram_sents               Value     sentence
(('<s>', '<s>'), 'ABC')      0.161681  1
(('<s>', 'ABC'), 'ABC')      0.472973  1
(('ABC', 'ABC'), 'ABC')      0.305732  1
(('ABC', 'ABC'), 'ABC')      0.005655  1
(('ABC', 'ABC'), '</s>')     0.434783  1
(('ABC', '</s>'), '</s>')    0.008547  1
(('<s>', '<s>'), 'DEF')      0.111111  2
(('<s>', 'DEF'), 'DEF')      0.039474  2
(('DEF', 'DEF'), 'DEF

Pandas python + format for values

Question: This is the code:

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt

df.head(3).style.format({'Budget': "€ {:,.0f}"})

Year Project Entity Participation Country Budget
0 2015 671650 - MMMAGIC - 5G FUNDACION IMDEA NETWORK* Participant Spain € 384,000
1 2015 671650 - MMMAGIC - 5G ROHDE & SCHWARZ GMBH* Participant Germany € 12,000
2 2015 671650 - MMMAGIC - 5G SAMSUNG ELECTRONICS (UK) LIMITED Coordinator UnitedKingdom € 997,500

datos1 = (df[

Value direction change in a pandas column after groupby

Question: I have a dataframe as below.

Cycle  Type  Difference
2      2      0.001
2      2     -0.019
2      2     -0.023
2      2     -0.012
2      2      0.008
2      2     -0.003
2      2      0.005
2      2     -0.007
2      2      0.01
2      2     -0.008
2      2     -0.012
2      2     -0.015
2      2     -0.011
2      2     -0.021
3      2     -0.006
3      2     -0.026
3      2     -0.012
3      2     -0.011
3      2      0.001
3      2     -0.007
3      2     -0.005
3      2      0.002
3      2     -0.015
3      2     -0.015
3      2     -0.013
3      2     -0.009
3      2     -0.018
3      2     -0.015
3      2     -0.017
3      2     -0.004
3      2     -0.014

The values can form runs of a few consecutive negatives and a few consecutive positives. I want to add a column which has the

identify records that make up 90% of total

Question: I have a report that identifies key drivers of an overall number or trend. I would like to automate listing the underlying records that account for a given percentage of that number. For example, if the net change in widget sales in the South region is -5,000.00, made up of both positives and negatives, I would like to identify the underlying drivers, from largest to smallest, that account for at least ~90% (-4,500.00) of that -5,000.00 total. data region OfficeLocation
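A rough sketch of one way to read this requirement follows. The data and the `change` column name are hypothetical, since the excerpt is cut off before the actual columns are shown. Records are ordered so that the largest contributions in the direction of the total come first, and rows are kept until the running sum reaches 90% of the total.

```python
import pandas as pd

# Hypothetical records; only "OfficeLocation" appears in the excerpt.
df = pd.DataFrame(
    {
        "OfficeLocation": ["a", "b", "c", "d", "e"],
        "change": [-3000.0, -2500.0, 1000.0, -400.0, -100.0],
    }
)

threshold = 0.9
total = df["change"].sum()  # net change across all records

# Put the biggest movers in the direction of the total first:
# ascending order for a negative total, descending for a positive one.
ordered = df.sort_values("change", ascending=bool(total < 0))

# Running share of the total explained by the records seen so far.
share = ordered["change"].cumsum() / total

# Keep rows up to and including the first one that reaches the threshold.
n_rows = int((share >= threshold).to_numpy().argmax()) + 1
drivers = ordered.iloc[:n_rows]
print(drivers)
```

With mixed signs the running share can overshoot 100% before offsetting records are reached, so whether this matches the report's definition of "drivers" is an open assumption.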

Create separate dataframes by iterating a groupby object

Question: I am looking to loop through the region column (4 regions) using groupby and then run a set of pivot tables, which I will stack on top of each other and add a total row to. The pivot tables need to be run on all 4 regions I am grouping by. Once I have the pivots, I need to stack them on top of each other.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Roll': ['Analyst','doctor','activist','lawyer','writer','manager'], 'Animal': ['cats','dogs','birds','pianos','elephant',
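The loop described here could look roughly like the sketch below. The `region` and `amount` columns and their values are made up for illustration, because the asker's frame is truncated in the excerpt; only `Roll` comes from the original.

```python
import pandas as pd

# Hypothetical data: "region" and "amount" are assumed column names.
df = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "east", "west"],
        "Roll": ["Analyst", "doctor", "Analyst", "lawyer", "writer", "manager"],
        "amount": [10, 20, 30, 40, 50, 60],
    }
)

# Iterating a groupby object yields (key, sub-frame) pairs,
# so a dict comprehension gives one dataframe per region.
by_region = {region: group for region, group in df.groupby("region")}

pivots = []
for region, group in by_region.items():
    # One pivot table per region, with a total row appended.
    pivot = group.pivot_table(index="Roll", values="amount", aggfunc="sum")
    pivot.loc["Total"] = pivot.sum()
    pivots.append(pivot)

# Stack the per-region pivots, labelling each block with its region.
stacked = pd.concat(pivots, keys=list(by_region))
print(stacked)
```

Passing `keys` to `pd.concat` adds the region as an outer index level, which keeps the stacked blocks distinguishable.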

How to pivot a dataframe

Question: What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... But I'm going to give it a go. The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble
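As a minimal illustration of the long-to-wide case mentioned in this question, the sketch below uses made-up column names (`row`, `col`, `val`). It covers only the simplest scenario and is not taken from the canonical answer.

```python
import pandas as pd

# Hypothetical long-format data: one observation per (row, col) pair.
long_df = pd.DataFrame(
    {
        "row": ["r1", "r1", "r2", "r2"],
        "col": ["c1", "c2", "c1", "c2"],
        "val": [1, 2, 3, 4],
    }
)

# pivot reshapes without aggregating; it requires unique (row, col) pairs.
wide = long_df.pivot(index="row", columns="col", values="val")
print(wide)

# With duplicate pairs, pivot_table aggregates them instead of raising.
with_duplicate = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
wide_sum = with_duplicate.pivot_table(
    index="row", columns="col", values="val", aggfunc="sum"
)
print(wide_sum)
```

`pivot` is a pure reshape, while `pivot_table` is a reshape plus an aggregation, which is why it tolerates repeated index/column combinations.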

Pandas - convert cumulative value to actual value

Question: Let's say my dataframe looks something like this:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048
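Going by the title, the goal is to turn a running total into per-row increments. The excerpt is cut off before it says which column is cumulative, so the sketch below assumes it is `sessions` and that totals accumulate per `site` and `country_code`; both are assumptions. It uses the first four rows of the excerpt.

```python
import pandas as pd

# First four rows of the excerpt, reduced to the columns used here.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2017-03-20", "2017-03-21", "2017-03-22", "2017-03-23"]
        ),
        "site": ["website1"] * 4,
        "country_code": ["US"] * 4,
        "sessions": [15.0, 15.0, 16.0, 16.0],  # assumed to be cumulative
    }
)

# Differences only make sense in chronological order within each group.
df = df.sort_values(["site", "country_code", "date"])

# diff() within each group gives the increment from the previous row;
# the first row of each group has no predecessor, so keep its own value.
increments = df.groupby(["site", "country_code"])["sessions"].diff()
df["sessions_actual"] = increments.fillna(df["sessions"])
print(df)
```

Filling the first row with the cumulative value treats everything before the first date as belonging to that date; dropping the row instead may be more appropriate depending on the data.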

Seaborn Heatmap with Datetime Axes

Question: I want to create a heatmap that has year across the x axis and month across the y axis. The cells of the heatmap will be % returns. Here's kinda what I am after. So I have some data and I turn it into a pct_change() series.

import pandas_datareader.data as web
import pandas as pd
from datetime import datetime as dt
import numpy as np
import seaborn as sns

start = dt(year = 2000, month = 1, day = 1)
df = web.DataReader('GDP', 'fred', start = '2000')
df.pct_change()
df.tail()

So here's what we are
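The reshaping step behind such a heatmap can be sketched as follows. A synthetic monthly series stands in for the downloaded data, so the numbers are not real GDP figures, and the names `ret`, `year`, and `month` are chosen here for illustration. The resulting month-by-year table is the kind of 2D frame that `sns.heatmap` accepts.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series growing 1% per month (a stand-in, not real data).
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
values = pd.Series(100.0 * 1.01 ** np.arange(36), index=idx)

# Period-over-period returns; the first entry is NaN and is dropped.
returns = values.pct_change().dropna()

# Split the datetime index into year and month, then pivot so that
# months run down the rows (y axis) and years across the columns (x axis).
table = (
    returns.to_frame("ret")
    .assign(year=lambda d: d.index.year, month=lambda d: d.index.month)
    .pivot(index="month", columns="year", values="ret")
)
print(table)

# With seaborn installed, the table could then be drawn with, for example:
# import seaborn as sns
# sns.heatmap(table, annot=True, fmt=".2%")
```

Months with no observation, such as January 2000 here, appear as NaN in the table and are left blank by the heatmap.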