multi-index | 易学教程

Pandas: get multiindex level as series

I have a dataframe with multiple index levels, e.g.:

    idx = pd.MultiIndex.from_product((['foo', 'bar'], ['one', 'five', 'three', 'four']), names=['first', 'second'])
    df = pd.DataFrame({'A': [np.nan, 12, np.nan, 11, 16, 12, 11, np.nan]}, index=idx).dropna().astype(int)

                   A
    first second
    foo   five    12
          four    11
    bar   one     16
          five    12
          three   11

I want to create a new column using the index level titled second, so that I get

                   A      B
    first second
    foo   five    12   five
          four    11   four
    bar   one     16    one
          five    12   five
          three   11  three

I can do this by resetting the index, copying the column, then re-applying, but that seems more round
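
One way to avoid the reset-and-reapply round trip described in the question is `Index.get_level_values`, which returns the labels of a single level aligned row by row with the frame. The following is a minimal sketch that rebuilds the question's data; it is an illustration, not necessarily the answer accepted on the original thread.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.MultiIndex.from_product(
    (['foo', 'bar'], ['one', 'five', 'three', 'four']),
    names=['first', 'second'],
)
df = pd.DataFrame(
    {'A': [np.nan, 12, np.nan, 11, 16, 12, 11, np.nan]}, index=idx
).dropna().astype(int)

# get_level_values yields one label per row for the requested level,
# so it can be assigned directly as a new column.
df['B'] = df.index.get_level_values('second')
```

The level can be addressed by name (`'second'`) or by position (`1`).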

Multiindex and timezone - Frozen list error

I am trying to change the timezone of a MultiIndex DataFrame, but I get a frozen list error. Does anyone have an idea how to proceed?

    >>> array = [('s001', d) for d in pd.date_range(start='01/01/2014', end='01/01/2015', freq='H')] + [('s002', d) for d in pd.date_range(start='01/01/2014', end='01/01/2015', freq='H')]
    >>> index = pd.MultiIndex.from_tuples(array, names=['sce', 'DATES'])
    >>> df = pd.DataFrame(np.random.randn(len(index)), index=index)
    >>> df.index.levels[1] = df.index.levels[1].tz_localize('Etc/GMT-1', ambiguous = 'NaT')
    Traceback (most recent call last):
      File "", line 1, in
      File "C:\Pythons
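
The error arises because `index.levels` is an immutable `FrozenList`, so item assignment is rejected. A possible workaround, sketched below, is to build a new index with `MultiIndex.set_levels` and assign it back. The example uses a shorter daily date range than the question (an assumption made to keep it small) and omits the `ambiguous` argument, since a fixed-offset zone such as `Etc/GMT-1` has no ambiguous times.

```python
import numpy as np
import pandas as pd

# Smaller stand-in for the question's data (daily instead of hourly).
dates = pd.date_range(start='2014-01-01', periods=3, freq='D')
index = pd.MultiIndex.from_product(
    [['s001', 's002'], dates], names=['sce', 'DATES']
)
df = pd.DataFrame(np.arange(len(index)), index=index)

# Localize the level's values, then swap the level in a *new* MultiIndex.
localized = df.index.levels[1].tz_localize('Etc/GMT-1')
df.index = df.index.set_levels(localized, level='DATES')
```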

How to concatenate multiple csv to xarray and define coordinates?

Question: I have multiple CSV files with the same rows and columns; the data they contain varies with the date. Each CSV file is associated with a different date, given in its name, e.g. data.2018-06-01.csv. A minimal example of my data looks like this: I have two files, data.2018-06-01.csv and data.2019-06-01.csv, which respectively contain

    user_id, weight, status
    001, 70, healthy
    002, 90, healthy

and

    user_id, weight, status
    001, 72, healthy
    002, 103, obese

My question: How can I

Reshape MultiIndex dataframe to tabular format

Given a sample MultiIndex:

    idx = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c', 'd']])
    df = pd.DataFrame({'value' : np.arange(12)}, index=idx)
    df

         value
    0 a      0
      b      1
      c      2
      d      3
    1 a      4
      b      5
      c      6
      d      7
    2 a      8
      b      9
      c     10
      d     11

How can I efficiently convert this to a tabular format like so?

       a  b   c   d
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11

Furthermore, given the dataframe above, how can I bring it back to its original multi-indexed state? What I've tried:

    pd.DataFrame(df.values.reshape(-1, df.index.levels[1].size), index=df.index.levels[0], columns=df.index.levels[1])

Which works for the first problem, but I'm not
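
Both directions of this reshape can be sketched with `unstack` and `stack`, which move the innermost index level into the columns and back. This is one possible approach, shown on the question's own data; the variable names `wide` and `restored` are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c', 'd']])
df = pd.DataFrame({'value': np.arange(12)}, index=idx)

# Long -> wide: the inner index level becomes the column labels.
wide = df['value'].unstack()

# Wide -> long: stack the columns back into the inner index level,
# then restore the original column name.
restored = wide.stack().to_frame('value')
```

Selecting the `value` column before unstacking keeps the resulting columns flat (`a` to `d`) rather than nested under `value`.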

Pandas pivot table for multiple columns at once

Let's say I have a DataFrame:

       nj  ptype  wd  wpt
    0   2      1   2    1
    1   3      2   1    2
    2   1      1   3    1
    3   2      2   3    3
    4   3      1   2    2

I would like to aggregate this data using ptype as the index like so:

           nj            wd           wpt
          1.0 2.0 3.0   1.0 2.0 3.0   1.0 2.0 3.0
    ptype
    1       1   1   1     0   2   1     2   1   0
    2       0   1   1     1   0   1     0   1   1

You could build each one of the top level columns for the final value by creating a pivot table with aggfunc='count' and then concatenating them all, like so:

    nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
    wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
    wd = df.pivot_table(index
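
As an alternative to building each pivot table by hand, the per-column counts can be sketched with `pd.crosstab` and combined with `pd.concat`. This swaps in a different technique from the `pivot_table` approach quoted above, and it yields integer column labels (1, 2, 3) rather than the floats shown in the desired output.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

# One frequency table per column, keyed by column name; concat with a dict
# turns the keys into the top level of the resulting column MultiIndex.
counts = pd.concat(
    {col: pd.crosstab(df['ptype'], df[col]) for col in ['nj', 'wd', 'wpt']},
    axis=1,
)
```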

Pandas: Modify a particular level of Multiindex

I have a dataframe with a MultiIndex and would like to modify one particular level of the MultiIndex. For instance, the first level might be strings and I may want to remove the white spaces from that index level:

    df.index.levels[1] = [x.replace(' ', '') for x in df.index.levels[1]]

However, the code above results in an error: TypeError: 'FrozenList' does not support mutable operations. I know I can reset_index and modify the column and then re-create the MultiIndex, but I wonder whether there is a more elegant way to modify one particular level of the MultiIndex directly. Thanks to @cxrodgers's
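
A minimal sketch of one way to do this without `reset_index`: compute the cleaned labels from the existing level and pass them to `MultiIndex.set_levels`, which returns a new index. The sample frame and its level names (`outer`, `inner`) are hypothetical, since the question does not show its data.

```python
import pandas as pd

# Hypothetical frame whose second level contains spaces.
idx = pd.MultiIndex.from_tuples(
    [('x', 'a b'), ('x', 'c d'), ('y', 'a b')], names=['outer', 'inner']
)
df = pd.DataFrame({'val': [1, 2, 3]}, index=idx)

# Strip spaces from the level's unique labels, then build a new index
# with that level replaced; the row-to-label mapping is preserved.
new_level = df.index.levels[1].str.replace(' ', '', regex=False)
df.index = df.index.set_levels(new_level, level=1)
```

One caveat to keep in mind: if two distinct labels become identical after cleaning, the level would no longer be unique, and `set_levels` is expected to reject it under its default integrity check.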

Giving a column multiple indexes/headers

I am working with pandas dataframes that are essentially time series like this:

                 level
    Date
    1976-01-01  409.67
    1976-02-01  409.58
    1976-03-01  409.66
    …

What I want is multiple indexes/headers for the level column, like so:

                Station1                  #Name of the datasource
                43.1977317,-4.6473648,5   #Lat/Lon of the source
                Precip                    #Type of data
    Date
    1976-01-01  409.67
    1976-02-01  409.58
    1976-03-01  409.66
    …

So essentially I am searching for something like Mydata.columns.level1 = ['Station1'], Mydata.columns.level2 = [Lat,Lon], Mydata.columns.level3 = ['Precip']. Reason being that a single location can have
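
Stacked headers of this kind can be sketched by replacing the columns with a `pd.MultiIndex` that has one tuple per column. In the example below, the level names `station`, `location` and `kind` are assumptions introduced for illustration; the values come from the question.

```python
import pandas as pd

mydata = pd.DataFrame(
    {'level': [409.67, 409.58, 409.66]},
    index=pd.DatetimeIndex(
        ['1976-01-01', '1976-02-01', '1976-03-01'], name='Date'
    ),
)

# One tuple per existing column: (data source, lat/lon string, data type).
mydata.columns = pd.MultiIndex.from_tuples(
    [('Station1', '43.1977317,-4.6473648,5', 'Precip')],
    names=['station', 'location', 'kind'],  # hypothetical level names
)
```

Selecting `mydata['Station1']` then returns every column recorded for that station, which is the kind of grouping the question seems to be after.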

reading excel sheet as multiindex dataframe through pd.read_excel()

I'm struggling to read an Excel sheet with pd.read_excel(). My Excel table looks like this in its raw form: I expected the dataframe to look like this:

                   bar                   baz                   foo
                   one        two        one        two        one        two
                     A          B          C          D          E          F
    baz one     0.085930  -0.848468   0.911572  -0.705026  -1.284458  -0.602760
        two     0.385054   2.539314   0.589164   0.765126   0.210199  -0.481789
        three  -0.352475  -0.975200  -0.403591   0.975707   0.533924  -0.195430

Is this even possible? My failed attempt:

    xls_file = pd.read_excel(data_file, header=[0,1,2], index_col=None)

Link to the raw Excel file: https://www.dropbox.com/s/ek646ab4yb1fvdq/ipsos_excel_tables_type_2_trimed

Pandas : Proper way to set values based on condition for subset of multiindex dataframe

I'm not sure how to do this without chained assignments (which probably wouldn't work anyway because I'd be setting a copy). I want to take a subset of a MultiIndex pandas dataframe, test for values less than zero and set them to zero. For example:

    df = pd.DataFrame({('A','a'): [-1,-1,0,10,12], ('A','b'): [0,1,2,3,-1], ('B','a'): [-20,-10,0,10,20], ('B','b'): [-200,-100,0,100,200]})
    df[df['A']<0] = 0.0

gives

    In [37]: df
    Out[37]:
        A       B
        a   b   a    b
    0  -1   0 -20 -200
    1  -1   1 -10 -100
    2   0   2   0    0
    3  10   3  10  100
    4  12  -1  20  200

Which shows that it was not able to set based on the condition. Alternatively
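
One conservative way to get the intended result, sketched here, is to loop over the columns under the top-level label `'A'` and assign through `.loc` with the full column tuple, which avoids chained assignment. This is an illustrative approach rather than the answer from the original thread, and it writes the integer `0` instead of `0.0` so the columns keep their integer dtype.

```python
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [-1, -1, 0, 10, 12],
    ('A', 'b'): [0, 1, 2, 3, -1],
    ('B', 'a'): [-20, -10, 0, 10, 20],
    ('B', 'b'): [-200, -100, 0, 100, 200],
})

# Pick the full (top, sub) column tuples that sit under 'A'.
a_columns = df.columns[df.columns.get_level_values(0) == 'A']

# A single .loc call per column: row mask plus full column key.
for col in a_columns:
    df.loc[df[col] < 0, col] = 0
```

Columns under `'B'` are left untouched because they never appear in `a_columns`.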

Resampling a pandas dataframe with multi-index containing timeseries

Question: Apologies for creating what appears to be a duplicate of this question. I have a dataframe that is shaped more or less like the one below:

    df_lenght = 240
    df = pd.DataFrame(np.random.randn(df_lenght, 2), columns=['a', 'b'])
    df['datetime'] = pd.date_range('23/06/2017', periods=df_lenght, freq='H')
    unique_jobs = ['job1', 'job2', 'job3']
    job_id = [unique_jobs for i in range(1, int((df_lenght / len(unique_jobs)) + 1), 1)]
    df['job_id'] = sorted([val for sublist in job_id for val in sublist])
    df.set
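
The excerpt is cut off before it states the target frequency or aggregation, so the sketch below assumes daily means per job. It uses `pd.Grouper` with both `level` and `freq`, which lets `groupby` bin the datetime level of a MultiIndex while grouping on the other level. The data is a small stand-in with 12-hour spacing, not the question's frame.

```python
import numpy as np
import pandas as pd

# Six timestamps, 12 hours apart, repeated for two jobs (assumed data).
times = pd.Timestamp('2017-06-23') + pd.to_timedelta(np.arange(6) * 12, unit='h')
df = pd.DataFrame({
    'job_id': np.repeat(['job1', 'job2'], 6),
    'datetime': np.tile(times, 2),
    'a': np.arange(12, dtype=float),
}).set_index(['job_id', 'datetime'])

# Group by job, and bin the datetime level into calendar days.
daily = df.groupby(
    [pd.Grouper(level='job_id'), pd.Grouper(level='datetime', freq='D')]
).mean()
```

The result keeps a two-level index (`job_id`, `datetime`), with one row per job per day.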