multi-index

Pandas: get multiindex level as series

大憨熊 submitted on 2019-11-28 13:23:35
I have a dataframe with multiple index levels, e.g.:

    import numpy as np
    import pandas as pd

    idx = pd.MultiIndex.from_product((['foo', 'bar'], ['one', 'five', 'three', 'four']), names=['first', 'second'])
    df = pd.DataFrame({'A': [np.nan, 12, np.nan, 11, 16, 12, 11, np.nan]}, index=idx).dropna().astype(int)

                   A
    first second
    foo   five    12
          four    11
    bar   one     16
          five    12
          three   11

I want to create a new column using the index level titled second, so that I get

                   A      B
    first second
    foo   five    12   five
          four    11   four
    bar   one     16    one
          five    12   five
          three   11  three

I can do this by resetting the index, copying the column, then re-applying, but that seems more round
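One way to avoid the reset/re-apply round trip is `Index.get_level_values`, which returns the labels of a single level aligned with the rows. The sketch below rebuilds the example frame (with the missing comma between 'three' and 'four' restored) and is only an illustration, not necessarily the answer given in the original thread.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    (['foo', 'bar'], ['one', 'five', 'three', 'four']),
    names=['first', 'second'],
)
df = pd.DataFrame(
    {'A': [np.nan, 12, np.nan, 11, 16, 12, 11, np.nan]}, index=idx
).dropna().astype(int)

# Copy the labels of the 'second' level into a regular column.
# The index itself is left untouched.
df['B'] = df.index.get_level_values('second')
```

The level can be addressed by name ('second') or by position (1).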

Multiindex and timezone - Frozen list error

心不动则不痛 submitted on 2019-11-28 12:49:01
I am trying to change the timezone of a MultiIndex DataFrame, but I get a frozen list error. Does anyone have an idea how to proceed?

    >>> array = [('s001', d) for d in pd.date_range(start='01/01/2014', end='01/01/2015', freq='H')] + [('s002', d) for d in pd.date_range(start='01/01/2014', end='01/01/2015', freq='H')]
    >>> index = pd.MultiIndex.from_tuples(array, names=['sce', 'DATES'])
    >>> df = pd.DataFrame(np.random.randn(len(index)), index=index)
    >>> df.index.levels[1] = df.index.levels[1].tz_localize('Etc/GMT-1', ambiguous = 'NaT')
    Traceback (most recent call last):
      File "", line 1, in
      File "C:\Pythons
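The levels of a MultiIndex are immutable, so assigning into `df.index.levels[1]` fails. A minimal sketch of one workaround, assuming a shortened date range (48 hourly steps, chosen here only to keep the example small), is to build a new index with `MultiIndex.set_levels` and assign it back:

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical sample: 48 hourly timestamps per scenario.
dates = pd.date_range(start='2014-01-01', periods=48, freq=pd.Timedelta(hours=1))
index = pd.MultiIndex.from_product([['s001', 's002'], dates], names=['sce', 'DATES'])
df = pd.DataFrame(np.random.randn(len(index)), index=index)

# Localize the datetime level, then swap it in as a whole.
# set_levels returns a new MultiIndex rather than mutating in place.
localized = df.index.levels[1].tz_localize('Etc/GMT-1', ambiguous='NaT')
df.index = df.index.set_levels(localized, level=1)
```

Because the replacement level has the same length and order as the old one, the row codes stay valid and no data is reshuffled.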

How to concatenate multiple csv to xarray and define coordinates?

℡╲_俬逩灬. submitted on 2019-11-28 12:01:06
Question: I have multiple CSV files with the same rows and columns; the data they contain varies depending on the date. Each CSV file is associated with a different date, given in its name, e.g. data.2018-06-01.csv. A minimal example of my data looks like this: I have two files, data.2018-06-01.csv and data.2019-06-01.csv, that respectively contain

    user_id, weight, status
    001, 70, healthy
    002, 90, healthy

and

    user_id, weight, status
    001, 72, healthy
    002, 103, obese

My question: How can I
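Since the question text is cut off, the following is only a sketch of one plausible first step: parse the date out of each file name and stack the files into a single frame indexed by (date, user_id). In-memory strings stand in for the two files so the example is self-contained; the variable names are hypothetical.

```python
import io
import pandas as pd

# Stand-ins for the two files on disk (contents copied from the question).
files = {
    "data.2018-06-01.csv": "user_id, weight, status\n001, 70, healthy\n002, 90, healthy\n",
    "data.2019-06-01.csv": "user_id, weight, status\n001, 72, healthy\n002, 103, obese\n",
}

frames = []
for name, text in files.items():
    # The date is the middle part of 'data.<date>.csv'.
    date = pd.to_datetime(name.split(".")[1])
    frame = pd.read_csv(
        io.StringIO(text),
        skipinitialspace=True,      # the files have a space after each comma
        dtype={"user_id": str},     # keep the leading zeros in '001'
    )
    frame["date"] = date
    frames.append(frame)

combined = pd.concat(frames).set_index(["date", "user_id"]).sort_index()

# With xarray installed, combined.to_xarray() would turn the two index
# levels into the coordinates 'date' and 'user_id' (not run here).
```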

Reshape MultiIndex dataframe to tabular format

风格不统一 submitted on 2019-11-28 10:36:55
Given a sample MultiIndex:

    import numpy as np
    import pandas as pd

    idx = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c', 'd']])
    df = pd.DataFrame({'value' : np.arange(12)}, index=idx)
    df

         value
    0 a      0
      b      1
      c      2
      d      3
    1 a      4
      b      5
      c      6
      d      7
    2 a      8
      b      9
      c     10
      d     11

How can I efficiently convert this to a tabular format like so?

       a  b   c   d
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11

Furthermore, given the dataframe above, how can I bring it back to its original multi-indexed state? What I've tried:

    pd.DataFrame(df.values.reshape(-1, df.index.levels[1].size), index=df.index.levels[0], columns=df.index.levels[1])

Which works for the first problem, but I'm not
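Both directions of this reshape can be sketched with `unstack` and `stack`, which move an index level into the columns and back. This is one common approach and not necessarily the one the original thread settled on.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[0, 1, 2], ['a', 'b', 'c', 'd']])
df = pd.DataFrame({'value': np.arange(12)}, index=idx)

# Long -> wide: the inner index level becomes the column labels.
wide = df['value'].unstack()

# Wide -> long: the column labels go back to being the inner index level.
restored = wide.stack().to_frame('value')
```

Unlike the `reshape` attempt, `unstack` does not assume that every outer label has the same number of inner labels; missing combinations would simply show up as NaN.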

Pandas pivot table for multiple columns at once

家住魔仙堡 submitted on 2019-11-28 09:33:38
Let's say I have a DataFrame:

       nj  ptype  wd  wpt
    0   2      1   2    1
    1   3      2   1    2
    2   1      1   3    1
    3   2      2   3    3
    4   3      1   2    2

I would like to aggregate this data using ptype as the index like so:

           nj            wd            wpt
          1.0 2.0 3.0   1.0 2.0 3.0   1.0 2.0 3.0
    ptype
    1       1   1   1     0   2   1     2   1   0
    2       0   1   1     1   0   1     0   1   1

You could build each one of the top-level columns for the final value by creating a pivot table with aggfunc='count' and then concatenating them all, like so (.loc is used here because .ix has been removed from pandas):

    nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
    wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
    wd = df.pivot_table(index
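As an alternative to building one pivot table per column, the counts can be sketched in a single step by melting the frame to long form and cross-tabulating. This is an illustrative approach under the assumption that plain occurrence counts are wanted; it is not taken from the original thread.

```python
import pandas as pd

df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

# Long form: one row per (ptype, source column, value).
melted = df.melt(id_vars='ptype')

# Count occurrences; the columns become a (variable, value) MultiIndex.
counts = pd.crosstab(melted['ptype'], [melted['variable'], melted['value']])
```

Combinations that never occur are filled with 0 by `crosstab`, which matches the zeros in the desired table.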

Pandas: Modify a particular level of Multiindex

泄露秘密 submitted on 2019-11-28 07:27:32
I have a dataframe with a MultiIndex and would like to modify one particular level of the MultiIndex. For instance, the first level might be strings and I may want to remove the white spaces from that index level:

    df.index.levels[1] = [x.replace(' ', '') for x in df.index.levels[1]]

However, the code above results in an error:

    TypeError: 'FrozenList' does not support mutable operations.

I know I can reset_index and modify the column and then re-create the MultiIndex, but I wonder whether there is a more elegant way to modify one particular level of the MultiIndex directly. Thanks to @cxrodgers's
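A minimal sketch of one way around the FrozenList error is `MultiIndex.set_levels`, which returns a new index with one level replaced. The sample frame and its level names below are hypothetical, and this may differ from the solution referenced in the truncated text.

```python
import pandas as pd

# Hypothetical sample with spaces in the second index level.
idx = pd.MultiIndex.from_tuples(
    [(1, 'a b'), (1, 'c d'), (2, 'a b')], names=['num', 'label']
)
df = pd.DataFrame({'x': [10, 20, 30]}, index=idx)

# Build the cleaned labels, then replace level 1 as a whole.
cleaned = df.index.levels[1].str.replace(' ', '', regex=False)
df.index = df.index.set_levels(cleaned, level=1)
```

This works on the unique level values only, so it stays cheap even for long frames. If the cleanup makes two labels identical, `set_levels` will complain about duplicate level values, and rebuilding the index from the cleaned `get_level_values` output is the safer route.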

Giving a column multiple indexes/headers

夙愿已清 submitted on 2019-11-28 07:04:40
I am working with pandas dataframes that are essentially time series like this:

                 level
    Date
    1976-01-01  409.67
    1976-02-01  409.58
    1976-03-01  409.66
    …

What I want to have is multiple indexes/headers for the level column, like so:

                Station1                 #Name of the datasource
                43.1977317,-4.6473648,5  #Lat/Lon of the source
                Precip                   #Type of data
    Date
    1976-01-01  409.67
    1976-02-01  409.58
    1976-03-01  409.66
    …

So essentially I am searching for something like Mydata.columns.level1 = ['Station1'], Mydata.columns.level2 = [Lat,Lon], Mydata.columns.level3 = ['Precip']. Reason being that a single location can have
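The `columns.level1`-style attributes in the question do not exist, but the same effect can be sketched by replacing the columns with a three-level MultiIndex. The level names ('station', 'location', 'kind') are made up for this illustration.

```python
import pandas as pd

Mydata = pd.DataFrame(
    {'level': [409.67, 409.58, 409.66]},
    index=pd.to_datetime(['1976-01-01', '1976-02-01', '1976-03-01']).rename('Date'),
)

# One tuple per column: (data source, lat/lon string, type of data).
Mydata.columns = pd.MultiIndex.from_tuples(
    [('Station1', '43.1977317,-4.6473648,5', 'Precip')],
    names=['station', 'location', 'kind'],  # hypothetical level names
)
```

A second series from the same station can then be added under another tuple such as ('Station1', '43.1977317,-4.6473648,5', 'Temp'), and `Mydata['Station1']` selects everything for that station.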

reading excel sheet as multiindex dataframe through pd.read_excel()

笑着哭i submitted on 2019-11-28 06:55:47
I'm struggling to read an Excel sheet with pd.read_excel(). My Excel table looks like this in its raw form: I expected the dataframe to look like this:

                    bar                 baz                 foo
                    one       two       one       two       one       two
                      A         B         C         D         E         F
    baz one    0.085930 -0.848468  0.911572 -0.705026 -1.284458 -0.602760
        two    0.385054  2.539314  0.589164  0.765126  0.210199 -0.481789
        three -0.352475 -0.975200 -0.403591  0.975707  0.533924 -0.195430

Is this even possible? My failed attempt:

    xls_file = pd.read_excel(data_file, header=[0,1,2], index_col=None)

Link to the raw Excel file: https://www.dropbox.com/s/ek646ab4yb1fvdq/ipsos_excel_tables_type_2_trimed

Pandas : Proper way to set values based on condition for subset of multiindex dataframe

给你一囗甜甜゛ submitted on 2019-11-28 05:58:34
I'm not sure how to do this without chained assignments (which probably wouldn't work anyway because I'd be setting a copy). I want to take a subset of a MultiIndex pandas dataframe, test for values less than zero and set them to zero. For example:

    import pandas as pd

    df = pd.DataFrame({('A','a'): [-1,-1,0,10,12], ('A','b'): [0,1,2,3,-1], ('B','a'): [-20,-10,0,10,20], ('B','b'): [-200,-100,0,100,200]})
    df[df['A']<0] = 0.0

gives

    In [37]: df
    Out[37]:
        A       B
        a   b   a    b
    0  -1   0 -20 -200
    1  -1   1 -10 -100
    2   0   2   0    0
    3  10   3  10  100
    4  12  -1  20  200

which shows that it was not able to set based on the condition. Alternatively
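One conservative way to zero out the negatives under the top-level key 'A' is to pick the matching columns by their first level and clip each one in place. The loop is deliberately explicit; it is a sketch of one option, not the answer from the original thread.

```python
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [-1, -1, 0, 10, 12],
    ('A', 'b'): [0, 1, 2, 3, -1],
    ('B', 'a'): [-20, -10, 0, 10, 20],
    ('B', 'b'): [-200, -100, 0, 100, 200],
})

# Columns whose first level is 'A', as full (level0, level1) tuples.
a_columns = df.columns[df.columns.get_level_values(0) == 'A']

for col in a_columns:
    # clip(lower=0) replaces every negative value with 0.
    df[col] = df[col].clip(lower=0)
```

Each assignment targets one fully specified column of `df` itself, so there is no chained indexing and no risk of writing to a temporary copy.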

Resampling a pandas dataframe with multi-index containing timeseries

人盡茶涼 submitted on 2019-11-28 04:44:13
Question: Apologies for creating what appears to be a duplicate of this question. I have a dataframe that is shaped more or less like the one below:

    import numpy as np
    import pandas as pd

    df_lenght = 240
    df = pd.DataFrame(np.random.randn(df_lenght,2), columns=['a','b'] )
    df['datetime'] = pd.date_range('23/06/2017', periods=df_lenght, freq='H')
    unique_jobs = ['job1','job2','job3',]
    job_id = [unique_jobs for i in range (1, int((df_lenght/len(unique_jobs))+1) ,1) ]
    df['job_id'] = sorted( [val for sublist in job_id for val in sublist] )
    df.set
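The question is cut off before it states the goal, so the sketch below assumes the aim is a daily mean per job on a frame indexed by (job_id, datetime). `pd.Grouper` with `level=` and `freq=` resamples one level of a MultiIndex while grouping on another; the variable names are chosen for this illustration.

```python
import numpy as np
import pandas as pd

n_rows = 240
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((n_rows, 2)), columns=['a', 'b'])
df['datetime'] = pd.date_range('2017-06-23', periods=n_rows, freq=pd.Timedelta(hours=1))
# 80 consecutive hourly rows per job, as in the question's construction.
df['job_id'] = np.repeat(['job1', 'job2', 'job3'], n_rows // 3)
df = df.set_index(['job_id', 'datetime'])

# Group by job, and within each job bin the datetime level by calendar day.
daily = df.groupby(
    [pd.Grouper(level='job_id'), pd.Grouper(level='datetime', freq='D')]
).mean()
```

The result keeps a two-level index of (job_id, datetime), with one row per job and day that actually contains data.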