multi-index

time slice on second level of multiindex

╄→гoц情女王★ submitted on 2019-12-04 08:37:35
pandas allows for cool slicing on time indexes. For example, I can slice a dataframe df for the months from January 2012 to March 2012 by doing:

    df['2012-01':'2012-03']

However, I have a dataframe df with a multiindex where the time index is the second level. It looks like:

                         A         B         C         D         E
    a 2001-01-31  0.864841  0.789273  0.370031  0.448256  0.178515
      2001-02-28  0.991861  0.079215  0.900788  0.666178  0.693887
      2001-03-31  0.016674  0.855109  0.984115  0.436574  0.480339
      2001-04-30  0.120924  0.046013  0.659807  0.210534  0.694029
      2001-05-31  0.788149  0.296244  0.478201  0.845042  0.437814
    b 2001-01-31  0.497646  0.349958  0
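One common way to slice a date range on the second level is `pd.IndexSlice` with `.loc`. The sketch below is a minimal illustration, not necessarily the accepted answer to this question. The level names `key` and `date` and the random data are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Assumption: a small stand-in frame with the same shape as the one in the question.
dates = pd.to_datetime(
    ["2001-01-31", "2001-02-28", "2001-03-31", "2001-04-30", "2001-05-31"]
)
index = pd.MultiIndex.from_product([["a", "b"], dates], names=["key", "date"])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 5)), index=index, columns=list("ABCDE"))

# Select every first-level key, and February through March on the second level.
# Slicing a MultiIndex this way expects the index to be sorted.
idx = pd.IndexSlice
subset = df.loc[idx[:, "2001-02":"2001-03"], :]
print(subset)
```

The slice on the date level uses the same partial-string syntax as a plain `DatetimeIndex`, so both month-end rows for each key are kept.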

How to iterate over MultiIndex levels in Pandas?

被刻印的时光 ゝ submitted on 2019-12-04 08:24:49
I often have MultiIndex indices and I'd like to iterate over groups where higher level indices are equal. It basically looks like

    from random import choice
    import pandas as pd

    N = 100
    df = pd.DataFrame([choice([1, 2, 3]) for _ in range(N)],
                      columns=["A"],
                      index=pd.MultiIndex.from_tuples(
                          [(choice("ab"), choice("cd"), choice("de")) for _ in range(N)]))

    for idx in zip(df.index.get_level_values(0), df.index.get_level_values(1)):
        df_select = df.loc[idx]

Is there a way to do the for loop iteration more neatly?

Use groupby. The index of the df_select view includes the first two level values, but
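The groupby suggestion can be sketched as follows. This is a minimal example under the question's own setup. The fixed random seed and the `groups` dictionary are additions for illustration only.

```python
from random import choice, seed

import pandas as pd

seed(0)  # assumption: fixed seed so the example is repeatable
N = 100
df = pd.DataFrame(
    [choice([1, 2, 3]) for _ in range(N)],
    columns=["A"],
    index=pd.MultiIndex.from_tuples(
        [(choice("ab"), choice("cd"), choice("de")) for _ in range(N)]
    ),
)

# Group on the first two index levels; each key is a (level0, level1) tuple
# and each df_select holds the rows that share that pair.
groups = {}
for key, df_select in df.groupby(level=[0, 1]):
    groups[key] = df_select
    print(key, len(df_select))
```

Unlike the `zip` loop, each pair of level values is visited once rather than once per row.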

Assigning values to Pandas Multiindex DataFrame by index level

让人想犯罪 __ submitted on 2019-12-04 06:49:28
I have a Pandas multiindex dataframe and I need to assign values to one of the columns from a series. The series shares its index with the first level of the index of the dataframe.

    import pandas as pd
    import numpy as np

    idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
    idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
    df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
    s = pd.Series([True, False, True], index=np.unique(idx0))
    print(df)
    print(s)

out:

                 A    B
    bar one    NaN  NaN
        two    NaN  NaN
        three  NaN  NaN
    baz one    NaN  NaN
    foo one    NaN  NaN
        two    NaN  NaN
    bar     True
    baz    False
    foo
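One possible approach is to look up the series once per row using the first index level, then assign the resulting array. This is a sketch of that idea and not necessarily the answer given in the original thread.

```python
import numpy as np
import pandas as pd

idx0 = np.array(["bar", "bar", "bar", "baz", "foo", "foo"])
idx1 = np.array(["one", "two", "three", "one", "one", "two"])
df = pd.DataFrame(index=[idx0, idx1], columns=["A", "B"])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Expand s to one value per row by reindexing on the first-level labels,
# then drop the index so the values are assigned by position.
level0 = df.index.get_level_values(0)
df["A"] = s.reindex(level0).to_numpy()
print(df)
```

Converting to a NumPy array before assignment avoids index alignment between the single-level series and the two-level frame.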

pandas multi index sort specific fields

淺唱寂寞╮ submitted on 2019-12-04 05:07:44
Question

I obtained a multi index in pandas by running series.describe() for a grouped dataframe. How can I sort these series by modelName.mean and only keep specific fields? This

    summary.sortlevel(1)['kappa']

sorts them but retains all the other fields like count. How can I only keep mean and std?

Edit: this is a textual representation of the df.

                            kappa
    modelName
    biasTotal count      5.000000
              mean       0.526183
              std        0.013429
              min        0.507536
              25%        0.519706
              50%        0.525565
              75%        0.538931
              max        0.539175
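If the ungrouped data is still available, one way to get only the mean and std, sorted by mean, is to aggregate those two statistics directly instead of filtering the describe() output. The sketch below assumes that is the case. The second model name `otherModel` and all the kappa values are made up for illustration.

```python
import pandas as pd

# Assumption: raw data with one row per observation; values are invented.
df = pd.DataFrame(
    {
        "modelName": ["biasTotal"] * 3 + ["otherModel"] * 3,
        "kappa": [0.50, 0.52, 0.54, 0.60, 0.62, 0.64],
    }
)

# Compute only the statistics of interest, then sort the models by mean.
summary = (
    df.groupby("modelName")["kappa"]
    .agg(["mean", "std"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

The result has one row per model and one column per statistic, so sorting by a single statistic is an ordinary `sort_values` call.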

Drop duplicate in multiindex dataframe in pandas

江枫思渺然 submitted on 2019-12-04 05:03:02
Question

I am looking for an efficient method to drop duplicate columns in a multiindex dataframe with Pandas. My data:

    TypePoint     TIME     Test  ...      T1      T1
    -                S    Unit1  ...    unit    unit
    (POINT, -)                   ...
    24001        90.00  100.000  ...  303.15  303.15
    24002       390.00  101.000  ...  303.15  303.15
    ...            ...      ...  ...     ...     ...
    24801        10000  102.000  ...  303.15  303.15
    24802        10500  103.000  ...  303.15  303.15

The header contains two pieces of information: the variable's name and its unit. I would like to drop the variable "T1" (duplicate variable). .drop
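One way to drop a repeated (name, unit) column while keeping its first occurrence is a boolean mask built from `columns.duplicated()`. The following is a sketch on a cut-down version of the table above. The frame is rebuilt by hand, so its exact shape is an assumption.

```python
import pandas as pd

# Assumption: a reduced copy of the question's table with a duplicated ("T1", "unit") column.
columns = pd.MultiIndex.from_tuples(
    [("TIME", "S"), ("Test", "Unit1"), ("T1", "unit"), ("T1", "unit")]
)
df = pd.DataFrame(
    [[90.0, 100.0, 303.15, 303.15], [390.0, 101.0, 303.15, 303.15]],
    index=[24001, 24002],
    columns=columns,
)

# duplicated() flags every column label that repeats an earlier one;
# inverting the mask keeps only the first occurrence of each label.
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped)
```

This compares column labels only. It does not check whether the data in the duplicated columns is identical.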

Set value multiindex Pandas

混江龙づ霸主 submitted on 2019-12-04 03:51:43
I'm a newbie to both Python and Pandas. I am trying to construct a dataframe, and then later populate it with values. I have constructed my dataframe

    from pandas import *

    ageMin = 21
    ageMax = 31
    ageStep = 2
    bins_sumins = [0, 10000, 20000]
    bins_age = list(range(ageMin, ageMax, ageStep))
    indeks_sex = ['M', 'F']
    indeks_age = ['[{0}-{1})'.format(bins_age[i-1], bins_age[i]) for i in range(1, len(bins_age))]
    indeks_sumins = ['[{0}-{1})'.format(bins_sumins[i-1], bins_sumins[i]) for i in range(1, len(bins_sumins))]
    indeks = MultiIndex.from_product([indeks_age, indeks_sex, indeks_sumins], names=['Age',
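Setting a single cell in a frame indexed like this can be sketched with `.loc` and a full index tuple. The level names are cut off in the snippet above, so this example leaves the levels unnamed. The column name `value` is hypothetical.

```python
import numpy as np
import pandas as pd

bins_sumins = [0, 10000, 20000]
bins_age = list(range(21, 31, 2))
indeks_sex = ["M", "F"]
indeks_age = [f"[{lo}-{hi})" for lo, hi in zip(bins_age[:-1], bins_age[1:])]
indeks_sumins = [f"[{lo}-{hi})" for lo, hi in zip(bins_sumins[:-1], bins_sumins[1:])]

# Assumption: levels are left unnamed and "value" is a hypothetical column.
indeks = pd.MultiIndex.from_product([indeks_age, indeks_sex, indeks_sumins])
df = pd.DataFrame(np.nan, index=indeks, columns=["value"])

# A tuple with one label per level addresses exactly one row;
# passing the column as well sets a single cell in place.
df.loc[("[21-23)", "M", "[0-10000)"), "value"] = 1.0
print(df.head())
```

Passing the row tuple and the column in one `.loc` call avoids chained indexing, which can end up assigning to a temporary copy.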

What are levels in a pandas DataFrame?

痞子三分冷 submitted on 2019-12-04 02:53:06
I've been reading through the documentation, and many explanations and examples use levels as something taken for granted. IMHO the docs lack a fundamental explanation of the data structure and definitions. What are levels in a data frame? What are levels in a MultiIndex index?

Andrzej Gis

I stumbled across this question while analyzing the answer to my own question, but I didn't find John's answer satisfying enough. After a few experiments, though, I think I understood the levels and decided to share:

Short answer: Levels are parts of the index or column.

Long answer: I think this
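The short answer can be illustrated with a small frame whose row index has two levels. This is a minimal sketch. The level names `first` and `second` and the data are assumptions for the example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],  # assumption: illustrative level names
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# Each level is one "part" of the row label.
n_levels = df.index.nlevels                       # how many parts each label has
first_labels = list(df.index.levels[0])           # unique labels stored for level 0
second_per_row = list(df.index.get_level_values("second"))  # level 1 label of every row

print(n_levels, first_labels, second_per_row)
```

A plain index has a single level. A MultiIndex stacks several, and the same applies to columns when `df.columns` is a MultiIndex.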

Merge two MultiIndex levels into one in Pandas

荒凉一梦 submitted on 2019-12-04 02:26:09
I have a Pandas data frame which is MultiIndexed. The second level contains a year ([2014, 2015]) and the third contains the month number ([1, 2, ..., 12]). I would like to merge these two into a single level like [1/2014, 2/2014, ..., 6/2015]. How could this be done? I'm new to Pandas. I searched a lot but could not find any similar question or solution.

Edit: I found a way to avoid having to do this altogether with the answer to this question. I should have been creating my data frame that way. This seems to be the way to go for indexing by DateTime.

Consider the pd.MultiIndex and pd.DataFrame,
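One way to merge the year and month levels is to build the combined labels from the two levels' values and then rebuild the index. The sketch below assumes a three-level index. The level names `key`, `year`, `month` and `month_year`, and the data, are hypothetical.

```python
import pandas as pd

# Assumption: a small three-level index standing in for the one in the question.
index = pd.MultiIndex.from_product(
    [["x", "y"], [2014, 2015], [1, 2]], names=["key", "year", "month"]
)
df = pd.DataFrame({"value": range(8)}, index=index)

# Build one "month/year" label per row from the two levels to be merged.
years = df.index.get_level_values("year")
months = df.index.get_level_values("month")
merged = [f"{m}/{y}" for y, m in zip(years, months)]

# Replace the three-level index with a two-level one.
df.index = pd.MultiIndex.from_arrays(
    [df.index.get_level_values("key"), merged], names=["key", "month_year"]
)
print(df)
```

String labels like these sort alphabetically rather than chronologically, which is one reason a datetime-based level, as the question's edit suggests, may be preferable.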

Python (pandas): store a data frame in hdf5 with a multi index

大城市里の小女人 submitted on 2019-12-03 15:35:56
I need to work with a large data frame with a multi index, so I tried to create a data frame to learn how to store it in an hdf5 file. The data frame is like this (with the multi index in the first 2 columns):

                          0
    Symbol Date
    C      2014-07-21  4792
    B      2014-07-21  4492
    A      2014-07-21  5681
    B      2014-07-21  8310
    A      2014-07-21  1197
    C      2014-07-21  4722
           2014-07-21  7695
           2014-07-21  1774

I'm using pandas.to_hdf, but it creates a "Fixed format store". When I try to select the data in a group:

    store.select('table','Symbol == "A"')

it returns some errors, and the main problem is this TypeError: cannot pass a where
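Queries with a where clause need the table format rather than the default fixed format. The sketch below writes a small MultiIndex frame with `format="table"` and selects by the `Symbol` level. It assumes the optional PyTables dependency is installed. The helper name `roundtrip_symbol`, the column name `value`, and the reduced data are all hypothetical.

```python
import pandas as pd


def roundtrip_symbol(path, symbol):
    """Write a MultiIndex frame in table format and read back one symbol."""
    index = pd.MultiIndex.from_tuples(
        [
            ("C", pd.Timestamp("2014-07-21")),
            ("B", pd.Timestamp("2014-07-21")),
            ("A", pd.Timestamp("2014-07-21")),
            ("A", pd.Timestamp("2014-07-21")),
        ],
        names=["Symbol", "Date"],
    )
    # Assumption: a string column name is used in place of the question's column 0.
    df = pd.DataFrame({"value": [4792, 4492, 5681, 1197]}, index=index)

    # format="table" creates a queryable store; the index levels of a
    # MultiIndex frame can then be referenced by name in the where clause.
    df.to_hdf(path, key="table", mode="w", format="table")
    with pd.HDFStore(path, mode="r") as store:
        return store.select("table", f'Symbol == "{symbol}"')
```

Calling `roundtrip_symbol("demo.h5", "A")` should return only the rows whose Symbol level is A. Table format is generally slower to write than fixed format, which is the trade-off for being able to query on disk.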

Pandas Multiindex Groupby on Columns

十年热恋 submitted on 2019-12-03 15:27:22
Is there any way to use groupby on the columns in a MultiIndex? I know you can on the rows, and there is good documentation in that regard. However, I cannot seem to groupby on columns. The only solution I have is transposing the dataframe.

    import numpy as np
    import pandas as pd

    # generate data (copied from pandas example)
    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)

Now I will try to
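The transpose workaround mentioned in the question can be written as a single chain. The sketch below sums the columns that share a `first` level label. The choice of `sum` as the aggregation and the fixed random seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
rng = np.random.default_rng(0)  # assumption: seeded generator for repeatability
df = pd.DataFrame(rng.standard_normal((3, 8)), index=["A", "B", "C"], columns=index)

# Transpose so the column levels become row levels, group on "first",
# aggregate, then transpose back to the original orientation.
result = df.T.groupby(level="first").sum().T
print(result)
```

Older pandas versions also accepted `axis=1` in `groupby`, but that argument has been deprecated, so the transpose form is the more portable spelling.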