multi-index

Pandas - write Multiindex rows with to_csv

一曲冷凌霜 submitted on 2019-11-29 07:10:23
I am using to_csv to write a MultiIndex DataFrame to CSV files. The CSV file has one column that contains the MultiIndex entries as tuples, like:

    ('a', 'x')
    ('a', 'y')
    ('a', 'z')
    ('b', 'x')
    ('b', 'y')
    ('b', 'z')

However, I want to be able to output the MultiIndex to two columns instead of one column of tuples, such as:

    a, x
     , y
     , z
    b, x
     , y
     , z

It looks like tupleize_cols can achieve this for columns, but there is no such option for the rows. Is there a way to achieve this?

I think this will do it:

    In [3]: df = DataFrame(dict(A = 'foo', B = 'bar', value = 1),index=range(5)).set_index(['A','B'])
    In [4
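For reference, a minimal sketch of writing each index level to its own CSV column. The frame, the level names `outer`/`inner`, and the `value` column are made up for illustration; the behaviour shown is what recent pandas versions do, and note that the outer label is repeated on every row rather than left blank as in the desired output above.

```python
import io

import pandas as pd

# Hypothetical frame that mirrors the index values from the question.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y", "z"]], names=["outer", "inner"]
)
df = pd.DataFrame({"value": range(6)}, index=index)

# to_csv emits one CSV column per index level; index_label names those columns.
buffer = io.StringIO()
df.to_csv(buffer, index_label=["outer", "inner"])
csv_text = buffer.getvalue()
print(csv_text)

# Alternative: turn the index levels into ordinary columns before writing.
flat_text = df.reset_index().to_csv(index=False)
```

If the tuple column still shows up on an older installation, the `reset_index()` route is the one that depends least on how `to_csv` treats the index.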

how boost multi_index is implemented

五迷三道 submitted on 2019-11-29 06:02:13
Question: I have some difficulties understanding how Boost.MultiIndex is implemented. Let's say I have the following:

    typedef multi_index_container<
        employee,
        indexed_by<
            ordered_unique<member<employee, std::string, &employee::name> >,
            ordered_unique<member<employee, int, &employee::age> >
        >
    > employee_set;

I imagine that I have one array, Employee[], which actually stores the employee objects, and two maps

    map<std::string, employee*>
    map<int, employee*>

with name and age as keys. Each map has employee

collapse a pandas MultiIndex

半腔热情 submitted on 2019-11-29 05:21:31
Suppose I have a DataFrame with MultiIndex columns. How can I collapse the levels to a concatenation of the values so that I only have one level?

Setup

    import numpy as np
    import pandas as pd

    np.random.seed([3, 14])
    col = pd.MultiIndex.from_product([list('ABC'), list('DE'), list('FG')])
    df = pd.DataFrame(np.random.rand(4, 12) * 10, columns=col).astype(int)
    print(df)

       A           B           C
       D     E     D     E     D     E
       F  G  F  G  F  G  F  G  F  G  F  G
    0  2  1  1  7  5  9  9  2  7  4  0  3
    1  3  7  1  1  5  3  1  4  3  5  6  0
    2  2  6  9  9  9  5  7  0  1  2  7  5
    3  2  2  8  0  3  9  4  7  0  8  2  5

I want the result to look like this:

       ADF  ADG  AEF  AEG  BDF  BDG  BEF  BEG  CDF  CDG  CEF  CEG
    0    2    1    1    7    5    9    9    2    7    4    0    3
    1    3    7    1    1    5    3    1
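One way this could be done, as a sketch: each column label of a MultiIndex is a tuple of strings, so joining the tuple gives the flat name. The data here is a deterministic stand-in (`np.arange`) rather than the random values from the setup, and `flat` is just an illustrative name.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_product([list("ABC"), list("DE"), list("FG")])
# Stand-in data; only the column labels matter for this illustration.
df = pd.DataFrame(np.arange(48).reshape(4, 12), columns=col)

flat = df.copy()
# Each label is a tuple such as ('A', 'D', 'F'); join it into 'ADF'.
flat.columns = ["".join(levels) for levels in df.columns]
print(flat.columns.tolist())
```

This assumes every level holds strings; for mixed types the labels would need `str()` applied before joining.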

Using .loc with a MultiIndex in pandas?

笑着哭i submitted on 2019-11-29 04:56:13
Question: Does anyone know if it is possible to use the DataFrame.loc method to select from a MultiIndex? I have the following DataFrame and would like to be able to access the values located in the 'Dwell' columns, at the indices of ('at', 1), ('at', 3), ('at', 5), and so on (non-sequential). I'd love to be able to do something like data.loc[['at',[1,3,5]], 'Dwell'], similar to the data.loc[[1,3,5], 'Dwell'] syntax for a regular index (which returns a 3-member series of Dwell values). My purpose
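A small sketch of the kind of selection described here, assuming a sorted two-level index. The frame, the level names `kind`/`trial`, and the `Other` column are invented for illustration; only `'at'`, the labels `1, 3, 5`, and `'Dwell'` come from the question.

```python
import pandas as pd

# Hypothetical data with a two-level row index.
index = pd.MultiIndex.from_product(
    [["at", "bt"], [1, 2, 3, 4, 5]], names=["kind", "trial"]
)
data = pd.DataFrame({"Dwell": range(10), "Other": range(10, 20)}, index=index)

# A tuple addresses the levels in order: 'at' on level 0, a label list on level 1.
dwell = data.loc[("at", [1, 3, 5]), "Dwell"]

# pd.IndexSlice builds the same tuple and reads a little more clearly.
idx = pd.IndexSlice
same = data.loc[idx["at", [1, 3, 5]], "Dwell"]
print(dwell)
```

The difference from the attempt in the question is that the row key is a tuple rather than a list, so pandas reads it as one entry per index level.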

Read multi-index on the columns from csv file

戏子无情 submitted on 2019-11-29 00:52:22
Question: I have a .csv file that looks like this:

    Male, Male, Male, Female, Female
    R, R, L, R, R
    .86, .67, .88, .78, .81

I want to read that into a df, so that I have:

        Male           Female
        R         L    R
    0   .86  .67  .88  .78  .81

I did:

    df = pd.read_csv('file.csv', header=[0,1])

But header does not cut it. This results in:

    Empty DataFrame
    Columns: [(Male, R), (Male, R), (Male, L), (Female, R), (Female, R)]
    Index: []

Yet, the docs on header say: (...)Can be a list of integers that specify row locations for a multi-index
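A sketch of the same read against an in-memory copy of the file, under the assumption of a reasonably recent pandas version, where the single data row is expected to survive. `skipinitialspace=True` is added here to drop the blanks after the commas. How the repeated `(Male, R)` and `(Female, R)` labels are disambiguated may vary by version, so the sketch does not rely on it.

```python
import io

import pandas as pd

# In-memory stand-in for file.csv from the question.
csv_text = (
    "Male, Male, Male, Female, Female\n"
    "R, R, L, R, R\n"
    ".86, .67, .88, .78, .81\n"
)

# Rows 0 and 1 become the two column levels; the third row is data.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], skipinitialspace=True)
print(df)
print(df.columns.get_level_values(0).tolist())
```

If the frame still comes back empty, passing an explicit `index_col` or upgrading pandas are the usual things to try; which one applies depends on the installed version.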

Summing over a multiindex level in a pandas series

ぐ巨炮叔叔 submitted on 2019-11-28 20:15:43
Using the pandas package in Python, I would like to sum (marginalize) over one level in a series with a 3-level MultiIndex to produce a series with a 2-level MultiIndex. For example, if I have the following:

    ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']]
    mi = pd.MultiIndex.from_tuples(ind)
    data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)

    A  B  C    264
          c     13
       b  C     29
          c      8
    a  B  C    152
          c      7
       b  C     15
          c      1

I would like to sum over the variable C to produce the following output:

    A  B    277
       b     37
    a  B    159
       b     16

What is the best way in pandas to do this?

If you know you
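One possible approach, sketched with the data from the question: group on the two levels that should remain and sum within each group. This is not necessarily what the truncated answer goes on to suggest; `marginal` is an illustrative name.

```python
import pandas as pd

ind = [tuple(x) for x in ["ABC", "ABc", "AbC", "Abc", "aBC", "aBc", "abC", "abc"]]
mi = pd.MultiIndex.from_tuples(ind)
data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi)

# Keep levels 0 and 1; the third level is summed away.
marginal = data.groupby(level=[0, 1]).sum()
print(marginal)
```

Grouping by level position works even though the levels are unnamed; with named levels, the names can be passed instead.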

Reshaping dataframes in pandas based on column labels

僤鯓⒐⒋嵵緔 submitted on 2019-11-28 19:42:20
What is the best way to reshape the following DataFrame in pandas? This DataFrame df has x, y values for each sample (s1 and s2 in this case) and looks like this:

    In [23]: df = pandas.DataFrame({"s1_x": scipy.randn(10), "s1_y": scipy.randn(10), "s2_x": scipy.randn(10), "s2_y": scipy.randn(10)})
    In [24]: df
    Out[24]:
           s1_x      s1_y      s2_x      s2_y
    0  0.913462  0.525590 -0.377640  0.700720
    1  0.723288 -0.691715  0.127153  0.180836
    2  0.181631 -1.090529 -1.392552  1.530669
    3  0.997414 -1.486094  1.207012  0.376120
    4 -0.319841  0.195289 -1.034683  0.286073
    5  1.085154 -0.619635  0.396867  0.623482
    6  1.867816 -0.928101 -0
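The target shape is cut off above, so this is only a sketch of one plausible first step: splitting labels such as `s1_x` on the underscore into a two-level column index. The level names `sample` and `coord` are made up, and NumPy's random generator stands in for `scipy.randn`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 4)), columns=["s1_x", "s1_y", "s2_x", "s2_y"]
)

wide = df.copy()
# 's1_x' -> ('s1', 'x'): the sample becomes level 0, the coordinate level 1.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(label.split("_")) for label in df.columns], names=["sample", "coord"]
)

s1 = wide["s1"]                               # x and y columns of sample s1
all_x = wide.xs("x", axis=1, level="coord")   # x column of every sample
print(s1.head())
```

Once the columns are a MultiIndex, per-sample or per-coordinate views are plain selections, and further reshaping can build on the same structure.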

pandas: how to run a pivot with a multi-index?

社会主义新天地 submitted on 2019-11-28 16:50:50
I would like to run a pivot on a pandas DataFrame, with the index being two columns, not one. For example, one field for the year, one for the month, an 'item' field which shows 'item 1' and 'item 2' and a 'value' field with numerical values. I want the index to be year + month. The only way I managed to get this to work was to combine the two fields into one, then separate them again. Is there a better way? Minimal code copied below. Thanks a lot! PS: Yes, I am aware there are other questions with the keywords 'pivot' and 'multi-index', but I did not understand if/how they can help me with
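A sketch of one way to get a year + month index without merging the two fields, using `pivot_table` with a list for `index`. The sample values are invented; the column names follow the description above.

```python
import pandas as pd

# Hypothetical data matching the fields described in the question.
df = pd.DataFrame(
    {
        "year": [2013, 2013, 2013, 2013],
        "month": [1, 1, 2, 2],
        "item": ["item 1", "item 2", "item 1", "item 2"],
        "value": [10, 20, 30, 40],
    }
)

# A list for `index` produces a two-level row index (year, month).
pivoted = df.pivot_table(
    index=["year", "month"], columns="item", values="value", aggfunc="sum"
)
print(pivoted)
```

`aggfunc="sum"` only matters if a (year, month, item) combination occurs more than once; with unique combinations it just passes the value through.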

Pandas Plotting with Multi-Index

穿精又带淫゛_ submitted on 2019-11-28 16:29:35
After performing a groupby.sum() on a DataFrame I'm having some trouble trying to create my intended plot. How can I create a subplot (kind='bar') for each Code, where the x-axis is the Month and the bars are ColA and ColB?

Reustonium: I found the unstack(level) method to work perfectly, which has the added benefit of not needing a priori knowledge about how many Codes there are.

    df.unstack(level=0).plot(kind='bar', subplots=True)

Using the following DataFrame ...

    # using pandas version 0.14.1
    from pandas import DataFrame
    import pandas as pd
    import matplotlib.pyplot as plt
    data = {'ColB': {
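To make the reshaping step in that answer concrete, here is a sketch of what `unstack(level=0)` does to a frame indexed by (Code, Month). The data is invented because the original dictionary is cut off, and the plotting call is left as a comment so the sketch runs without matplotlib.

```python
import pandas as pd

# Hypothetical result of a groupby.sum() indexed by Code and Month.
index = pd.MultiIndex.from_product(
    [["X", "Y"], [1, 2, 3]], names=["Code", "Month"]
)
summed = pd.DataFrame(
    {"ColA": [1, 2, 3, 4, 5, 6], "ColB": [6, 5, 4, 3, 2, 1]}, index=index
)

# Move Code from the rows into the columns; Month stays as the row index.
wide = summed.unstack(level=0)
print(wide)

# With matplotlib installed, the call from the answer would be:
# wide.plot(kind="bar", subplots=True)
```

After unstacking, Month is the x-axis and each (column, Code) pair is its own column, which is what `subplots=True` splits on.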

Sum columns by level in a Multi-Index DataFrame

与世无争的帅哥 submitted on 2019-11-28 14:09:04
I have my df with multi-index columns. All of my values are floats, and I want to merge values within the first level of the multi-index. Please see below for detail.

    first        bar                 baz                 foo
    second       one       two       one       two       one
    A       0.895717  0.805244  1.206412  2.565646  1.431256
    B       0.410835  0.813850  0.132003  0.827317  0.076467
    C       1.413681  1.607920  1.024180  0.569605  0.875906

    first                   bar                  baz       foo
    A       (0.895717+0.805244)  (1.206412+2.565646)  1.431256
    B       (0.410835+0.813850)  (0.132003+0.827317)  0.076467
    C       (1.413681+1.607920)  (1.024180+0.569605)  0.875906

The values are actually added (I just didn't feel like doing all this :)). Bottom
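A sketch of one way to add up the columns that share a first-level label, using the numbers from the table above. It transposes, groups the former column level `first`, sums, and transposes back; `summed` is an illustrative name.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two"), ("foo", "one")],
    names=["first", "second"],
)
df = pd.DataFrame(
    [
        [0.895717, 0.805244, 1.206412, 2.565646, 1.431256],
        [0.410835, 0.813850, 0.132003, 0.827317, 0.076467],
        [1.413681, 1.607920, 1.024180, 0.569605, 0.875906],
    ],
    index=list("ABC"),
    columns=columns,
)

# Group the transposed frame by the 'first' level, sum, and flip back.
summed = df.T.groupby(level="first").sum().T
print(summed)
```

The double transpose is used here because grouping along the column axis directly is deprecated in newer pandas releases.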