multi-index

Setting values with multiindex in pandas

江枫思渺然 posted on 2019-12-05 10:35:20
There are already a couple of questions on SO relating to this, most notably this one; however, none of the answers seem to work for me, and quite a few links to the docs (especially on lexsorting) are broken, so I'll ask another one. I'm trying to do something (seemingly) very simple. Consider the following MultiIndexed DataFrame:

    import pandas as pd
    import random

    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    df = pd
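The excerpt above is cut off before the DataFrame is built, so the following is only a minimal sketch of one common way to set values on a MultiIndexed frame. The column names `A` and `B` and their contents are assumptions made for illustration; only the index comes from the question. It assigns through `.loc`, once with a full index tuple and once with `pd.IndexSlice` across the first level, after sorting the index so that slicing is well defined.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Hypothetical columns: the original question's data is truncated.
df = pd.DataFrame({'A': np.arange(8.0), 'B': np.arange(8.0) * 10}, index=index)

# Slicing a MultiIndex with .loc expects a lexsorted index.
df = df.sort_index()

# Set a single cell by giving the full index tuple and a column label.
df.loc[('bar', 'one'), 'A'] = -1.0

# Set every row whose 'second' level equals 'two', across all 'first' values.
idx = pd.IndexSlice
df.loc[idx[:, 'two'], 'B'] = 0.0
```

Assigning through a single `.loc` call, rather than chaining `df['B'][...] = ...`, avoids writing to a temporary copy.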

Reindex sublevel of pandas dataframe multiindex

和自甴很熟 posted on 2019-12-05 07:03:28
I have a time series dataframe and I would like to reindex it by Trials and Measurements. Simplified, I have this:

             value
    Trial
    1     0     13
          1      3
          2      4
    2     3    NaN
          4     12
    3     5     34

Which I want to turn into this:

             value
    Trial
    1     0     13
          1      3
          2      4
    2     0    NaN
          1     12
    3     0     34

How can I best do this?

Dan Allan:
Just yesterday, the illustrious Andy Hayden added this feature to version 0.13 of pandas, which will be released any day now. See here for the usage example he added to the docs. If you are comfortable installing the development version of pandas from source, you can use it now.

    df['Measurements'] = df.reset_index().groupby(
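The answer's code is truncated, so what follows is a sketch under assumptions rather than the answer's actual code. It assumes the feature referred to is `GroupBy.cumcount()`, which numbers the rows inside each group starting at 0, and it assumes the inner index level is called `Measurements` (the excerpt only names `Trial`).

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the inner level name is an assumption.
index = pd.MultiIndex.from_tuples(
    [(1, 0), (1, 1), (1, 2), (2, 3), (2, 4), (3, 5)],
    names=['Trial', 'Measurements'])
df = pd.DataFrame({'value': [13, 3, 4, np.nan, 12, 34]}, index=index)

# Move the index levels into columns, renumber within each trial,
# then restore the two-level index.
flat = df.reset_index()
flat['Measurements'] = flat.groupby('Trial').cumcount()
result = flat.set_index(['Trial', 'Measurements'])
```

Because `cumcount()` follows the existing row order, the frame should already be sorted the way the measurements are meant to be numbered.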

Merge MultiIndex columns together into 1 level [duplicate]

﹥>﹥吖頭↗ posted on 2019-12-05 04:04:23
This question already has an answer here: Pandas - How to flatten a hierarchical index in columns (16 answers)

Here's some data from another question:

    date      type  value
    1/1/2016  a     1
    1/1/2016  b     2
    1/1/2016  a     1
    1/1/2016  b     4
    1/2/2016  a     1
    1/2/2016  b     1

Run this line of code:

    x = df.groupby(['date', 'type']).value.agg(['sum', 'max']).unstack()

x should look like this:

              sum     max
    type      a   b   a   b
    date
    1/1/2016  2   6   1   4
    1/2/2016  1   1   1   1

I want to combine the columns on the upper and lower level to get this:

              sum_a  sum_b  max_a  max_b
    date
    1/1/2016  2      6      1      4
    1/2/2016  1      1      1      1

Is there an easy way to do this?

greg_data
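The answer text is not included in the excerpt, so this is only one possible approach, not necessarily the one given there. Each column label of `x` is a tuple such as `('sum', 'a')`, and joining the tuple parts with an underscore yields the flat names asked for. The variable name `flat` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['1/1/2016', '1/1/2016', '1/1/2016', '1/1/2016',
             '1/2/2016', '1/2/2016'],
    'type': ['a', 'b', 'a', 'b', 'a', 'b'],
    'value': [1, 2, 1, 4, 1, 1],
})

# Same aggregation as in the question, with bracket access to the column.
x = df.groupby(['date', 'type'])['value'].agg(['sum', 'max']).unstack()

# Each column label is a tuple like ('sum', 'a'); join it into 'sum_a'.
flat = x.copy()
flat.columns = ['_'.join(col) for col in x.columns]
```

This works here because both column levels hold strings; with non-string labels the parts would need converting with `str` first.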

Using boolean indexing for row and column MultiIndex in Pandas

本小妞迷上赌 posted on 2019-12-05 02:51:08
Questions are at the end, in bold. But first, let's set up some data:

    import numpy as np
    import pandas as pd
    from itertools import product

    np.random.seed(1)

    team_names = ['Yankees', 'Mets', 'Dodgers']
    jersey_numbers = [35, 71, 84]
    game_numbers = [1, 2]
    observer_names = ['Bill', 'John', 'Ralph']
    observation_types = ['Speed', 'Strength']

    row_indices = list(product(team_names, jersey_numbers, game_numbers, observer_names, observation_types))
    observation_values = np.random.randn(len(row_indices))

    tns, jns, gns, ons, ots = zip(*row_indices)

    data = pd.DataFrame({'team': tns, 'jersey': jns, 'game':

Pandas Multiindex Groupby on Columns

大憨熊 posted on 2019-12-05 00:32:00
Question: Is there any way to use groupby on the columns of a MultiIndex? I know you can on the rows, and there is good documentation in that regard. However, I cannot seem to group by columns. The only solution I have is transposing the dataframe.

    # generate data (copied from pandas example)
    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first',
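The transposition workaround mentioned in the question can be sketched as follows. The question's frame is truncated, so the 3x8 integer data and the choice of `sum` as the aggregation are assumptions; only the level values and the `first`/`second` names come from the excerpt. Transposing turns the column levels into row levels, where `groupby(level=...)` applies, and a second transpose restores the orientation.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
columns = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Hypothetical data: three rows of consecutive integers.
df = pd.DataFrame(np.arange(24).reshape(3, 8), columns=columns)

# Transpose, group the (now row) level 'first', aggregate, transpose back.
summed = df.T.groupby(level='first').sum().T
```

The double transpose copies the data and can change dtypes when columns are of mixed types, so it is best suited to homogeneous numeric frames.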

In Pandas How to sort one level of a multi-index based on the values of a column, while maintaining the grouping of the other level

这一生的挚爱 posted on 2019-12-04 20:29:18
Question: I'm taking a Data Mining course at university right now, but I'm a wee bit stuck on a multi-index sorting problem. The actual data involves about 1 million movie reviews, which I'm trying to analyze by American zip code. To test out how to do what I want, though, I've been using a much smaller data set of 250 randomly generated ratings for 10 movies, with age groups instead of zip codes. So this is what I have right now: it's a multiindexed DataFrame in Pandas with two
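The question's data is cut off, so the frame below is entirely hypothetical: the level names `age_group` and `movie`, the column `rating`, and all values are invented for illustration. Under those assumptions, one way to sort the inner level by a column while keeping the outer level grouped is to sort on the outer key first and the column second.

```python
import pandas as pd

# Hypothetical stand-in for the truncated example data.
index = pd.MultiIndex.from_product(
    [['18-24', '25-34'], ['Movie A', 'Movie B', 'Movie C']],
    names=['age_group', 'movie'])
df = pd.DataFrame({'rating': [3.0, 4.5, 2.0, 5.0, 1.5, 4.0]}, index=index)

# Sort by the outer level first (keeps groups together), then by rating
# in descending order inside each group.
sorted_df = (
    df.reset_index()
      .sort_values(['age_group', 'rating'], ascending=[True, False])
      .set_index(['age_group', 'movie'])
)
```

The resulting index is no longer lexsorted on the inner level, so label slicing on `movie` may need a `sort_index()` on a copy.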

How to properly pivot or reshape a timeseries dataframe in Pandas?

折月煮酒 posted on 2019-12-04 19:02:50
I need to reshape a dataframe that looks like df1 and turn it into df2. There are 2 considerations for this procedure:

1. I need to be able to set the number of rows to be sliced as a parameter (length).
2. I need to split date and time from the index, and use date in the reshape as the column names and keep time as the index.

Current df1

    2007-08-07 18:00:00     1
    2007-08-08 00:00:00     2
    2007-08-08 06:00:00     3
    2007-08-08 12:00:00     4
    2007-08-08 18:00:00     5
    2007-11-02 18:00:00     6
    2007-11-03 00:00:00     7
    2007-11-03 06:00:00     8
    2007-11-03 12:00:00     9
    2007-11-03 18:00:00    10

Desired Output df2 - With the parameter
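The desired output is truncated, so this sketch covers only the second consideration (dates as columns, times as the index) and leaves out the `length` parameter, whose exact meaning is not recoverable from the excerpt. The column name `value` is an assumption, since the original column is unnamed.

```python
import datetime

import pandas as pd

stamps = pd.to_datetime([
    '2007-08-07 18:00:00', '2007-08-08 00:00:00', '2007-08-08 06:00:00',
    '2007-08-08 12:00:00', '2007-08-08 18:00:00', '2007-11-02 18:00:00',
    '2007-11-03 00:00:00', '2007-11-03 06:00:00', '2007-11-03 12:00:00',
    '2007-11-03 18:00:00'])
df1 = pd.DataFrame({'value': range(1, 11)}, index=stamps)

# Split the DatetimeIndex into date and time parts, then pivot so that
# each date becomes a column and each time of day becomes a row.
df2 = (
    df1.assign(date=df1.index.date, time=df1.index.time)
       .pivot(index='time', columns='date', values='value')
)

# Look up one cell: the 18:00 reading on 2007-08-07.
first = df2.loc[datetime.time(18, 0), datetime.date(2007, 8, 7)]
```

Date and time combinations that do not occur in `df1` come out as NaN in `df2`.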

Is there an equivalent of boost::multi_index for Java someplace?

落爺英雄遲暮 posted on 2019-12-04 19:02:49
Question: I stumbled upon multi_index on a lark last night while pounding my head against a collection that I need to access by 3 different key values, and that should also have rebalancing array semantics. Well, I got one of my two wishes (3 different key values) in boost::multi_index. Does anything similar exist in the Java world?

Answer 1: I have just finished MultiIndexContainer in Java: http://code.google.com/p/multiindexcontainer/wiki/MainPage. I know that it is not a complete equivalent of boost multi_index

Convert MultiIndex DataFrame to Series

心不动则不痛 posted on 2019-12-04 15:31:05
I created a multiIndex DataFrame by:

    df.set_index(['Field1', 'Field2'], inplace=True)

If this is not a multiIndex DataFrame please tell me how to make one. I want to:

1. Group by the same columns that are in the index
2. Aggregate a count of each group
3. Then return the whole thing as a Series with Field1 and Field2 as the index

How do I go about doing this?

ADDITIONAL INFO

I have a multiIndex dataFrame that looks like this:

    Continent      Sector  Count
    Asia           1       4
                   2       1
    Australia      1       1
    Europe         1       1
                   2       3
                   3       2
    North America  1       1
                   5       1
    South America  5       1

How can I return this as a Series with the index of [Continent, Sector
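The three steps listed in the question can be sketched as below. The rows of `raw` and the `note` column are hypothetical, as the question does not show the un-aggregated data. Grouping by index levels and calling `size()` yields a Series whose index is the two-level MultiIndex.

```python
import pandas as pd

# Hypothetical un-aggregated rows; 'note' is a placeholder payload column.
raw = pd.DataFrame({
    'Field1': ['Asia', 'Asia', 'Asia', 'Europe', 'Europe'],
    'Field2': [1, 1, 2, 1, 1],
    'note': ['a', 'b', 'c', 'd', 'e'],
})
raw = raw.set_index(['Field1', 'Field2'])

# Group by the index levels and count rows per group; size() returns a Series.
counts = raw.groupby(level=['Field1', 'Field2']).size()
```

For a frame that is already aggregated, like the Continent/Sector table, selecting the single column (for example `df['Count']`) returns a Series that keeps the MultiIndex.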

Multi-Indexed fillna in Pandas

夙愿已清 posted on 2019-12-04 15:15:59
I have a multi-indexed dataframe and I'm looking to backfill missing values within a group. The dataframe I have currently looks like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'group': ['group_a'] * 7 + ['group_b'] * 3 + ['group_c'] * 2,
        'Date': ["2013-06-11", "2013-07-02", "2013-07-09", "2013-07-30",
                 "2013-08-06", "2013-09-03", "2013-10-01", "2013-07-09",
                 "2013-08-06", "2013-09-03", "2013-07-09", "2013-09-03"],
        'Value': [np.nan, np.nan, np.nan, 9, 4, 40, 18, np.nan, np.nan, 5, np.nan, 2]})
    df.Date = df['Date'].apply(lambda x: pd.to_datetime(x).date())
    df = df.set_index(['group', 'Date'])

I'm trying to get a
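The desired output is cut off, so this is a sketch of a plain per-group backfill, which may or may not match exactly what the asker wanted. Grouping on the `group` index level before calling `bfill()` keeps values from one group from leaking into the previous one. The name `filled` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': ['group_a'] * 7 + ['group_b'] * 3 + ['group_c'] * 2,
    'Date': ["2013-06-11", "2013-07-02", "2013-07-09", "2013-07-30",
             "2013-08-06", "2013-09-03", "2013-10-01", "2013-07-09",
             "2013-08-06", "2013-09-03", "2013-07-09", "2013-09-03"],
    'Value': [np.nan, np.nan, np.nan, 9, 4, 40, 18,
              np.nan, np.nan, 5, np.nan, 2]})
df['Date'] = pd.to_datetime(df['Date']).dt.date
df = df.set_index(['group', 'Date'])

# Backfill inside each group only: a NaN takes the next valid value
# that belongs to the same 'group' level.
filled = df.copy()
filled['Value'] = df.groupby(level='group')['Value'].bfill()
```

A NaN that trails the last valid value of its group stays NaN, because there is nothing later in that group to fill from.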