pandas | 易学教程

Accessing a Pandas index like a regular column

Question: I have a Pandas DataFrame with a named index. I want to pass it off to a piece of code that takes a DataFrame, a column name, and some other stuff, and does a bunch of work involving that column. Only in this case the column I want to highlight is the index, but giving the index's label to this piece of code doesn't work because you can't extract an index like you can a regular column. For example, I can construct a DataFrame like this:

    import pandas as pd, numpy as np
    df=pd.DataFrame({'name
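A minimal sketch of one way to treat a named index like a column. The question's own DataFrame is cut off above, so the data, the column names, and the helper `get_column` here are hypothetical. The idea is either to look the label up among both the columns and the index levels, or to call `reset_index()` so the index becomes an ordinary column before the frame is handed on.

```python
import pandas as pd

# Hypothetical data: the original example is truncated, so these names are illustrative only.
df = pd.DataFrame(
    {"name": ["a", "b", "c"], "value": [1.0, 2.0, 3.0]}
).set_index("name")


def get_column(frame, label):
    """Return the data for `label`, whether it names a column or an index level."""
    if label in frame.columns:
        return frame[label]
    if label in frame.index.names:
        # get_level_values works for a plain Index and for a MultiIndex alike
        values = frame.index.get_level_values(label).to_numpy()
        return pd.Series(values, index=frame.index, name=label)
    raise KeyError(label)


by_index = get_column(df, "name")    # taken from the index
by_column = get_column(df, "value")  # taken from a regular column

# Alternative: turn the index into a regular column before passing the frame on
flat = df.reset_index()
```

`reset_index()` returns a new frame and leaves `df` unchanged, so it can be applied just at the call site.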

python pandas .apply() function index error

Question: I have the following DataFrame:

                                  P     N  ID  Year  Month
    TS
    2016-06-26 19:30:00  263.600006   5.4   5  2016      6
    2016-06-26 20:00:00  404.700012   5.6   5  2016      6
    2016-06-26 21:10:00  438.600006   6.0   5  2016      6
    2016-06-26 21:20:00  218.600006   5.6   5  2016      6
    2016-07-02 16:10:00  285.300049  15.1   5  2016      7

I'm trying to add a new column based on the values of columns Year and Month, something like the following:

    def exp_records(row):
        return calendar.monthrange(row['Year'], row['Month'])[1]

    df['exp_counts'] = df.apply(exp
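The excerpt ends before the `apply` call and the error message are visible, so the actual cause cannot be read from it. As a hedged sketch: a row-wise function like `exp_records` needs `axis=1`, and because a row that mixes float and integer columns is handed over as floats, the values are cast back to `int` before calling `calendar.monthrange`. The sample frame below reuses a few rows from the question.

```python
import calendar

import pandas as pd

# A few rows from the question's sample data
df = pd.DataFrame(
    {
        "P": [263.600006, 404.700012, 285.300049],
        "N": [5.4, 5.6, 15.1],
        "ID": [5, 5, 5],
        "Year": [2016, 2016, 2016],
        "Month": [6, 6, 7],
    },
    index=pd.to_datetime(
        ["2016-06-26 19:30:00", "2016-06-26 20:00:00", "2016-07-02 16:10:00"]
    ),
)
df.index.name = "TS"


def exp_records(row):
    # Rows arrive as float Series here because P and N are floats,
    # so cast back to int before calling calendar.monthrange.
    return calendar.monthrange(int(row["Year"]), int(row["Month"]))[1]


# axis=1 passes each row (not each column) to the function
df["exp_counts"] = df.apply(exp_records, axis=1)

# Equivalent without apply: iterate over the two columns directly
df["exp_counts_zip"] = [
    calendar.monthrange(int(y), int(m))[1] for y, m in zip(df["Year"], df["Month"])
]
```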

Python Histogram ValueError: range parameter must be finite

Question: I am plotting a Pandas dataframe using a histogram. Sample dataframe data:

       distance
    0  5.680195
    2  0.000000
    3  7.974658
    4  2.461387
    5  9.703089

The code I use to plot:

    import matplotlib.pyplot as plt
    plt.hist(df['distance'].values)
    plt.show()

I get this error: "ValueError: range parameter must be finite." My attempt:

    df['Round_Distance'] = df['distance'].round(1)

    0    5.7
    2    0.0
    3    8.0
    4    2.5
    5    9.7

Plotting again gives a new error:

    plt.hist(df['Round_Distance'].values)
    plt.show()

    ValueError: max must be larger than min in
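This kind of error is commonly raised when the column contains NaN or infinite values, because the bin range is derived from the data's min and max. Whether that is the cause here is an assumption: the sample skips index 1, and the sketch below puts a hypothetical NaN there to illustrate. It filters to finite values first and uses `np.histogram`, which performs the same binning step that `plt.hist` relies on.

```python
import numpy as np
import pandas as pd

# Sample values from the question, plus a hypothetical NaN at index 1 (assumption)
df = pd.DataFrame(
    {"distance": [5.680195, np.nan, 0.000000, 7.974658, 2.461387, 9.703089]}
)

values = df["distance"].to_numpy(dtype=float)

# Keep only finite entries (drops NaN as well as +/-inf)
finite = values[np.isfinite(values)]

# Bin the cleaned data; bins=5 is an arbitrary choice for this sketch
counts, edges = np.histogram(finite, bins=5)
```

The cleaned array can then be passed to `plt.hist(finite)` in place of `df['distance'].values`.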

pandas, combine rows based on certain column values and NAN

Question: So I have a pandas dataframe that looks like this:

    id_1  id_2  value1  value2
    1     2     100     NAN
    1     2     NAN     101
    10    20    200     NAN
    10    20    NAN     202
    10    2     345     345

And I want a dataframe like this:

    id_1  id_2  value1  value2
    1     2     100     101
    10    20    200     202
    a     b     c       d

Basically, if both ID columns match up, then there will definitely be a value-nan vs nan-value situation, and I want to combine the rows by just replacing the nans. Does pandas have a utility for this? It's not quite stacking, or melting. Maybe pivoting, but I'd
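One possible approach, sketched on the question's sample data: group by the two ID columns and take the first non-null value in each remaining column with `GroupBy.first()`. This assumes, as the question states, that rows sharing an ID pair never hold conflicting non-null values. If they did, only the first one would be kept.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NAN written as np.nan
df = pd.DataFrame(
    {
        "id_1": [1, 1, 10, 10, 10],
        "id_2": [2, 2, 20, 20, 2],
        "value1": [100, np.nan, 200, np.nan, 345],
        "value2": [np.nan, 101, np.nan, 202, 345],
    }
)

# first() returns the first non-null entry per column within each group
combined = df.groupby(["id_1", "id_2"], as_index=False).first()
```

The result is sorted by the group keys, so the `(10, 2)` row comes before `(10, 20)`.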

pandas outer product of two dataframes with same index

Question: Consider the following dataframes d1 and d2:

    d1 = pd.DataFrame([
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5]
    ], columns=list('ABC'))

    d2 = pd.get_dummies(list('XYZZXY'))

d1

       A  B  C
    0  1  2  3
    1  2  3  4
    2  3  4  5
    3  1  2  3
    4  2  3  4
    5  3  4  5

d2

       X  Y  Z
    0  1  0  0
    1  0  1  0
    2  0  0  1
    3  0  0  1
    4  1  0  0
    5  0  1  0

I need to get a new dataframe with a multi-index columns object that has the product of every combination of columns from d1 and d2. So far I've done this...

    from itertools import product
    pd
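A sketch of one way to build such a frame, not necessarily the approach the truncated attempt was heading toward: multiply `d1` row-wise by each column of `d2` and concatenate the pieces with `pd.concat`, whose dict keys become the outer column level. Putting the `d2` label on the outer level is an arbitrary choice here, and the `astype(int)` is added because recent pandas versions return boolean dummies.

```python
import pandas as pd

d1 = pd.DataFrame(
    [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3], [2, 3, 4], [3, 4, 5]],
    columns=list("ABC"),
)
# Cast to int so the products are numeric regardless of pandas version
d2 = pd.get_dummies(list("XYZZXY")).astype(int)

# For each d2 column, scale every d1 column row by row; the dict keys
# become the outer level of the resulting MultiIndex columns.
result = pd.concat(
    {col: d1.mul(d2[col], axis=0) for col in d2.columns},
    axis=1,
)
```

Each column is then addressed by a tuple, for example `result[("X", "A")]`.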

pandas resample to specific weekday in month

Question: I have a Pandas dataframe where I'd like to resample to every third Friday of the month.

    np.random.seed(0)
    #requested output:
    dates = pd.date_range("2018-01-01", "2018-08-31")
    dates_df = pd.DataFrame(data=np.random.random(len(dates)), index=dates)
    mask = (dates.weekday == 4) & (14 < dates.day) & (dates.day < 22)
    dates_df.loc[mask]

But when a third Friday is missing (e.g. dropping Feb third Friday), I want to have the latest value (so as of 2018-02-15). Using the mask gives me the next value
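A hedged sketch of one way to get "the latest value as of each third Friday": generate the third Fridays with the `WOM-3FRI` frequency alias and reindex onto them with forward fill. Dropping 2018-02-16 below simulates the missing-Friday case described in the question. Note that the result keeps the third-Friday date as its label even when the value comes from an earlier day.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range("2018-01-01", "2018-08-31")
dates_df = pd.DataFrame(data=np.random.random(len(dates)), index=dates)

# Simulate the missing case: remove February's third Friday (2018-02-16)
dates_df = dates_df.drop(pd.Timestamp("2018-02-16"))

# "WOM-3FRI" is the week-of-month alias for the third Friday of each month
third_fridays = pd.date_range("2018-01-01", "2018-08-31", freq="WOM-3FRI")

# Forward fill picks the most recent earlier row when a target date is absent;
# this requires the source index to be sorted, which it is here.
result = dates_df.reindex(third_fridays, method="ffill")
```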

Pandas: How could I iterate two dataframes which have exactly same format?

Question: My final goal is to make a list that contains, for each corresponding location in the two dataframes, an entry like the one below: [df_one_first_element, df_two_first_element, column_first, index_first]: [0.619159, 0.510162, 20140109, 0.50], [0.264191, 0.269053, 20140213, 0.50]... So I am trying to iterate over two dataframes but I am stuck. How can I iterate over two dataframes which have exactly the same format but different data? For example, I have two dataframes, df_one and df_two, that look like the below: df_one = 20140109
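A minimal sketch of the pairing, assuming both frames share identical index and column labels. The real data is cut off above, so the frames below are hypothetical apart from the four values quoted in the question. Each output entry follows the requested layout `[df_one value, df_two value, column, index]`.

```python
import pandas as pd

# Hypothetical frames: only the 0.50 row reuses values quoted in the question
df_one = pd.DataFrame(
    {20140109: [0.619159, 0.1], 20140213: [0.264191, 0.2]},
    index=[0.50, 0.55],
)
df_two = pd.DataFrame(
    {20140109: [0.510162, 0.3], 20140213: [0.269053, 0.4]},
    index=[0.50, 0.55],
)

# Walk the shared labels row by row and look up the same cell in both frames
pairs = [
    [df_one.at[idx, col], df_two.at[idx, col], col, idx]
    for idx in df_one.index
    for col in df_one.columns
]
```

For large frames, stacking both frames and concatenating the results would avoid the Python-level loop, at the cost of a less direct mapping to the requested list format.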

np.argsort which excludes zero values

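Question: I have an array [0.2,0,0,0,0.3,0,0,0,0.4]. I'm using np.argsort to sort the values and get their indexes. So, for my example, it will be something like [1,5,9,2,3,4,6...]. However, I would like to get an array of indexes only for the non-zero values. In my example, only [1,5,9]. How do I implement this in Python with pandas and numpy?

Answer 1: Using np.nonzero and an indexing trick:

    import numpy as np

    def sparse_argsort(arr):
        # positions of the non-zero entries (arr must be a NumPy array)
        indices = np.nonzero(arr)[0]
        # order those positions by the values they point to
        return indices[np.argsort(arr[indices])]

    a = np.array([0.2, 0, 0, 0, 0.3, 0, 0, 0, 0.4])
    sparse_argsort(a)
    array([0, 4, 8])

The returned positions are 0-based, whereas the question counts from 1.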

Renaming the column names of pandas dataframe is not working as expected - python

Question: I have the pandas dataframe df below. I am trying to rename the columns, but it is not working as expected. Code:

    mapping = {df.columns[0]: 'Date', df.columns[1]: 'A', df.columns[2]: 'B',
               df.columns[3]: 'C', df.columns[4]: 'D', df.columns[5]: 'E',
               df.columns[6]: 'F', df.columns[7]: 'G', df.columns[8]: 'H',
               df.columns[9]: 'J'}
    df.rename(columns=mapping)

Output of df.columns: MultiIndex(levels=[['A Index', 'B Index', 'C Index', 'D Index', 'E Index', 'F Index', 'G Index', 'H Index', 'I Index', 'J
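Two things are worth checking in a case like this, sketched below on a hypothetical two-column frame (the real column levels are truncated above, and `PX_LAST` is an invented second-level label). First, `rename` returns a new frame rather than modifying `df` in place. Second, with MultiIndex columns each element of `df.columns` is a tuple, so one option is to assign a flat list of names, and another is to rename a single level with the `level` argument.

```python
import pandas as pd

# Hypothetical frame with two-level columns; "PX_LAST" is an illustrative label
cols = pd.MultiIndex.from_product([["A Index", "B Index"], ["PX_LAST"]])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=cols)

# Option 1: replace the MultiIndex with flat names (one name per column)
flat = df.copy()
flat.columns = ["A", "B"]

# Option 2: rename labels on one level only; the result must be assigned,
# because rename does not change df itself
renamed = df.rename(columns={"A Index": "A", "B Index": "B"}, level=0)
```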