pandas

Accessing a Pandas index like a regular column

半城伤御伤魂 submitted on 2021-02-18 09:54:28
Question: I have a Pandas DataFrame with a named index. I want to pass it to a piece of code that takes a DataFrame, a column name, and some other arguments, and does a bunch of work involving that column. In this case, though, the column I want to highlight is the index, and giving the index's label to this code doesn't work because you can't extract an index like you can a regular column. For example, I can construct a DataFrame like this:

    import pandas as pd, numpy as np
    df=pd.DataFrame({'name
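One way to treat an index label like a column label is a small lookup helper. The sketch below is only an illustration: the helper name `column_or_index` and the sample frame are hypothetical, not taken from the question, and it assumes the label is either a column name or the name of an index level.

```python
import pandas as pd


def column_or_index(df, label):
    """Return df[label] if it is a column, else the index level named label.

    Hypothetical helper for illustration only.
    """
    if label in df.columns:
        return df[label]
    if label in df.index.names:
        # Turn the index level into a Series aligned with the frame's rows.
        return df.index.get_level_values(label).to_series(index=df.index)
    raise KeyError(label)


# Made-up sample data with a named index.
sample = pd.DataFrame(
    {"value": [1, 2]},
    index=pd.Index(["x", "y"], name="key"),
)

from_index = column_or_index(sample, "key")
from_column = column_or_index(sample, "value")
```

If the downstream code cannot be changed, calling `df.reset_index()` before passing the frame in is a simpler alternative, at the cost of losing the index on that copy.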

python pandas .apply() function index error

自闭症网瘾萝莉.ら submitted on 2021-02-18 08:16:12
Question: I have the following DataFrame:

                                  P     N  ID  Year  Month
    TS
    2016-06-26 19:30:00  263.600006   5.4   5  2016      6
    2016-06-26 20:00:00  404.700012   5.6   5  2016      6
    2016-06-26 21:10:00  438.600006   6.0   5  2016      6
    2016-06-26 21:20:00  218.600006   5.6   5  2016      6
    2016-07-02 16:10:00  285.300049  15.1   5  2016      7

I'm trying to add a new column based on the values of the Year and Month columns, something like the following:

    def exp_records(row):
        return calendar.monthrange(row['Year'], row['Month'])[1]

    df['exp_counts'] = df.apply(exp
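The excerpt is cut off before the error, so the following is only a sketch of two things that commonly go wrong with this pattern: `apply` needs `axis=1` to hand rows (not columns) to the function, and a row drawn from a frame with float columns arrives as floats, which `calendar.monthrange` does not accept. The sample frame is made up, and the vectorized variant is an alternative, not the asker's approach.

```python
import calendar

import pandas as pd

# Made-up frame that mixes a float column with integer Year/Month columns.
df = pd.DataFrame(
    {"P": [263.6, 285.3, 100.0], "Year": [2016, 2016, 2016], "Month": [6, 7, 2]}
)


def exp_records(row):
    # Each row is upcast to float because of column P, so cast back to int.
    return calendar.monthrange(int(row["Year"]), int(row["Month"]))[1]


# axis=1 passes one row at a time to exp_records.
df["exp_counts"] = df.apply(exp_records, axis=1)

# Vectorized alternative: build the first day of each month, then ask
# pandas for the number of days in that month.
first_of_month = pd.to_datetime(
    df[["Year", "Month"]].rename(columns=str.lower).assign(day=1)
)
df["exp_counts_vec"] = first_of_month.dt.days_in_month
```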

Python Histogram ValueError: range parameter must be finite

扶醉桌前 submitted on 2021-02-18 08:05:16
Question: I get an error when plotting a Pandas DataFrame as a histogram. Sample data:

       distance
    0  5.680195
    2  0.000000
    3  7.974658
    4  2.461387
    5  9.703089

Code I use to plot:

    import matplotlib.pyplot as plt
    plt.hist(df['distance'].values)
    plt.show()

I get this error: "ValueError: range parameter must be finite." My attempt:

    df['Round_Distance'] = df['distance'].round(1)

    0  5.7
    2  0.0
    3  8.0
    4  2.5
    5  9.7

Plotting again gives a new error:

    plt.hist(df['Round_Distance'].values)
    plt.show()

    ValueError: max must be larger than min in
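Errors of this kind typically appear when the data contains NaN or infinite values, because the bin range cannot be computed from them. The sketch below assumes that is the cause here, which the excerpt does not confirm; the NaN row is invented for illustration. It uses `np.histogram` to stay runnable without a display, and the filtered array can be passed to `plt.hist` in the same way.

```python
import numpy as np
import pandas as pd

# Sample values from the question plus one NaN (the NaN is an assumption).
df = pd.DataFrame(
    {"distance": [5.680195, np.nan, 0.0, 7.974658, 2.461387, 9.703089]}
)

values = df["distance"].to_numpy(dtype=float)

# Keep only finite entries; this drops NaN as well as +/-inf.
finite = values[np.isfinite(values)]

# Binning the filtered data no longer trips over non-finite values.
counts, edges = np.histogram(finite, bins=5)
```

Rounding does not help in this situation, since a rounded NaN is still NaN.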

Python Histogram ValueError: range parameter must be finite

[亡魂溺海] submitted on 2021-02-18 08:04:32
Question: I get an error when plotting a Pandas DataFrame as a histogram. Sample data:

       distance
    0  5.680195
    2  0.000000
    3  7.974658
    4  2.461387
    5  9.703089

Code I use to plot:

    import matplotlib.pyplot as plt
    plt.hist(df['distance'].values)
    plt.show()

I get this error: "ValueError: range parameter must be finite." My attempt:

    df['Round_Distance'] = df['distance'].round(1)

    0  5.7
    2  0.0
    3  8.0
    4  2.5
    5  9.7

Plotting again gives a new error:

    plt.hist(df['Round_Distance'].values)
    plt.show()

    ValueError: max must be larger than min in

pandas, combine rows based on certain column values and NAN

回眸只為那壹抹淺笑 submitted on 2021-02-18 07:41:31
Question: I have a pandas DataFrame that looks like this:

    id_1  id_2  value1  value2
    1     2     100     NAN
    1     2     NAN     101
    10    20    200     NAN
    10    20    NAN     202
    10    2     345     345

And I want a DataFrame like this:

    id_1  id_2  value1  value2
    1     2     100     101
    10    20    200     202
    a     b     c       d

Basically, if both ID columns match, there will definitely be a value-NaN versus NaN-value situation, and I want to combine the rows by replacing the NaNs. Does pandas have a utility for this? It's not quite stacking or melting. Maybe pivoting, but I'd
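One possible approach, sketched here under the assumption that each (id_1, id_2) group has at most one non-null entry per value column, is `groupby(...).first()`, which takes the first non-null value of every column within a group. The frame below re-creates the question's input with `np.nan` standing in for NAN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id_1": [1, 1, 10, 10, 10],
        "id_2": [2, 2, 20, 20, 2],
        "value1": [100, np.nan, 200, np.nan, 345],
        "value2": [np.nan, 101, np.nan, 202, 345],
    }
)

# first() skips NaN, so each group collapses to one row that holds the
# first non-null value1 and the first non-null value2.
combined = df.groupby(["id_1", "id_2"], as_index=False).first()
```

Groups come back sorted by the key columns, so the (10, 2) row appears before (10, 20).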

pandas outer product of two dataframes with same index

萝らか妹 submitted on 2021-02-18 07:40:47
Question: Consider the following dataframes d1 and d2:

    d1 = pd.DataFrame([
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5],
        [1, 2, 3],
        [2, 3, 4],
        [3, 4, 5]
    ], columns=list('ABC'))
    d2 = pd.get_dummies(list('XYZZXY'))

    d1

       A  B  C
    0  1  2  3
    1  2  3  4
    2  3  4  5
    3  1  2  3
    4  2  3  4
    5  3  4  5

    d2

       X  Y  Z
    0  1  0  0
    1  0  1  0
    2  0  0  1
    3  0  0  1
    4  1  0  0
    5  0  1  0

I need to get a new dataframe with a MultiIndex columns object that has the product of every combination of columns from d1 and d2. So far I've done this:

    from itertools import product
    pd
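A minimal sketch of one way to build such a frame: multiply d1 row-wise by each column of d2 and concatenate the pieces under a dict, so the dict keys become the outer column level. The level order (d2 label outside, d1 label inside) and the cast of the dummies to integers are choices made for this sketch, not requirements stated in the question.

```python
import pandas as pd

d1 = pd.DataFrame(
    [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3], [2, 3, 4], [3, 4, 5]],
    columns=list("ABC"),
)
# Cast to int so the products are numeric regardless of the dummy dtype.
d2 = pd.get_dummies(list("XYZZXY")).astype(int)

# One block per d2 column; concat with a dict yields MultiIndex columns
# such as ('X', 'A'), ('X', 'B'), ..., ('Z', 'C').
outer = pd.concat(
    {col: d1.mul(d2[col], axis=0) for col in d2.columns},
    axis=1,
)
```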

pandas resample to specific weekday in month

六月ゝ 毕业季﹏ submitted on 2021-02-18 07:39:06
Question: I have a Pandas dataframe that I'd like to resample to every third Friday of the month.

    np.random.seed(0)
    #requested output:
    dates = pd.date_range("2018-01-01", "2018-08-31")
    dates_df = pd.DataFrame(data=np.random.random(len(dates)), index=dates)
    mask = (dates.weekday == 4) & (14 < dates.day) & (dates.day < 22)
    dates_df.loc[mask]

But when a third Friday is missing (e.g. dropping the February third Friday), I want to have the latest value (so as of 2018-02-15). Using the mask gives me the next value
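One way to get the latest available value for each third Friday, sketched under the assumption that a forward fill is acceptable, is to generate the target dates with the `WOM-3FRI` frequency alias and reindex with `method="ffill"`. The frame is rebuilt as in the question, with 2018-02-16 dropped to simulate the missing day.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range("2018-01-01", "2018-08-31")
dates_df = pd.DataFrame(data=np.random.random(len(dates)), index=dates)

# Simulate a missing third Friday (2018-02-16).
dates_df = dates_df.drop(pd.Timestamp("2018-02-16"))

# "WOM-3FRI" generates the third Friday of every month in the range.
third_fridays = pd.date_range("2018-01-01", "2018-08-31", freq="WOM-3FRI")

# For each third Friday take that day's row, or the most recent earlier
# row if the day itself is absent.
result = dates_df.reindex(third_fridays, method="ffill")
```

Note that the row for February is still labelled 2018-02-16 but carries the value recorded on 2018-02-15.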

Pandas: How could I iterate two dataframes which have exactly same format?

☆樱花仙子☆ submitted on 2021-02-18 06:53:29
Question: My final goal is to make a list that contains, for each corresponding location in two dataframes, an entry of the form [df_one_first_element, df_two_first_element, column_first, index_first], like this: [0.619159, 0.510162, 20140109,0.50], [0.264191,0.269053,20140213,0.50]... I am trying to iterate over the two dataframes but got stuck. How can I iterate over two dataframes that have exactly the same format but different data? For example, I have two dataframes, df_one and df_two, that look like this: df_one = 20140109
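Since the excerpt cuts off before the frames are shown, the sketch below rebuilds two tiny frames from the values quoted in the desired output; their exact shape is an assumption. It walks the shared index and column labels and looks up the matching cell in each frame.

```python
import pandas as pd

# Frames reconstructed from the values in the expected output (assumed layout).
df_one = pd.DataFrame(
    [[0.619159, 0.264191]], index=[0.50], columns=[20140109, 20140213]
)
df_two = pd.DataFrame(
    [[0.510162, 0.269053]], index=[0.50], columns=[20140109, 20140213]
)

# Both frames share labels, so one pass over df_one's labels is enough.
pairs = [
    [df_one.at[idx, col], df_two.at[idx, col], col, idx]
    for idx in df_one.index
    for col in df_one.columns
]
```

For large frames, stacking both frames and concatenating them side by side would avoid the Python-level loop.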

np.argsort which excludes zero values

徘徊边缘 submitted on 2021-02-18 06:47:21
Question: I have an array [0.2,0,0,0,0.3,0,0,0,0.4]. I'm using np.argsort to sort the values and get their indexes. So, for my example, it will be something like [1,5,9,2,3,4,6...]. However, I would like to get the array of indexes only for the non-zero values; in my example, only [1,5,9]. How do I implement this in Python with pandas and numpy?

Answer 1: Using np.nonzero and an indexing trick (the result is zero-based, whereas the question counts positions from 1):

    import numpy as np

    def sparse_argsort(arr):
        # Positions of the non-zero entries
        indices = np.nonzero(arr)[0]
        # Order those positions by the values they hold
        return indices[np.argsort(arr[indices])]

    a = np.array([0.2, 0, 0, 0, 0.3, 0, 0, 0, 0.4])
    sparse_argsort(a)
    array([0, 4, 8])

Renaming the column names of pandas dataframe is not working as expected - python

 ̄綄美尐妖づ submitted on 2021-02-18 06:43:28
Question: I have the pandas dataframe df below. I am trying to rename the columns, but it is not working as expected. Code:

    mapping = {df.columns[0]:'Date', df.columns[1]: 'A', df.columns[2]:'B', df.columns[3]: 'C',df.columns[4]:'D', df.columns[5]: 'E',df.columns[6]:'F', df.columns[7]: 'G',df.columns[8]:'H', df.columns[9]: 'J'}
    df.rename(columns=mapping)

Output of df.columns:

    MultiIndex(levels=[['A Index', 'B Index', 'C Index', 'D Index', 'E Index', 'F Index', 'G Index', 'H Index', 'I Index', 'J
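Two things are worth checking in a case like this, though the truncated excerpt does not confirm which applies: `rename` returns a new frame rather than modifying `df`, and with MultiIndex columns the mapping is matched against level labels, not against the tuples returned by `df.columns[i]`. The sketch below uses a made-up two-level header; the inner label "value" is hypothetical.

```python
import pandas as pd

# Hypothetical two-level column header; only the outer labels echo the question.
cols = pd.MultiIndex.from_tuples([("A Index", "value"), ("B Index", "value")])
df = pd.DataFrame([[1, 2]], columns=cols)

# rename does not work in place by default, so keep the returned frame.
# level=0 applies the mapping to the outer level only.
renamed = df.rename(columns={"A Index": "A", "B Index": "B"}, level=0)

# Alternative: replace the whole header with flat names.
flat = df.copy()
flat.columns = ["A", "B"]
```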