pandas | 易学教程

Timeserie datetick problems when using pandas.DataFrame.plot method

Question: I just discovered something really strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    import pandas as pd

    t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
    x = pd.DataFrame(np.random.rand(len(t)), index=t)
    fig, axe = plt.subplots()
    x.plot(ax=axe)
    plt.show(axe)
    xt = axe.get_xticks()

When I try to format my xticklabels I get strange behaviours, then I

How to Replace All the “nan” Strings with Empty String in My DataFrame?

Question: I have "None" and "nan" strings scattered in my DataFrame. Is there a way to replace all of them with an empty string "" or NaN so that they do not show up when I export the DataFrame as an Excel sheet? Simplified example (note: the nan values in col4 are not strings):

    ID  col1   col2    col3    col4
    1   Apple  nan     nan     nan
    2   None   orange  None    nan
    3   None   nan     banana  nan

The output should look like this after all the "None" and "nan" strings have been replaced by empty strings "":

    ID  col1   col2    col3    col4
    1   Apple                  nan
    2
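One possible approach, sketched here under the assumption that the placeholders are exactly the strings "None" and "nan" (the frame below is rebuilt by hand from the simplified example), is to pass both strings to DataFrame.replace. Real NaN values, such as those in col4, are not strings and are left untouched:

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the simplified example; col4 holds real NaN, not strings.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "col1": ["Apple", "None", "None"],
    "col2": ["nan", "orange", "nan"],
    "col3": ["nan", "None", "banana"],
    "col4": [np.nan, np.nan, np.nan],
})

# Replace the literal strings "None" and "nan" with an empty string.
cleaned = df.replace(["None", "nan"], "")
print(cleaned)
```

If the real NaN values should also disappear from the exported sheet, a follow-up cleaned.fillna("") would blank them as well; whether that is wanted depends on the export.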

Annualized Return in Pandas

Question: I am seeking to confirm that my representation of the annualized return formula (using monthly returns) is optimal. The annualized return formula I am using (where M is a monthly return and D is the total count of monthly returns), for the case where the count of monthly returns is greater than 12, is as follows: [formula image not preserved] Alternatively, this changes when the monthly return count is less than 12: [formula image not preserved] Here is my representation of this formula in pandas: ann_return = observations.apply(lambda y: y
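The formulas themselves are missing from this excerpt, so the following is only a sketch under an assumed convention, not necessarily the asker's: compound the monthly returns M geometrically, raise the result to the power 12/D when D exceeds 12, and report the plain cumulative return otherwise. The function name annualized_return is hypothetical.

```python
import pandas as pd


def annualized_return(monthly: pd.Series) -> float:
    """Annualize a series of monthly returns (assumed geometric convention)."""
    monthly = monthly.dropna()
    d = len(monthly)                     # D: count of monthly returns
    growth = (1.0 + monthly).prod()      # compounded growth factor
    if d > 12:
        # More than a year of data: scale the compounding to 12 months.
        return growth ** (12.0 / d) - 1.0
    # A year or less: report the cumulative return without annualizing.
    return growth - 1.0


two_years = pd.Series([0.01] * 24)
half_year = pd.Series([0.01] * 6)
print(annualized_return(two_years), annualized_return(half_year))
```

For a DataFrame with one column of monthly returns per asset, observations.apply(annualized_return) would apply the same rule column by column.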

easy multidimensional numpy ndarray to pandas dataframe method?

Question: Given a 4-D numpy.ndarray, e.g.

    myarr = np.random.rand(10, 4, 3, 2)
    dims = {'time': range(1, 11), 'sub': range(1, 5), 'cond': ['A', 'B', 'C'], 'measure': ['meas1', 'meas2']}

but possibly with more dimensions: how can I create a pandas.DataFrame with a MultiIndex by just passing the dimensions as indexes, without further manual adjustments (reshaping the ndarray into a 2-D shape)? I can't wrap my head around the reshaping, not even really in 3 dimensions yet, so I'm searching for an 'automatic' method if possible. What
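One way this might be done without manual reshaping, assuming the dims dictionary lists the axis labels in the same order as the array axes, is to build the index with pd.MultiIndex.from_product and flatten the array with ravel(). Both use row-major order (the last axis varies fastest), so labels and values line up. The column name "value" is an arbitrary choice for this sketch.

```python
import numpy as np
import pandas as pd

myarr = np.random.rand(10, 4, 3, 2)
dims = {
    "time": range(1, 11),
    "sub": range(1, 5),
    "cond": ["A", "B", "C"],
    "measure": ["meas1", "meas2"],
}

# Cartesian product of the axis labels, in the same order as the array axes.
index = pd.MultiIndex.from_product(list(dims.values()), names=list(dims.keys()))

# ravel() flattens in C order, which matches the ordering of from_product.
df = pd.DataFrame({"value": myarr.ravel()}, index=index)
print(df.head())
```

The same two lines work for any number of dimensions, since neither from_product nor ravel() depends on the array being 4-D.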

How to split a pandas dataframe into many columns after groupby

Question: I want to be able to use groupby in pandas to group the data by a column, but then split it so each group is its own column in a DataFrame, e.g. turn:

       time  data
    0     1   2.0
    1     2   3.0
    2     3   4.0
    3     1   2.1
    4     2   3.1
    5     3   4.1
    etc.

into

          data1  data2  ...  dataN
    time
    1       2.0    2.1  ...
    2       3.0    3.1  ...
    3       4.0    4.1  ...

I am sure the place to start is df.groupby('time') but then I can't seem to figure out the right way to use concat (or other function) to build the split data frame that I want. There is probably some simple
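A possible sketch of this reshaping, assuming each time value should contribute its first occurrence to data1, its second to data2, and so on: number the rows within each group using groupby(...).cumcount(), then pivot on that counter. The helper column name "occurrence" is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 1, 2, 3],
    "data": [2.0, 3.0, 4.0, 2.1, 3.1, 4.1],
})

# Number each row within its "time" group: 1 for the first occurrence, 2 for the next.
occurrence = df.groupby("time").cumcount() + 1

# Spread the occurrences across columns, one row per time value.
wide = df.assign(occurrence=occurrence).pivot(
    index="time", columns="occurrence", values="data"
)
wide.columns = [f"data{n}" for n in wide.columns]
print(wide)
```

Groups with fewer rows than the largest group end up with NaN in the trailing columns.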

pandas read_csv parse header as string type but i want integer

Question: For example, the CSV file is as below (1,2,3 is the header):

    1,2,3
    0,0,0

I read the CSV file using pd.read_csv and print a column:

    import pandas as pd
    df = pd.read_csv('./test.csv')
    print(df[1])

This raises KeyError: 1. It seems that read_csv parses the header as strings. Is there any way to use integer column names in the DataFrame?

Answer 1: I think the more general approach is to cast the column names to integers with astype:

    df = pd.read_csv('./test.csv')
    df.columns = df.columns.astype(int)

Another way is to first get only the first column and