pandas

Pandas Time Series DataFrame Missing Values

[亡魂溺海] Submitted on 2021-02-10 06:08:39
Question: I have a dataset of total sales from 2008–2015, with an entry for each day, so I created a pandas DataFrame with a DatetimeIndex and a column for sales. The problem is that I am missing data for most of 2010; these missing values are currently represented by 0.0, so a plot of the DataFrame shows a long run of zeros. I want to try to forecast values for 2016, possibly using an ARIMA model, so the first step I took was to perform a decomposition of this time series. Obviously if I…
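
A common first step for this situation is to treat the zeros as missing rather than as real sales before decomposing or fitting ARIMA. A minimal sketch (the series values below are made up for illustration):

```python
import pandas as pd
import numpy as np

# Daily sales series; a gap is recorded as 0.0 rather than as missing.
idx = pd.date_range("2009-12-28", periods=10, freq="D")
sales = pd.Series([210.0, 220.0, 0.0, 0.0, 0.0, 0.0,
                   250.0, 260.0, 270.0, 280.0], index=idx)

# Mark the zeros as NaN, then interpolate over time so the
# decomposition step does not see an artificial drop to zero.
cleaned = sales.replace(0.0, np.nan).interpolate(method="time")
```

For a gap as long as most of a year, straight interpolation flattens out seasonality, so a seasonal fill (e.g. values from the same day in adjacent years) may be preferable; the mechanics of replace-then-fill are the same.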

check all items in csv column except one [python pandas]

大兔子大兔子 Submitted on 2021-02-10 05:56:07
Question: I'm trying to figure out how to check an entire column to verify that all values are integers except one, using Python pandas. One row's num will always be a float. CSV example:

name,num
random1,2
random2,3
random3,2.89
random4,1
random5,3.45

In this example, random3's num will always be a float, so the fact that random5 is also a float means the program should print an error to the terminal telling the user this.

Answer 1: Try this: if len(df.num.apply(type) == float) >= 2: print(f…
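
Note that the quoted answer is broken as shown: `len(series == float)` is just the row count, and after `read_csv` the whole column is float64 anyway, so cell types can't distinguish the rows. Checking for a fractional part instead is a working sketch (data inlined from the question):

```python
import pandas as pd
from io import StringIO

csv = StringIO("name,num\nrandom1,2\nrandom2,3\nrandom3,2.89\nrandom4,1\nrandom5,3.45")
df = pd.read_csv(csv)

# After read_csv the whole num column is float64, so count values
# with a nonzero fractional part instead of inspecting cell types.
non_integer = int((df["num"] % 1 != 0).sum())
if non_integer > 1:
    print(f"error: expected exactly one float, found {non_integer}")
```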

Remove item in pandas dataframe that starts with a comment char [duplicate]

冷暖自知 Submitted on 2021-02-10 05:54:59
Question: This question already has answers here: pandas select from Dataframe using startswith (4 answers). Closed 2 years ago. I would like to remove all rows in a pandas dataframe that start with a comment character. For example:

>>> COMMENT_CHAR = '#'
>>> df
      first_name     last_name
0  #fill in here  fill in here
1            tom         jones
>>> df.remove(df.columns[0], startswith=COMMENT_CHAR)  # in pseudocode
>>> df
  first_name last_name
0        tom     jones

How would this actually be done?

Answer 1: Setup >>> data = [['#fill in…
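
The linked duplicate's approach is a boolean mask built with `Series.str.startswith`; a minimal sketch using the question's own data:

```python
import pandas as pd

COMMENT_CHAR = "#"
df = pd.DataFrame({"first_name": ["#fill in here", "tom"],
                   "last_name": ["fill in here", "jones"]})

# Keep only rows whose first column does NOT start with the comment char.
df = df[~df.iloc[:, 0].str.startswith(COMMENT_CHAR)].reset_index(drop=True)
```

`reset_index(drop=True)` renumbers the surviving rows from 0, matching the output shown in the question.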

Assigning a list value to pandas dataframe [duplicate]

南笙酒味 Submitted on 2021-02-10 05:46:09
Question: This question already has an answer here: Set Pandas column values to an array (1 answer). Closed 1 year ago. I cannot seem to reassign a value in a pandas dataframe with a list; Python wants to iterate over the list, and I didn't think it would do that. For example, if I have the following:

import pandas as pd
val1 = [0, 1, 2]
val2 = [3, 4, 5]
d_list = []
for v1, v2 in zip(val1, val2):
    d_list.append({'val1': v1, 'val2': v2})
df = pd.DataFrame(d_list)
val3 = [6, 7, 8, 9]
df['val3'] = [val3]*len…
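
The truncated line is in fact the usual fix: repeating the list once per row stops pandas from trying to align the list's four elements against the three-row index. A runnable sketch of the question's setup:

```python
import pandas as pd

d_list = [{"val1": v1, "val2": v2} for v1, v2 in zip([0, 1, 2], [3, 4, 5])]
df = pd.DataFrame(d_list)

val3 = [6, 7, 8, 9]
# One copy of the list per row: pandas stores each as a single cell value
# instead of iterating over the list's elements.
df["val3"] = [val3] * len(df)
```

Note every cell then references the same list object; wrap with `[list(val3) for _ in range(len(df))]` if rows must be independently mutable.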

python how do I perform the below operation in dataframe

|▌冷眼眸甩不掉的悲伤 Submitted on 2021-02-10 05:45:52
Question:

df1 = pd.DataFrame({
    'Year': ["1A", "2A", "3A", "4A", "5A"],
    'Tval1': [1, 9, 8, 1, 6],
    'Tval2': [34, 56, 67, 78, 89]
})

I want to reshape it so that the second value column is moved under each individual row.

Answer 1: The idea is to extract the numbers from the Year column, set new column names after the Year column, and reshape with DataFrame.stack:

df1['Year'] = df1['Year'].str.extract('(\d+)')
df = df1.set_index('Year')
# add letters by length of columns, working for 1 to 26…
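
A runnable sketch of the extract-then-stack idea (the output column names `col`/`val` are my own choice, not from the answer):

```python
import pandas as pd

df1 = pd.DataFrame({"Year": ["1A", "2A", "3A", "4A", "5A"],
                    "Tval1": [1, 9, 8, 1, 6],
                    "Tval2": [34, 56, 67, 78, 89]})

# Strip the letter so only the year number remains, then stack the
# Tval columns so each value gets its own row under its year.
df1["Year"] = df1["Year"].str.extract(r"(\d+)", expand=False)
out = (df1.set_index("Year")
          .stack()
          .reset_index()
          .rename(columns={"level_1": "col", 0: "val"}))
```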

how to delete a duplicate column read from excel in pandas

馋奶兔 Submitted on 2021-02-10 05:41:05
Question: Data in Excel:

a b a d
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7

Code:

df = pd.io.excel.read_excel(r"sample.xlsx", sheetname="Sheet1")
df
   a  b  a.1  d
0  1  2    3  4
1  2  3    4  5
2  3  4    5  6
3  4  5    6  7

How do I delete the column a.1? When pandas reads the data from Excel it automatically renames the second a to a.1. I tried df.drop("a.1", index=1), but this does not work. I have a huge Excel file with duplicate names, and I am interested in only a few of the columns.

Answer 1: If you know the name of the column you…
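
The attempt fails because `drop` with `index=` targets rows; columns need `columns=` (or `axis=1`). A sketch using an in-memory stand-in for the frame `read_excel` produces:

```python
import pandas as pd

# Stand-in for the frame read_excel returns: the second "a" was renamed "a.1".
df = pd.DataFrame([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
                  columns=["a", "b", "a.1", "d"])

# drop works on columns via columns= (or axis=1); index=1 would drop a row.
df = df.drop(columns=["a.1"])
```

Since only a few columns are of interest in a huge file, passing `usecols=` to `read_excel` so the duplicates are never loaded is likely cheaper than dropping them afterwards.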

Does Python cache repeatedly accessed files?

允我心安 Submitted on 2021-02-10 05:37:35
Question: I was wondering if Python is smart enough to cache repeatedly accessed files, e.g. when reading the same CSV with pandas or unpickling the same file multiple times. Is this even Python's responsibility, or should the operating system take care of it?

Answer 1: No, Python is just a language and doesn't really do anything on its own. A particular Python library might implement caching, but the standard functions you use to open and read files don't do so. The higher-level file-loading…
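
(The OS does keep recently read blocks in its page cache, but Python still re-parses the bytes on every call.) If application-level caching is wanted, a memoized loader is easy to add yourself; a minimal sketch with a throwaway temp file:

```python
import functools
import os
import tempfile

@functools.lru_cache(maxsize=None)
def load_text(path):
    """Read a file once; later calls with the same path return the cached text."""
    with open(path) as f:
        return f.read()

# Demo with a temporary file created just for this sketch.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("hello")

first = load_text(path)
second = load_text(path)   # served from the cache, file is not re-read
```

The same pattern wraps `pd.read_csv` or `pickle.load`, with the caveat that the cache goes stale if the file changes on disk.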

Most efficient way of saving a pandas dataframe or 2d numpy array into h5py, with each row a separate key, using a column

末鹿安然 Submitted on 2021-02-10 05:36:11
Question: This is a follow-up to the Stack Overflow question "Column missing when trying to open hdf created by pandas in h5py", where I am trying to save a large amount of data to disk (too large to fit into memory) and retrieve specific rows of the data using indices. One of the solutions given in the linked post is to create a separate key for every row. At the moment I can only think of iterating through each row and setting the keys directly. For example, if this is my data…
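
The row-by-row idea can be sketched as follows; a plain dict stands in for the `h5py.File` object so the sketch runs without h5py installed (with h5py you would assign `f[key] = row[1:]` inside a `with h5py.File(path, "w")` block). The 4x3 array is made up:

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)   # column 0 serves as the key

# One dataset per row, keyed by the value in the first column.
store = {}
for row in data:
    key = str(row[0])
    store[key] = row[1:]
```

Per-row keys make single-row lookup trivial but are slow at this scale; chunked datasets with fancy indexing are usually the faster route in HDF5.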

Replace string in dataframe if a condition in a different row is met

有些话、适合烂在心里 Submitted on 2021-02-10 05:33:11
Question: I have a dataframe made up of a date and a value column, like this:

>>> df
         date   value
0  2016-09-10  value1
1  2016-09-10  value1
2  2016-09-10  value2
3  2016-09-10  value1
4  2016-09-12  value3
5  2016-09-12  value1
6  2016-09-13  value2
7  2016-09-13  value1

I would like to replace all of the value1 entries in df['value'] that fall on the date '2016-09-10' with value7. The date column is a string series. I looked at the documentation for pd.DataFrame.replace(), but couldn't find an argument for a…
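
`replace()` indeed has no way to express "only on this date"; the usual answer is a combined boolean mask assigned through `.loc`. A sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2016-09-10"] * 4 + ["2016-09-12"] * 2 + ["2016-09-13"] * 2,
    "value": ["value1", "value1", "value2", "value1",
              "value3", "value1", "value2", "value1"],
})

# Both conditions in one mask; .loc assigns in place only where it is True.
mask = (df["date"] == "2016-09-10") & (df["value"] == "value1")
df.loc[mask, "value"] = "value7"
```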

Keep maximum value per group including repetitions

和自甴很熟 Submitted on 2021-02-10 05:30:32
Question: Let's say I have a dataframe like this:

     a    b   c
0   x1   y1   9
1   x1   y2   9
2   x1   y3   4
3   x2   y4   2
4   x2   y5  10
5   x2   y6   5
6   x3   y7   6
7   x3   y8   4
8   x3   y9   8
9   x4  y10  11
10  x4  y11  11
11  x4  y12  11

I first want to do a grouped sort of column c (grouped by column a), and then I want to retain all the rows in each group that have the highest value of column c. So the output will look like:

     a    b   c
0   x1   y1   9
1   x1   y2   9
4   x2   y5  10
8   x3   y9   8
9   x4  y10  11
10  x4  y11  11
11  x4  y12  11

Is there a clean way of doing so without…
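
One clean way, sketched on the question's data, is `groupby(...).transform('max')`, which broadcasts each group's maximum back to every row, so comparing it against c keeps all rows that tie for the maximum:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x1"] * 3 + ["x2"] * 3 + ["x3"] * 3 + ["x4"] * 3,
                   "b": [f"y{i}" for i in range(1, 13)],
                   "c": [9, 9, 4, 2, 10, 5, 6, 4, 8, 11, 11, 11]})

# transform('max') has the same length as df, so the comparison
# selects every row equal to its own group's maximum, ties included.
out = df[df["c"] == df.groupby("a")["c"].transform("max")]
```

No sorting step is needed: the transform handles the per-group maximum directly and the original row order (and index) is preserved.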