pandas

Pandas how to read sub headers

随声附和 submitted on 2021-02-18 13:52:56
Question: I'm using Python and pandas to process a CSV file. The CSV file has multiple header rows, like this:

            Header1                       Header2
    Date    Subheader1-1  Subheader1-2    Subheader2-1  Subheader2-2

In raw text format, the CSV file content looks like this:

    ,Header1,,Header2,,...
    Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2,...
    ...

My question is: does pandas support this sub-header format? If not, is there a way to read this CSV into a pandas DataFrame and do some calculation on it? (The calculation is like extracting
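One possible approach, sketched under assumptions: read the two header rows separately, forward-fill the blank cells in the top row, and attach the result as a MultiIndex. The sample values below are made up for illustration; only the header layout comes from the question. pandas can also parse multi-row headers directly with `header=[0, 1]`, but blank top-row cells then tend to come back as placeholder names that need cleaning up.

```python
import io

import pandas as pd

# Made-up sample in the layout described in the question (values are illustrative).
raw = (
    ",Header1,,Header2,\n"
    "Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2\n"
    "2021-01-01,1,2,3,4\n"
    "2021-01-02,5,6,7,8\n"
)

# Read only the two header rows, without treating them as column names.
header_rows = pd.read_csv(io.StringIO(raw), header=None, nrows=2)

# Blank cells in the top row are NaN; forward-fill so each sub-header
# inherits the header to its left. The first column has no parent header.
top = header_rows.iloc[0].ffill().fillna("").tolist()
sub = header_rows.iloc[1].tolist()

# Read the data rows separately so pandas infers numeric dtypes.
df = pd.read_csv(io.StringIO(raw), header=None, skiprows=2)
df.columns = pd.MultiIndex.from_arrays([top, sub])

# Selecting a top-level header returns all of its sub-columns.
header1 = df["Header1"]
total = df[("Header2", "Subheader2-2")].sum()
```

With the columns in this shape, `df["Header1"]` behaves like a small DataFrame of its own, and a single sub-column is addressed with a tuple such as `("Header2", "Subheader2-2")`.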

Calculating Rolling forward averages with pandas

旧巷老猫 submitted on 2021-02-18 13:51:57
Question: I need to calculate some rolling forward averages in a dataframe and really don't know where to start. I know that if I wanted to select a cell 10 days ahead, I would use df.shift(-10), but what I'm looking to do is calculate the average between 10 and 15 days ahead. What I'm thinking of is something like df.rolling(-10,-15).mean(). If I were trying to calculate just a moving average going back in time, df.rolling(15, 10).mean() would work perfectly, and I did think about just calculating the
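A minimal sketch of one way to get a forward-looking average, assuming one row per day with no gaps: compute an ordinary backward rolling mean over the width of the span, then shift it back so it lines up with the earlier row. Note that the second positional argument of `rolling` is `min_periods`, not a start offset. The series and the names `near`, `far`, and `forward_avg` are illustrative.

```python
import pandas as pd

# Illustrative daily series; the values 0..29 make the result easy to check.
s = pd.Series(
    range(30),
    dtype=float,
    index=pd.date_range("2021-01-01", periods=30),
)

near, far = 10, 15       # average over rows t+10 .. t+15, inclusive
window = far - near + 1  # 6 rows in that span

# rolling(window).mean() at row i covers rows i-window+1 .. i.
# Shifting by -far moves the value computed at row t+far back to row t.
forward_avg = s.rolling(window).mean().shift(-far)
```

The last `far` rows come out as NaN because there is not enough future data for them.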

Passing a pandas dataframe column to an NLTK tokenizer

纵然是瞬间 submitted on 2021-02-18 12:59:15
Question: I have a pandas dataframe raw_df with 2 columns, ID and sentences. I need to convert each sentence to a string. The code below produces no errors and says datatype of rule is "object".

    raw_df['sentences'] = raw_df.sentences.astype(str)
    raw_df.sentences.dtypes
    Out: dtype('O')

Then I try to tokenize the sentences and get a TypeError saying that the method is expecting a string or bytes-like object. What am I doing wrong?

    raw_sentences=tokenizer.tokenize(raw_df)

Same TypeError for raw_sentences = nltk
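The likely cause is that the tokenizer receives the whole DataFrame rather than one string at a time. A small sketch of applying a tokenizer per cell follows; the `tokenize` function here is a hypothetical regex-based stand-in for `tokenizer.tokenize`, used so the example runs without NLTK or its data files, and the sample sentences are made up.

```python
import re

import pandas as pd

raw_df = pd.DataFrame({
    "ID": [1, 2],
    "sentences": ["Hello there, world.", "Pandas meets NLTK!"],
})


def tokenize(text):
    """Hypothetical stand-in for tokenizer.tokenize: expects ONE string."""
    return re.findall(r"\w+|[^\w\s]", text)


# Passing raw_df itself would hand the tokenizer a DataFrame, not a string.
# Series.apply calls the function once per cell, with a str each time.
raw_df["tokens"] = raw_df["sentences"].astype(str).apply(tokenize)
```

With NLTK installed, the same pattern should work by passing the real tokenizer function to `apply` in place of the stand-in.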

How can I send a plot.ly image inline of an html email using smtp?

你。 submitted on 2021-02-18 12:51:47
Question: I'm automating a couple of bi-weekly reports, so I've decided to use plot.ly to create a line plot. This line plot has a varying number of traces depending on the report that is being run. I've been able to create plots successfully, but none of the methods I've found have worked for displaying the plot inline in my email. Here is my code:

    SMTP_SERVER = "smtp.office365.com"
    SMTP_PORT = 587
    SMTP_USERNAME = username
    SMTP_PASSWORD = password
    EMAIL_TO = email_to
    EMAIL_FROM = email_from
    #here we
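One common way to show an image inline is to attach it to a `multipart/related` message and reference it from the HTML by Content-ID. The sketch below only builds the message and does not send it; the function name `build_inline_image_email`, the addresses, and the placeholder bytes are assumptions for illustration.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_inline_image_email(png_bytes, sender, recipient, subject):
    """Return a message whose HTML body shows png_bytes inline."""
    msg = MIMEMultipart("related")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient

    # The HTML refers to the image by Content-ID rather than by URL.
    html = '<html><body><p>Report plot:</p><img src="cid:plot"></body></html>'
    msg.attach(MIMEText(html, "html"))

    image = MIMEImage(png_bytes, _subtype="png")
    image.add_header("Content-ID", "<plot>")
    image.add_header("Content-Disposition", "inline", filename="plot.png")
    msg.attach(image)
    return msg


# Placeholder bytes so the sketch runs without plotly; in practice these
# could come from fig.to_image(format="png"), which needs kaleido installed.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
message = build_inline_image_email(
    fake_png, "reports@example.com", "team@example.com", "Bi-weekly report"
)
```

The resulting message can then be passed to `smtplib.SMTP(...).send_message(message)` after `starttls()` and `login()`, using the server settings from the question.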

How can I send a plot.ly image inline of an html email using smtp?

99封情书 submitted on 2021-02-18 12:51:31
Question: I'm automating a couple of bi-weekly reports, so I've decided to use plot.ly to create a line plot. This line plot has a varying number of traces depending on the report that is being run. I've been able to create plots successfully, but none of the methods I've found have worked for displaying the plot inline in my email. Here is my code:

    SMTP_SERVER = "smtp.office365.com"
    SMTP_PORT = 587
    SMTP_USERNAME = username
    SMTP_PASSWORD = password
    EMAIL_TO = email_to
    EMAIL_FROM = email_from
    #here we

Pandas: how to get a particular group after groupby? [duplicate]

与世无争的帅哥 submitted on 2021-02-18 12:14:45
Question: This question already has answers here: How to access pandas groupby dataframe by key (5 answers). Closed 6 years ago.

I want to group a dataframe by a column called 'A' and inspect a particular group.

    grouped = df.groupby('A', sort=False)

However, I don't know how to access a group. For example, I expected that grouped.first() would give me the first group, or that grouped['foo'] would give me the group where A=='foo'. However, pandas doesn't work like that. I couldn't find a similar example
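For reference, a short sketch of the two access patterns the question is reaching for: `get_group` for a group by key, and iteration for the first group. The sample frame is made up.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "baz"], "B": [1, 2, 3, 4]})
grouped = df.groupby("A", sort=False)

# get_group returns the rows of one group as a DataFrame.
foo_rows = grouped.get_group("foo")

# Iterating yields (key, sub-DataFrame) pairs; with sort=False the
# groups come in order of first appearance.
first_key, first_rows = next(iter(grouped))
```

`grouped.first()` does something different: it returns the first row of every group, not the first group.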

Marking specific dates when visualizing a time series

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-18 12:14:30
Question: I have a time series that has a few years' worth of data, for example this:

    ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
    ts = ts.cumsum()
    ts.plot()

I also have two extra arrays. Let's call the first

    dates = [pd.datetime("2000-12-01"), pd.datetime("2001-01-03")]

and the second

    labels = ["My birthday", "My dad's birthday"]

labels[i] contains the label for dates[i]. What I'd like to do is to display them in the time series graph so that they can be
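A rough sketch of one way to mark the dates, assuming matplotlib is used directly for the plot: draw a vertical line at each date and annotate it with its label. It uses `pd.Timestamp` to parse the date strings instead of `pd.datetime`, and a seeded random generator so the run is repeatable; the styling choices are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(
    rng.standard_normal(1000),
    index=pd.date_range("2000-01-01", periods=1000),
).cumsum()

# pd.Timestamp parses the date strings.
dates = [pd.Timestamp("2000-12-01"), pd.Timestamp("2001-01-03")]
labels = ["My birthday", "My dad's birthday"]

fig, ax = plt.subplots()
ax.plot(ts.index, ts.values)

for date, label in zip(dates, labels):
    # Vertical marker at the date, plus a label next to the curve there.
    ax.axvline(date, color="gray", linestyle="--")
    ax.annotate(label, xy=(date, ts.loc[date]),
                xytext=(5, 5), textcoords="offset points")

fig.autofmt_xdate()
```

Calling `fig.savefig(...)` or `plt.show()` afterwards renders the figure. When the two dates are close together, as here, the labels may overlap and need different offsets.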

Filling na values with merge from another dataframe

扶醉桌前 submitted on 2021-02-18 12:11:07
Question: I have a column with NA values that I want to fill with values from another data frame, matched on a key. I was wondering if there is any simple way to do so. Example: I have a data frame of objects and their colors like this:

       object  color
    0  chair   black
    1  ball    yellow
    2  door    brown
    3  ball    NaN
    4  chair   white
    5  chair   NaN
    6  ball    grey

I want to fill the NA values in the color column with the default color from the following data frame:

       object  default_color
    0  chair   brown
    1  ball    blue
    2  door
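A possible sketch without an explicit merge: turn the defaults frame into a lookup Series, map it onto the key column, and use the result as the fill value. Only the default colors visible in the excerpt are used; the door row is cut off there and is left out.

```python
import numpy as np
import pandas as pd

objects = pd.DataFrame({
    "object": ["chair", "ball", "door", "ball", "chair", "chair", "ball"],
    "color": ["black", "yellow", "brown", np.nan, "white", np.nan, "grey"],
})

# Only the defaults visible in the excerpt; the door row is cut off there
# and is not needed because no door is missing a color.
defaults = pd.DataFrame({
    "object": ["chair", "ball"],
    "default_color": ["brown", "blue"],
})

# Build an object -> default_color lookup and align it to every row.
lookup = defaults.set_index("object")["default_color"]
fallback = objects["object"].map(lookup)

# fillna only touches the rows where color is missing.
objects["color"] = objects["color"].fillna(fallback)
```

A left `merge` on `object` followed by `fillna` with the `default_color` column should give the same result, at the cost of a temporary extra column.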

Filling na values with merge from another dataframe

一曲冷凌霜 submitted on 2021-02-18 12:10:29
Question: I have a column with NA values that I want to fill with values from another data frame, matched on a key. I was wondering if there is any simple way to do so. Example: I have a data frame of objects and their colors like this:

       object  color
    0  chair   black
    1  ball    yellow
    2  door    brown
    3  ball    NaN
    4  chair   white
    5  chair   NaN
    6  ball    grey

I want to fill the NA values in the color column with the default color from the following data frame:

       object  default_color
    0  chair   brown
    1  ball    blue
    2  door

Filtering multiple conditions from a Dataframe in Python

梦想的初衷 submitted on 2021-02-18 12:09:35
Question: I want to filter data from a dataframe using multiple conditions on multiple columns. I tried doing so like this:

    arrival_delayed_weather = [[[flight_data_finalcopy["ArrDelay"] > 0]] & [[flight_data_finalcopy["WeatherDelay"]>0]]]
    arrival_delayed_weather_filter = arrival_delayed_weather[["UniqueCarrier", "AirlineID"]]
    print arrival_delayed_weather_filter

However, I get this error message:

    TypeError: unsupported operand type(s) for &: 'list' and 'list'

How do I solve this? Thanks in
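The error comes from wrapping each comparison in square brackets, which creates Python lists; `&` is defined for boolean Series, not for lists. A minimal sketch with parentheses instead, using the column names from the question and made-up rows:

```python
import pandas as pd

# Tiny illustrative frame; the values are invented for the example.
flight_data_finalcopy = pd.DataFrame({
    "UniqueCarrier": ["AA", "DL", "UA", "WN"],
    "AirlineID": [1, 2, 3, 4],
    "ArrDelay": [12, -3, 45, 7],
    "WeatherDelay": [5, 0, 0, 2],
})

# Each comparison yields a boolean Series; parentheses are required
# because & binds more tightly than >.
mask = (flight_data_finalcopy["ArrDelay"] > 0) & (
    flight_data_finalcopy["WeatherDelay"] > 0
)

# Select matching rows and the two columns of interest in one step.
arrival_delayed_weather_filter = flight_data_finalcopy.loc[
    mask, ["UniqueCarrier", "AirlineID"]
]
print(arrival_delayed_weather_filter)
```

Using `.loc[mask, columns]` filters rows and picks columns together, which avoids chained indexing.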