pandas

Fill missing timestamps and apply different operations on different columns

折月煮酒 submitted on 2021-02-10 05:30:08
Question: I have data in the format below:

    user  timestamp            flowers  total_flowers
    xyz   01-01-2020 00:05:00  15       15
    xyz   01-01-2020 00:10:00  5        20
    xyz   01-01-2020 00:15:00  21       41
    xyz   01-01-2020 00:35:00  1        42
    ...
    xyz   01-01-2020 11:45:00  57       1029
    xyz   01-01-2020 11:55:00  18       1047

Expected output:

    user  timestamp            flowers  total_flowers
    xyz   01-01-2020 00:05:00  15       15
    xyz   01-01-2020 00:10:00  5        20
    xyz   01-01-2020 00:15:00  21       41
    xyz   01-01-2020 00:20:00  0        41
    xyz   01-01-2020 00:25:00  0        41
    xyz   01-01-2020 00:30:00  0        41
    xyz   01-01-2020
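The question is cut off, but a minimal sketch of one way to do this, assuming 5-minute timestamps per user, day-first dates, and that missing slots should get flowers = 0 while total_flowers carries the last known running total (column names taken from the snippet above):

    import pandas as pd

    df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)

    # Build the complete 5-minute grid per user, then fill each column differently.
    def fill_gaps(g):
        full_index = pd.date_range(g['timestamp'].min(), g['timestamp'].max(), freq='5min')
        g = g.set_index('timestamp').reindex(full_index)
        g['flowers'] = g['flowers'].fillna(0)            # no new flowers in missing slots
        g['total_flowers'] = g['total_flowers'].ffill()  # running total carries forward
        g['user'] = g['user'].ffill()
        return g.rename_axis('timestamp').reset_index()

    out = df.groupby('user', group_keys=False).apply(fill_gaps)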

Find difference between 2 columns with Nulls using pandas

﹥>﹥吖頭↗ submitted on 2021-02-10 05:27:51
Question: I want to find the difference between two columns of type int in a pandas DataFrame. I am using Python 2.7. The columns are as below:

    >>> df
       INVOICED_QUANTITY  QUANTITY_SHIPPED
    0                  15              NaN
    1                  20              NaN
    2                   7              NaN
    3                   7              NaN
    4                   7              NaN

Now I want to subtract QUANTITY_SHIPPED from INVOICED_QUANTITY, so I do the below:

    >>> df['Diff'] = df['QUANTITY_INVOICED'] - df['SHIPPED_QUANTITY']
    >>> df
       QUANTITY_INVOICED  SHIPPED_QUANTITY  Diff
    0                 15               NaN   NaN
    1                 20               NaN   NaN
    2                  7               NaN   NaN
    3                  7               NaN   NaN
    4                  7               NaN   NaN

How do I take care of
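A common fix, sketched here under the assumption that the NaNs simply mean "nothing shipped yet" and should count as 0 (the column names below follow the first printout in the snippet, which spells them inconsistently):

    # Treat missing shipped quantities as 0 when subtracting.
    df['Diff'] = df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)

    # Or fill explicitly first:
    df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)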

Import data from a text file into a pandas dataframe

百般思念 submitted on 2021-02-10 05:27:06
Question: I'm building a web app using Django. I uploaded a text file using csv_file = request.FILES['file'], but I can't read the CSV into pandas. The file I'm trying to import has both text and data, and I only want the data. I've tried the following to skip the comments and read just the numbers:

    df = pd.read_csv(csv_file, sep=" ", header=None, names=["col1", "col2", "col3"], skiprows=2)

Error: pandas will not read all 3 columns, it only reads 1 column. I tried

    df = pd.read_csv(csv_file, sep="\s
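A sketch of what usually works when the numeric columns are separated by runs of whitespace; the comment marker and the skiprows value are assumptions, since the file layout isn't shown above:

    import pandas as pd

    # Whitespace-delimited data; skip the leading text lines.
    df = pd.read_csv(
        csv_file,
        sep=r"\s+",             # any run of spaces/tabs, not a single space
        header=None,
        names=["col1", "col2", "col3"],
        skiprows=2,             # adjust to however many header/comment lines there are
        comment="#",            # drop lines starting with '#', if that's the comment marker
    )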

How to select only first entity extracted from spacy entities?

允我心安 submitted on 2021-02-10 05:22:10
Question: I am trying to use the following code to extract entities from text in a DataFrame:

    for i in df['Text'].to_list():
        doc = nlp(i)
        for entity in doc.ents:
            if entity.label_ == 'GPE':

I need to store the text of the first GPE alongside its corresponding text column. For instance, if the text at index 0 in df['Text'] is

    Match between USA and Canada was postponed

then I need only the first location (USA) in another column such as df['Place'] at the index corresponding to that text, which is 0. df
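A minimal sketch of one way to do this, assuming nlp is an already-loaded spaCy pipeline and the goal is the first GPE per row (None where no GPE is found):

    def first_gpe(text):
        """Return the text of the first GPE entity, or None if there is none."""
        doc = nlp(text)
        for ent in doc.ents:
            if ent.label_ == 'GPE':
                return ent.text
        return None

    df['Place'] = df['Text'].apply(first_gpe)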

Multiply columns of a dataframe by getting the column names from a list

与世无争的帅哥 submitted on 2021-02-10 05:21:05
Question: I have a DataFrame with both categorical and numerical columns.

    data = [['A', "India", 10, 20, 30, 15, "Cochin"],
            ['B', "India", 10, 20, 30, 40, "Chennai"],
            ['C', "India", 10, 20, 30, 15, "Chennai"]]
    df = pd.DataFrame(data, columns=['Product', 'Country', "2016 Total", "2017 Total",
                                     "2018 Total", "2019 Total", "Region"])

      Product Country  2016 Total  2017 Total  2018 Total  2019 Total   Region
    0       A   India          10          20          30          15   Cochin
    1       B   India          10          20          30          40  Chennai
    2       C   India          10          20          30          15  Chennai

I know what will be the names of
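The question is cut off, but going by the title the idea is to multiply the columns whose names sit in a list. A hedged sketch (the list cols and the factor 2 are made up for illustration, not taken from the question):

    cols = ["2016 Total", "2017 Total", "2018 Total", "2019 Total"]

    # Multiply only the listed numeric columns; the categorical ones are untouched.
    df[cols] = df[cols].mul(2)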

Sort values of a dataframe column based on positive and negative values?

余生长醉 submitted on 2021-02-10 05:11:00
Question: I have a DataFrame column containing both positive and negative values.

       A   B
    0  a   5
    1  b -13
    2  c  15
    3  d -10

Is there a way to sort the positive values ascending and the negative values descending?

       A   B
    0  a   5
    1  c  15
    2  d -10
    3  b -13

Answer 1: First split with boolean indexing, sort each part with DataFrame.sort_values, and finally concat the pieces together:

    mask = df['B'].gt(0)
    df = pd.concat([df[mask].sort_values('B'),
                    df[~mask].sort_values('B', ascending=False)],
                   ignore_index=True)
    print(df)

       A   B
    0  a   5
    1  c  15
    2  d -10
    3  b -13

Source: https:/
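An alternative sketch that gets the same ordering in a single sort, using throwaway helper columns (positives come first, and both halves end up ordered by absolute value, which matches the expected output above):

    out = (df.assign(_neg=df['B'].lt(0), _abs=df['B'].abs())
             .sort_values(['_neg', '_abs'])
             .drop(columns=['_neg', '_abs'])
             .reset_index(drop=True))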

Filter dataframe index on multiple conditions

安稳与你 submitted on 2021-02-10 05:01:34
Question: In pandas.DataFrame.filter, is there a way to use the "like" or "regex" parameters so they support an OR condition? For example:

    df.filter(like='bbi', axis=1)

filters on columns with 'bbi' in their name, but how would I filter on columns containing 'bbi' OR 'abc'? A few options that fail:

    df.filter(like='bbi' or 'abc', axis=1)
    df.filter(like=('bbi' or 'abc'), axis=1)

Answer 1: I would do the below. Setup:

    df = pd.DataFrame(np.random.randint(0, 20, 20).reshape(5, 4), columns=['abcd', 'bcde', 'efgh',
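The answer is cut off above, but one common way to express an OR here is the regex parameter with an alternation pattern, sketched below:

    # regex alternation: keep columns whose name contains 'bbi' or 'abc'
    df.filter(regex='bbi|abc', axis=1)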

Randomly selecting from Pandas groups with equal probability — unexpected behavior

这一生的挚爱 submitted on 2021-02-10 04:57:02
Question: I have 12 unique groups that I am trying to randomly sample from, each with a different number of observations. I want to randomly sample from the entire population (DataFrame) with each group having the same probability of being selected. The simplest example of this would be a DataFrame with 2 groups:

      groups  probability
    0      a         0.25
    1      a         0.25
    2      b         0.50

Using

    np.random.choice(df['groups'], p=df['probability'], size=100)

each iteration will now have a 50% chance of selecting group a and a 50%
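The question is truncated, but a sketch of how per-row probabilities like the ones above are typically built so every group is equally likely: each row's weight is 1 / (number of groups × size of its group), and the sampling call is the one shown in the snippet.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'groups': ['a', 'a', 'b']})

    n_groups = df['groups'].nunique()
    group_sizes = df.groupby('groups')['groups'].transform('size')

    # Each group gets total weight 1/n_groups, split evenly among its rows.
    df['probability'] = 1 / (n_groups * group_sizes)

    sample = np.random.choice(df['groups'], p=df['probability'], size=100)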

Adding columns after comparing values in 2 dataframes with different lengths

一笑奈何 submitted on 2021-02-10 04:55:53
Question: I referenced another Stack Overflow answer, but the values came out wrong, so I'm asking again. I want to compare two columns across different DataFrames.

    df1
      Name        date
      A     2019-01-24
      A     2019-02-14
      B     2018-05-12
      B     2019-07-21
      C     2016-04-24
      C     2017-09-11
      D     2020-11-24

    df2
      Name       date2      value
      A     2019-01-24     123124
      A     2019-02-14     675756
      B     2018-05-11     624622
      B     2019-07-20     894321
      C     2016-04-23  321032190
      C     2017-09-11     201389

I would like to compare the Name and date of df1 with the Name and date2 of df2, and if they match, add value to the
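The request is cut off, but a left merge on the two key pairs is the usual sketch for "add value where Name and date match" (rows of df1 with no match end up with NaN in value):

    out = df1.merge(
        df2,
        left_on=['Name', 'date'],
        right_on=['Name', 'date2'],
        how='left',
    ).drop(columns='date2')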