pandas

Calculating mean of a specific column by specific rows

南楼画角 submitted on 2021-02-05 06:55:11

Question: I have a dataframe that looks like the one in the pictures. I want to add a new column that shows the average power for each day (the data is sampled every 5 minutes), computed separately by day_or_night (day = 0 in the column, night = 1). I've gotten this far:

train['avg_by_day'][train['day_or_night']==1] = train['power'][train['day_or_night']==1].mean()
train['avg_by_day'][train['day_or_night']==0] = train['power'][train['day_or_night']==0].mean()

but this just adds the …
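The chained indexing in the excerpt above triggers pandas' SettingWithCopyWarning and assigns one overall mean per group. A minimal sketch of the usual alternative, `groupby().transform`, using toy data and assuming the column names from the question (`power`, `day_or_night`):

```python
import pandas as pd

# Hypothetical sample data standing in for the dataframe in the pictures.
train = pd.DataFrame({
    'power': [10.0, 20.0, 30.0, 40.0],
    'day_or_night': [0, 0, 1, 1],
})

# One transform('mean') call broadcasts each group's mean back onto its own
# rows, replacing both chained assignments in the original snippet.
train['avg_by_day'] = train.groupby('day_or_night')['power'].transform('mean')
```

Computing a per-day average as the question describes would only need the grouping key extended, e.g. `train.groupby([train['date'].dt.date, 'day_or_night'])` if a datetime column is present.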

Single column with value counts from multiple column dataframe

梦想的初衷 submitted on 2021-02-05 06:54:04

Question: I would like to sum the frequencies over multiple columns with pandas. The number of columns can vary between 2 and 15. Here is an example with just 3 columns:

code1 code2 code3
27    5     56
534   27    78
27    312   55
89    312   27

And I would like the following result:

code frequency
5    1
27   4
55   1
56   2
78   1
312  2
534  1

Counting values inside one column is not the problem; I just need the total frequency with which each value appears anywhere in the dataframe, no matter the number of columns.

Answer 1: You could stack and …
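A runnable sketch of the stack-based approach the answer hints at, built from the sample values in the question; `stack()` flattens any number of columns into one Series, so the column count never matters:

```python
import pandas as pd

df = pd.DataFrame({
    'code1': [27, 534, 27, 89],
    'code2': [5, 27, 312, 312],
    'code3': [56, 78, 55, 27],
})

# stack() collapses all columns into a single Series of values, then
# value_counts() tallies each distinct value exactly once.
freq = (df.stack().value_counts()
          .rename_axis('code').reset_index(name='frequency')
          .sort_values('code').reset_index(drop=True))
```

With this input, 27 appears four times and 312 twice across all three columns.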

Count occurrences of certain string in entire pandas dataframe

眉间皱痕 submitted on 2021-02-05 06:50:32

Question: I have the following dataframe in pandas:

C1 C2 C3
10 a  b
10 a  b
?  c  c
?  ?  b
10 a  b
10 ?  ?

I want to count the occurrences of ? in all the columns. My desired output is the column-wise sum of occurrences.

Answer 1: Use:

m = df.eq('?').sum()
pd.DataFrame([m.values], columns=m.index)

   C1  C2  C3
0   2   2   1

Or better:

df.eq('?').sum().to_frame().T  # thanks @user3483203

   C1  C2  C3
0   2   2   1

Source: https://stackoverflow.com/questions/54714183/count-occurrences-of-certain-string-in-entire-pandas-dataframe
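A self-contained version of the accepted approach, reconstructing the question's table as string data:

```python
import pandas as pd

df = pd.DataFrame({
    'C1': ['10', '10', '?', '?', '10', '10'],
    'C2': ['a', 'a', 'c', '?', 'a', '?'],
    'C3': ['b', 'b', 'c', 'b', 'b', '?'],
})

# eq('?') yields a boolean mask over the whole frame; sum() counts the
# True values per column, and to_frame().T presents it as one row.
counts = df.eq('?').sum().to_frame().T
```

Here `counts` matches the answer's output: C1 and C2 each contain two `?` entries and C3 contains one.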

Pandas Finding cross sell in two columns in a data frame

巧了我就是萌 submitted on 2021-02-05 06:49:26

Question: What I'm trying to do is a kind of cross-sell analysis. I have a pandas dataframe with two columns, one with receipt numbers and the other with product ids:

receipt product
1       a
1       b
2       c
3       b
3       a

Most of the receipts have many products. What I need to find is the count of each combination of products that occurs on the receipts. Say products 'a' and 'b' are the most common combination (they appear together on the most receipts); how do I find this information? I tried using df.groupby(['receipt', …
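One way to sketch this, under the assumption that unordered pairs are wanted: group by receipt, enumerate each receipt's product pairs with `itertools.combinations`, and tally them with a `Counter`. The data matches the question's example:

```python
from itertools import combinations
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'receipt': [1, 1, 2, 3, 3],
    'product': ['a', 'b', 'c', 'b', 'a'],
})

# For every receipt, list each unordered pair of products it contains
# (sorted so ('a','b') and ('b','a') count as the same pair), then
# accumulate the pair counts across all receipts.
pair_counts = Counter()
for _, products in df.groupby('receipt')['product']:
    pair_counts.update(combinations(sorted(products), 2))
```

`pair_counts.most_common()` then ranks combinations; with this sample, ('a', 'b') appears on two receipts and receipt 2 contributes no pair at all.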

Pandas groupby: divide last in group by first in group

99封情书 submitted on 2021-02-05 06:49:07

Question: I have a dataframe that I have grouped by multiple columns. Within each group, I would like to generate a value that divides the last entry of the group by the first. I would also like to show the number of entries and the last entry's value in the output. See below for example data and the desired output. I know how to show the count of each group, as in the code below:

df_group = df.groupby(['ID','Item','End_Date','Type'])
df_output = df_group.size().reset…
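The question's example data and desired output are not included in the excerpt, so the sketch below invents a small dataframe (a value column named `value` is an assumption). Named aggregation lets the count, the last value, and the last/first ratio come out of a single `agg` call:

```python
import pandas as pd

# Hypothetical data: two groups with an upward-trending value column.
df = pd.DataFrame({
    'ID':    [1, 1, 1, 2, 2],
    'Item':  ['x', 'x', 'x', 'y', 'y'],
    'value': [10.0, 15.0, 20.0, 4.0, 8.0],
})

# One agg call per group: row count, last value, and last divided by first.
out = (df.groupby(['ID', 'Item'])['value']
         .agg(count='size',
              last='last',
              ratio=lambda s: s.iloc[-1] / s.iloc[0])
         .reset_index())
```

The question's real grouping key would simply be the four columns it names, `['ID','Item','End_Date','Type']`.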

How to check for wrong datetime entries (python/pandas)?

杀马特。学长 韩版系。学妹 submitted on 2021-02-05 06:47:06

Question: I have an Excel dataset containing datetime values for hours worked, entered by employees. Now that the end of the year is near they want to report on it, but it is full of wrong entries, so I need to clean it. Below are some examples of wrong entries. What would be your approach to such a dataset? I first converted the date column to datetime using:

df['Shiftdatum'] = pd.to_datetime(df.Shiftdatum, format='%Y-%m-%d', errors='coerce')

In the sample data below it shows a NaT. How do I filter …
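A minimal sketch of the coerce-then-filter pattern the question starts on: `errors='coerce'` turns unparseable entries into NaT rather than raising, after which a null mask isolates the bad rows. The sample strings are invented:

```python
import pandas as pd

# Hypothetical entries: the middle one is not a valid date.
df = pd.DataFrame({'Shiftdatum': ['2021-01-05', '2021-13-40', '2021-02-01']})

# Invalid strings become NaT instead of raising a ValueError.
df['Shiftdatum'] = pd.to_datetime(df['Shiftdatum'], format='%Y-%m-%d',
                                  errors='coerce')

# isna() flags the NaT rows, i.e. the wrong entries to review or drop.
bad_rows = df[df['Shiftdatum'].isna()]
```

`df.dropna(subset=['Shiftdatum'])` would keep only the parseable rows instead.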

Using str.split for pandas dataframe values based on parentheses location

此生再无相见时 submitted on 2021-02-05 06:47:05

Question: Let's say I have the following df['Name'] column in a dataframe:

Name
'Jerry'
'Adam (and family)'
'Paul and Hellen (and family):\n'
'John and Peter (and family):/n'

How would I remove all the content in Name after the first parenthesis?

df['Name'] = df['Name'].str.split("'(").str[0]

doesn't seem to work, and I don't understand why. The output I want is:

Name
'Jerry'
'Adam'
'Paul and Hellen'
'John and Peter'

so everything from the parenthesis onward is deleted.

Answer 1: A solution with split is …
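A likely reason the attempt fails is that the split pattern `"'("` literally contains a quote character that is not in the data, and a multi-character pattern is treated as a regex, where an unbalanced `(` is invalid. A sketch that splits on the literal space-plus-parenthesis instead:

```python
import pandas as pd

df = pd.DataFrame({'Name': [
    'Jerry',
    'Adam (and family)',
    'Paul and Hellen (and family):\n',
    'John and Peter (and family):/n',
]})

# regex=False makes " (" a literal separator; names without a parenthesis
# produce a single-element split and pass through unchanged.
df['Name'] = df['Name'].str.split(' (', regex=False).str[0]
```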

Pandas: Fill missing values using last available

荒凉一梦 submitted on 2021-02-05 06:45:06

Question: I have a dataframe as follows:

zDate      A    B
01-JAN-17  100  200
02-JAN-17  111  203
03-JAN-17  NaN  202
04-JAN-17  109  205
05-JAN-17  101  211
06-JAN-17  105  NaN
07-JAN-17  104  NaN

What is the best way to fill the missing values using the last available ones? This is the intended result:

zDate      A    B
01-JAN-17  100  200
02-JAN-17  111  203
03-JAN-17  111  202
04-JAN-17  109  205
05-JAN-17  101  211
06-JAN-17  105  211
07-JAN-17  104  211

Answer 1: Use the ffill function, which is the same as fillna with method='ffill':

df = df.ffill()
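The answer above as a runnable example, reconstructing the question's table; `ffill()` propagates the last valid observation forward down each column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [100, 111, np.nan, 109, 101, 105, 104],
     'B': [200, 203, 202, 205, 211, np.nan, np.nan]},
    index=pd.Index(['01-JAN-17', '02-JAN-17', '03-JAN-17', '04-JAN-17',
                    '05-JAN-17', '06-JAN-17', '07-JAN-17'], name='zDate'),
)

# Forward-fill: every NaN takes the most recent non-missing value above it.
df = df.ffill()
```

After the call, A on 03-JAN-17 becomes 111 and B on 06/07-JAN-17 becomes 211, matching the intended result.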

select data based on datetime in pandas dataframe

余生长醉 submitted on 2021-02-05 06:43:25

Question: I am trying to create a sort of "functional select" that gives users the flexibility to write configuration for selecting data in pandas dataframes. However, I ran into some issues that puzzle me. The following is a simplified example:

>>> import pandas as pd
>>> df = pd.DataFrame({'date': pd.date_range(start='2020-01-01', periods=4), 'val': [1, 2, 3, 4]})
>>> df
        date  val
0 2020-01-01    1
1 2020-01-02    2
2 2020-01-03    3
3 2020-01-04    4

Question 1: Why do I get a different result when I apply the function …
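The question is cut off before the actual selection code, so as a baseline, here is how datetime-based row selection normally behaves on the example frame; comparing a datetime64 column to a date string works because pandas coerces the string to a Timestamp before comparing:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(start='2020-01-01', periods=4),
                   'val': [1, 2, 3, 4]})

# Boolean mask from a string comparison: pandas parses '2020-01-03'
# into a Timestamp, so this keeps the last two rows.
subset = df[df['date'] >= '2020-01-03']
```

Passing the same comparison through a user-supplied function or operator table (the "functional select" idea) should give identical results as long as the operands reach the comparison unchanged.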