apply

R: applying a function over a group

Submitted by 末鹿安然 on 2021-01-27 21:00:48
Question: I am looking to apply a function to a data frame and then store the results of that function in a new column in the data frame. Here is a sample of my data frame, tradeData:

    Login AL Diff
    a      1    0
    a      1    0
    a      1    0
    a      0    1
    a      0    0
    a      0    0
    a      0    0
    a      1   -1
    a      1    0
    a      0    1
    a      1   -1
    a      1    0
    a      0    1
    b      1    0
    b      0    1
    b      0    0
    b      0    0
    b      1   -1
    c      1    0
    c      1    0
    c      0    1
    c      0    0
    c      1   -1

where the "Diff" column is the column I am trying to add. It is just the difference between the value in row (x-1) and the value in row (x) of tradeData, grouped by Login. Here are
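
The excerpt is cut off before the asker's attempts, but the grouped "difference from the previous row" it describes can be sketched. The question itself is in R; as a minimal sketch of the same idea in pandas (using the Login/AL/Diff names from the question and a small subset of the sample rows), Diff is the previous row's AL minus the current row's AL within each Login group:

    import pandas as pd

    # Small subset of the sample tradeData from the question.
    tradeData = pd.DataFrame({
        "Login": ["a", "a", "a", "a", "b", "b", "c"],
        "AL":    [1,   1,   0,   1,   1,   0,   1],
    })

    # Within each Login group, Diff = previous AL minus current AL
    # (e.g. 1 -> 0 gives 1, 0 -> 1 gives -1); the first row of each group
    # has no predecessor, so fill it with 0 as in the sample output.
    tradeData["Diff"] = (
        tradeData.groupby("Login")["AL"].diff().mul(-1).fillna(0).astype(int)
    )
    print(tradeData)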

Subtract one column from previous column

Submitted by 百般思念 on 2021-01-27 19:02:20
Question: Sample data:

    dfData <- data.frame(ID = c(1, 2, 3, 4, 5),
                         DistA = c(10, 8, 15, 22, 15),
                         DistB = c(15, 35, 40, 33, 20),
                         DistC = c(20, 40, 50, 45, 30),
                         DistD = c(60, 55, 55, 48, 50))

      ID DistA DistB DistC DistD
    1  1    10    15    20    60
    2  2     8    35    40    55
    3  3    15    40    50    55
    4  4    22    33    45    48
    5  5    15    20    30    50

I have some IDs for which there are four columns that measure cumulative distance. I want to create new columns that give the actual distance for each column, i.e. subtract the next column from the previous column. For e.g.
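
The excerpt ends before the expected output, but the column-by-column difference it describes can be sketched. The question itself uses an R data.frame; a minimal pandas sketch of the same idea (the Leg* output names are made up for illustration) subtracts each distance column from the next one to turn cumulative distances into per-leg distances:

    import pandas as pd

    dfData = pd.DataFrame({
        "ID":    [1, 2, 3, 4, 5],
        "DistA": [10, 8, 15, 22, 15],
        "DistB": [15, 35, 40, 33, 20],
        "DistC": [20, 40, 50, 45, 30],
        "DistD": [60, 55, 55, 48, 50],
    })

    dist_cols = ["DistA", "DistB", "DistC", "DistD"]

    # Row-wise difference between consecutive distance columns turns the cumulative
    # distances into per-leg distances; DistA has no previous column, so its "leg"
    # is just DistA itself.
    legs = dfData[dist_cols].diff(axis=1)
    legs["DistA"] = dfData["DistA"]
    legs.columns = ["LegA", "LegB", "LegC", "LegD"]  # illustrative output names
    dfData = dfData.join(legs)
    print(dfData)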

Pandas DataFrame apply function to multiple columns and output multiple columns

Submitted by 此生再无相见时 on 2021-01-27 11:27:28
Question: I have been scouring SO for the best way of applying a function that takes multiple separate Pandas DataFrame columns and outputs multiple new columns in the same said DataFrame. Let's say I have the following:

    def apply_func_to_df(df):
        df[['new_A', 'new_B']] = df.apply(lambda x: transform_func(x['A'], x['B'], x['C']), axis=1)

    def transform_func(value_A, value_B, value_C):
        # do some processing and transformation and stuff
        return new_value_A, new_value_B

I am trying to apply this function as
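
The excerpt ends before the actual error, but a common way to make a row-wise apply fill two new columns is to have the applied function's result come back as a pandas Series (or to pass result_type='expand'). A minimal sketch with the column names from the question and a placeholder transform:

    import pandas as pd

    def transform_func(value_A, value_B, value_C):
        # Placeholder transformation: any function returning two values works here.
        return value_A + value_B, value_B * value_C

    df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

    # Wrapping the returned tuple in a Series lets df.apply expand it into two columns.
    df[["new_A", "new_B"]] = df.apply(
        lambda x: pd.Series(transform_func(x["A"], x["B"], x["C"])), axis=1
    )

    # Equivalent: return the tuple as-is and ask apply to expand it.
    # df[["new_A", "new_B"]] = df.apply(
    #     lambda x: transform_func(x["A"], x["B"], x["C"]), axis=1, result_type="expand"
    # )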

R - indices of matching values of two data.tables

Submitted by 孤人 on 2021-01-27 10:40:29
Question: This is my first post at StackOverflow. I am relatively new to programming and am trying to work with data.table in R, for its reputation in speed. I have a very large data.table, named "Actions", with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table, with the unique values of Actions in columns k1 and k2, named "States". For every row in Actions, I would like to find the unique index for columns 4 and 5, matching with
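
The question is cut off mid-sentence, so the exact matching rule is not fully stated; reading it as "for each row of Actions, find the row index in States whose (k1, k2) key matches that row's (l1, l2) pair", a minimal pandas sketch of the lookup (the question uses R data.table; the join-on-key idea is the same, and the small sample data here is made up) would be:

    import pandas as pd

    # Made-up sample data with the column names from the question.
    states = pd.DataFrame({"k1": [1, 1, 2], "k2": [10, 20, 10]})
    actions = pd.DataFrame({
        "k1": [1, 2], "k2": [10, 10], "i": [0.5, 0.7],
        "l1": [1, 1], "l2": [20, 10],
    })

    # Give each States row an explicit index column, then join Actions' (l1, l2)
    # pair against States' (k1, k2) pair to recover that index for every Actions row.
    states_idx = states.reset_index().rename(columns={"index": "state_idx"})
    actions = actions.merge(
        states_idx, left_on=["l1", "l2"], right_on=["k1", "k2"],
        how="left", suffixes=("", "_state"),
    )
    print(actions["state_idx"])  # index of the matching States row per Actions row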

How can I generate by-group summary statistics if my grouping variable is a factor?

Submitted by 老子叫甜甜 on 2021-01-27 05:16:27
Question: Suppose I wanted to get some summary statistics on the dataset mtcars (part of base R version 2.12.1). Below, I group the cars according to the number of engine cylinders they have and take the per-group means of the remaining variables in mtcars.

    > str(mtcars)
    'data.frame':   32 obs. of  11 variables:
     $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
     $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
     $ disp: num  160 160 108 258 360 ...
     $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
     $ drat:
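
The excerpt stops partway through the str() output, but the per-group mean it asks for is a plain grouped aggregation. The question is about R factors; a minimal pandas sketch of the same operation (a few made-up rows in the shape of mtcars, with the grouping column cast to a categorical, pandas' analogue of an R factor):

    import pandas as pd

    # A few rows shaped like mtcars; cyl is the grouping variable.
    cars = pd.DataFrame({
        "cyl":  [6, 6, 4, 6, 8],
        "mpg":  [21.0, 21.0, 22.8, 21.4, 18.7],
        "disp": [160.0, 160.0, 108.0, 258.0, 360.0],
        "hp":   [110, 110, 93, 110, 175],
    })

    # Treat cyl as categorical (the analogue of an R factor) and take
    # per-group means of the remaining numeric columns.
    cars["cyl"] = cars["cyl"].astype("category")
    group_means = cars.groupby("cyl", observed=True).mean(numeric_only=True)
    print(group_means)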

Replace values in DataFrame column when they start with string using lambda

Submitted by 不羁的心 on 2021-01-04 04:22:50
Question: I have a DataFrame:

    import pandas as pd
    import numpy as np

    x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
    df = pd.DataFrame(x)

I want to replace the values starting with XXX with np.nan using lambda. I have tried many things with replace, apply and map, and the best I have been able to do is False, True, True, False. The below works, but I would like to know a better way to do it, and I think apply, replace and a lambda is probably a better way to do it.

    df.Value.loc[df.Value.str
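
The last code line of the excerpt is cut off, but the replacement itself can be sketched in a few standard ways. A minimal sketch on the sample data, using the lambda-with-apply approach the question asks about, with mask- and regex-based alternatives noted in comments:

    import pandas as pd
    import numpy as np

    x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
    df = pd.DataFrame(x)

    # Lambda + apply: map any value that starts with "XXX" to NaN, keep the rest.
    df['Value'] = df['Value'].apply(lambda v: np.nan if str(v).startswith('XXX') else v)

    # Alternatives that avoid apply:
    # df.loc[df['Value'].str.startswith('XXX', na=False), 'Value'] = np.nan
    # df['Value'] = df['Value'].replace(r'^XXX.*', np.nan, regex=True)

    print(df)  # Value: Test, NaN, NaN, Test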
