apply

R: applying a function over a group

Submitted by 末鹿安然 on 2021-01-27 21:00:48
Question: I am looking to apply a function to a data frame and then store the results of that function in a new column in the data frame. Here is a sample of my data frame, tradeData:

    Login AL Diff
    a      1    0
    a      1    0
    a      1    0
    a      0    1
    a      0    0
    a      0    0
    a      0    0
    a      1   -1
    a      1    0
    a      0    1
    a      1   -1
    a      1    0
    a      0    1
    b      1    0
    b      0    1
    b      0    0
    b      0    0
    b      1   -1
    c      1    0
    c      1    0
    c      0    1
    c      0    0
    c      1   -1

where the "Diff" column is the column I am trying to add. It is just the difference between the value in row (x-1) and the value in row (x) of tradeData, grouped by Login. Here are
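
The excerpt is cut off before the asker's attempts, but the grouped "difference from the previous row" it describes can be sketched. The question itself is in R; as a minimal sketch of the same idea in pandas (using the Login/AL/Diff names from the question and a small subset of the sample rows), Diff is the previous row's AL minus the current row's AL within each Login group:

    import pandas as pd

    # Small subset of the sample tradeData from the question.
    tradeData = pd.DataFrame({
        "Login": ["a", "a", "a", "a", "b", "b", "c"],
        "AL":    [1,   1,   0,   1,   1,   0,   1],
    })

    # Within each Login group, Diff = previous AL minus current AL
    # (e.g. 1 -> 0 gives 1, 0 -> 1 gives -1); the first row of each group
    # has no predecessor, so fill it with 0 as in the sample output.
    tradeData["Diff"] = (
        tradeData.groupby("Login")["AL"].diff().mul(-1).fillna(0).astype(int)
    )
    print(tradeData)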

Subtract one column from previous column

Submitted by 百般思念 on 2021-01-27 19:02:20
Question: Sample data:

    dfData <- data.frame(ID = c(1, 2, 3, 4, 5),
                         DistA = c(10, 8, 15, 22, 15),
                         DistB = c(15, 35, 40, 33, 20),
                         DistC = c(20, 40, 50, 45, 30),
                         DistD = c(60, 55, 55, 48, 50))

      ID DistA DistB DistC DistD
    1  1    10    15    20    60
    2  2     8    35    40    55
    3  3    15    40    50    55
    4  4    22    33    45    48
    5  5    15    20    30    50

I have some IDs for which there are four columns that measure cumulative distance. I want to create new columns that give the actual distance for each column, i.e. subtract the next column from the previous column. For e.g.
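
The excerpt ends before the expected output, but the column-by-column difference it describes can be sketched. The question itself uses an R data.frame; a minimal pandas sketch of the same idea (the Leg* output names are made up for illustration) subtracts each distance column from the next one to turn cumulative distances into per-leg distances:

    import pandas as pd

    dfData = pd.DataFrame({
        "ID":    [1, 2, 3, 4, 5],
        "DistA": [10, 8, 15, 22, 15],
        "DistB": [15, 35, 40, 33, 20],
        "DistC": [20, 40, 50, 45, 30],
        "DistD": [60, 55, 55, 48, 50],
    })

    dist_cols = ["DistA", "DistB", "DistC", "DistD"]

    # Row-wise difference between consecutive distance columns turns the cumulative
    # distances into per-leg distances; DistA has no previous column, so its "leg"
    # is just DistA itself.
    legs = dfData[dist_cols].diff(axis=1)
    legs["DistA"] = dfData["DistA"]
    legs.columns = ["LegA", "LegB", "LegC", "LegD"]  # illustrative output names
    dfData = dfData.join(legs)
    print(dfData)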

Pandas DataFrame apply function to multiple columns and output multiple columns

Submitted by 此生再无相见时 on 2021-01-27 11:27:28
Question: I have been scouring SO for the best way of applying a function that takes multiple separate Pandas DataFrame columns and outputs multiple new columns in the same said DataFrame. Let's say I have the following:

    def apply_func_to_df(df):
        df[['new_A', 'new_B']] = df.apply(lambda x: transform_func(x['A'], x['B'], x['C']), axis=1)

    def transform_func(value_A, value_B, value_C):
        # do some processing and transformation and stuff
        return new_value_A, new_value_B

I am trying to apply this function as
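
The excerpt ends before the actual error, but a common way to make a row-wise apply fill two new columns is to have the applied function's result come back as a pandas Series (or to pass result_type='expand'). A minimal sketch with the column names from the question and a placeholder transform:

    import pandas as pd

    def transform_func(value_A, value_B, value_C):
        # Placeholder transformation: any function returning two values works here.
        return value_A + value_B, value_B * value_C

    df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

    # Wrapping the returned tuple in a Series lets df.apply expand it into two columns.
    df[["new_A", "new_B"]] = df.apply(
        lambda x: pd.Series(transform_func(x["A"], x["B"], x["C"])), axis=1
    )

    # Equivalent: return the tuple as-is and ask apply to expand it.
    # df[["new_A", "new_B"]] = df.apply(
    #     lambda x: transform_func(x["A"], x["B"], x["C"]), axis=1, result_type="expand"
    # )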

R - indices of matching values of two data.tables

Submitted by 孤人 on 2021-01-27 10:40:29
Question: This is my first post at StackOverflow. I am relatively new to programming and am trying to work with data.table in R, for its reputation in speed. I have a very large data.table, named "Actions", with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table, with the unique values of Actions in columns k1 and k2, named "States". For every row in Actions, I would like to find the unique index for columns 4 and 5, matching with
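
The question is cut off mid-sentence, so the exact matching rule is not fully stated; reading it as "for each row of Actions, find the row index in States whose (k1, k2) key matches that row's (l1, l2) pair", a minimal pandas sketch of the lookup (the question uses R data.table; the join-on-key idea is the same, and the small sample data here is made up) would be:

    import pandas as pd

    # Made-up sample data with the column names from the question.
    states = pd.DataFrame({"k1": [1, 1, 2], "k2": [10, 20, 10]})
    actions = pd.DataFrame({
        "k1": [1, 2], "k2": [10, 10], "i": [0.5, 0.7],
        "l1": [1, 1], "l2": [20, 10],
    })

    # Give each States row an explicit index column, then join Actions' (l1, l2)
    # pair against States' (k1, k2) pair to recover that index for every Actions row.
    states_idx = states.reset_index().rename(columns={"index": "state_idx"})
    actions = actions.merge(
        states_idx, left_on=["l1", "l2"], right_on=["k1", "k2"],
        how="left", suffixes=("", "_state"),
    )
    print(actions["state_idx"])  # index of the matching States row per Actions row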

How can I generate by-group summary statistics if my grouping variable is a factor?

Submitted by 老子叫甜甜 on 2021-01-27 05:16:27
Question: Suppose I wanted to get some summary statistics on the dataset mtcars (part of base R version 2.12.1). Below, I group the cars according to the number of engine cylinders they have and take the per-group means of the remaining variables in mtcars.

    > str(mtcars)
    'data.frame':   32 obs. of  11 variables:
     $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
     $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
     $ disp: num  160 160 108 258 360 ...
     $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
     $ drat:
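
The excerpt stops partway through the str() output, but the per-group mean it asks for is a plain grouped aggregation. The question is about R factors; a minimal pandas sketch of the same operation (a few made-up rows in the shape of mtcars, with the grouping column cast to a categorical, pandas' analogue of an R factor):

    import pandas as pd

    # A few rows shaped like mtcars; cyl is the grouping variable.
    cars = pd.DataFrame({
        "cyl":  [6, 6, 4, 6, 8],
        "mpg":  [21.0, 21.0, 22.8, 21.4, 18.7],
        "disp": [160.0, 160.0, 108.0, 258.0, 360.0],
        "hp":   [110, 110, 93, 110, 175],
    })

    # Treat cyl as categorical (the analogue of an R factor) and take
    # per-group means of the remaining numeric columns.
    cars["cyl"] = cars["cyl"].astype("category")
    group_means = cars.groupby("cyl", observed=True).mean(numeric_only=True)
    print(group_means)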

Replace values in DataFrame column when they start with string using lambda

Submitted by 不羁的心 on 2021-01-04 04:22:50
Question: I have a DataFrame:

    import pandas as pd
    import numpy as np

    x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
    df = pd.DataFrame(x)

I want to replace the values starting with XXX with np.nan using lambda. I have tried many things with replace, apply and map, and the best I have been able to do is False, True, True, False. The below works, but I would like to know a better way to do it, and I think apply, replace and a lambda is probably a better way to do it.

    df.Value.loc[df.Value.str
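
The last code line of the excerpt is cut off, but the replacement itself can be sketched in a few standard ways. A minimal sketch on the sample data, using the lambda-with-apply approach the question asks about, with mask- and regex-based alternatives noted in comments:

    import pandas as pd
    import numpy as np

    x = {'Value': ['Test', 'XXX123', 'XXX456', 'Test']}
    df = pd.DataFrame(x)

    # Lambda + apply: map any value that starts with "XXX" to NaN, keep the rest.
    df['Value'] = df['Value'].apply(lambda v: np.nan if str(v).startswith('XXX') else v)

    # Alternatives that avoid apply:
    # df.loc[df['Value'].str.startswith('XXX', na=False), 'Value'] = np.nan
    # df['Value'] = df['Value'].replace(r'^XXX.*', np.nan, regex=True)

    print(df)  # Value: Test, NaN, NaN, Test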
