apply

Split, apply, and combine multiple data frames into one data frame

半城伤御伤魂 submitted on 2021-02-07 20:44:42
Question: I have completed an origin-destination cost matrix (23 origins, ~600,000 destinations) for traveling through a street network in ArcGIS, and disaggregated the resulting matrix into DBF tables by store ID using a Python script. I have loaded each DBF table into an R session as follows:

    # Import OD cost matrix results for each store
    origins <- read.dbf('ODM_origins.dbf')
    store_17318 <- read.dbf('table_17318.dbf')
    store_17358 <- read.dbf('table_17358.dbf')
    store_17601 <- read.dbf('table_17601.dbf

apply function with constant parameter to pandas dataframe

对着背影说爱祢 submitted on 2021-02-07 13:36:44
Question: I have a pandas dataframe, and I created a function. I would like to apply this function to each row of the dataframe. However, the function has a third parameter that does not come from the dataframe and is constant.

    import pandas as pd

    df = pd.DataFrame(data = {'a':[1, 2, 3], 'b':[4, 5, 6]})

    def add(a, b, c):
        return a + b * c

    df['c'] = add(df['a'], df['b'], 2)

I think I have to use the apply function, but I don't see how I would pass this constant argument.

    print df
    >>    a  b  c
    >> 0  1
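One way to picture the constant-argument idea, sketched in plain Python rather than pandas so that it runs without third-party packages: bind the constant with `functools.partial`, then call the resulting function once per row. The `rows` list and the name `add_with_c` are illustrative assumptions, not part of the question. In pandas itself, `DataFrame.apply` accepts `axis=1` and an `args` tuple for the same purpose.

```python
from functools import partial


def add(a, b, c):
    """Same function as in the question: a + b * c."""
    return a + b * c


# Hypothetical row-wise view of the question's data (columns 'a' and 'b').
rows = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]

# Bind the constant third parameter once; the result takes only (a, b).
add_with_c = partial(add, c=2)

# Apply the bound function to every row.
c_values = [add_with_c(row["a"], row["b"]) for row in rows]
print(c_values)  # [9, 12, 15]
```

Binding the constant up front keeps the per-row call free of anything that does not come from the row.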

Fast R implementation of an Exponentially Weighted Moving Average?

佐手、 submitted on 2021-02-06 14:05:01
Question: I'd like to perform an exponentially weighted moving average (with parameterization defined here) on a vector in R. Is there a better implementation than my first attempt below? My first attempt was:

    ewma <- function(x, a) {
      n <- length(x)
      s <- rep(NA, n)
      s[1] <- x[1]
      if (n > 1) {
        for (i in 2:n) {
          s[i] <- a * x[i] + (1 - a) * s[i-1]
        }
      }
      return(s)
    }

    y <- 1:1e7
    system.time(s <- ewma(y, 0.5))
    #   user  system elapsed
    #   2.48    0.00    2.50

In my second attempt, I thought I could do better by vectorizing:
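For readers following the recurrence rather than the R specifics, here is a minimal Python sketch of the same update, s[i] = a * x[i] + (1 - a) * s[i-1], written once as an explicit loop and once with `itertools.accumulate`. The function names `ewma_loop` and `ewma_accumulate` are illustrative, and no claim is made about how either compares in speed to the R versions.

```python
from itertools import accumulate


def ewma_loop(x, a):
    """Direct port of the question's loop: s[0] = x[0], then the recurrence."""
    if not x:
        return []
    s = [x[0]]
    for value in x[1:]:
        s.append(a * value + (1 - a) * s[-1])
    return s


def ewma_accumulate(x, a):
    """Same recurrence as a running fold; accumulate seeds itself with x[0]."""
    return list(accumulate(x, lambda prev, value: a * value + (1 - a) * prev))


smoothed = ewma_loop([1, 2, 3], 0.5)
print(smoothed)  # [1, 1.5, 2.25]
```

Each output depends on the previous output, which is why the recurrence is naturally a fold rather than an element-wise vector operation.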

Apply different functions to different items in group object: Python pandas

半城伤御伤魂 submitted on 2021-02-05 20:30:31
Question: Suppose I have a dataframe as follows:

    In [1]: test_dup_df
    Out[1]:
                         exe_price  exe_vol flag
    2008-03-13 14:41:07       84.5      200  yes
    2008-03-13 14:41:37       85.0    10000  yes
    2008-03-13 14:41:38       84.5    69700  yes
    2008-03-13 14:41:39       84.5     1200  yes
    2008-03-13 14:42:00       84.5     1000  yes
    2008-03-13 14:42:08       84.5      300  yes
    2008-03-13 14:42:10       84.5    88100  yes
    2008-03-13 14:42:10       84.5    11900  yes
    2008-03-13 14:42:15       84.5     5000  yes
    2008-03-13 14:42:16       84.5     3200  yes

I want to group the duplicate rows at time 14:42:10 and apply

R substr function on multiple columns

懵懂的女人 submitted on 2021-02-05 11:57:23
Question: I have 3 columns. The first column has a unique ID; the second and third columns have string data and some NA data. I need to extract info from column 2 and put it in separate columns, and do the same thing for column 3. I am building a function as follows, using for loops. I need to split the columns after the third letter. [For example, in the V1 column below, I need to break AAAbbb into AAA and bbb and put them in separate columns. I know I can use substr to do this. I am new to R, please help. UID * V1
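The split described here (cut each string after its third character, leave missing values missing) can be sketched in Python as follows. The sample rows, the `V2` column name, and the helper `split_after` are assumptions made for illustration, since the question's data is cut off; `None` stands in for R's NA.

```python
def split_after(value, width=3):
    """Split a string after `width` characters; pass missing values through."""
    if value is None:
        return (None, None)
    return (value[:width], value[width:])


# Hypothetical rows: a unique ID plus two string columns with some missing data.
rows = [
    {"UID": 1, "V1": "AAAbbb", "V2": "CCCddd"},
    {"UID": 2, "V1": None, "V2": "EEEfff"},
]

split_rows = []
for row in rows:
    out = {"UID": row["UID"]}
    # Apply the same split to every string column, without one loop per column.
    for col in ("V1", "V2"):
        out[col + "_head"], out[col + "_tail"] = split_after(row[col])
    split_rows.append(out)

print(split_rows[0])
```

Writing the split as a function of one value makes it easy to reuse across any number of columns.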

R substr function on multiple columns

こ雲淡風輕ζ submitted on 2021-02-05 11:57:21
Question: I have 3 columns. The first column has a unique ID; the second and third columns have string data and some NA data. I need to extract info from column 2 and put it in separate columns, and do the same thing for column 3. I am building a function as follows, using for loops. I need to split the columns after the third letter. [For example, in the V1 column below, I need to break AAAbbb into AAA and bbb and put them in separate columns. I know I can use substr to do this. I am new to R, please help. UID * V1

Add column to data frame based on long list and values in another column is too slow

半城伤御伤魂 submitted on 2021-02-05 11:39:25
Question: I am adding a new column to a dataframe using apply() and mutate. It works. Unfortunately, it is very slow. I have 24M rows and I am adding a column based on values in a long list (58 items). It was bearable with a smaller list. Not anymore. Here is my example:

    large_df <- data.frame(A = (1:4), B = c('a','b','c','d'), C = c('e','f','g','h'))
    long_list = c('e','f','g')
    large_df = mutate(large_df, new_C = apply(large_df[,2:3], 1, function(r) any(r %in% long_list)))

The new column (new_C) will read True or
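The logic of the example (flag a row when either of its two columns holds a value from the list) can be sketched column-wise in Python, using a set for the lookup instead of calling a function per row. This is a plain-Python illustration of the idea under the question's sample data, not a benchmark and not the R solution; the variable names are illustrative.

```python
# Columns B and C from the question's example data frame.
col_b = ["a", "b", "c", "d"]
col_c = ["e", "f", "g", "h"]

# A set gives constant-time membership tests, however long the list grows.
lookup = {"e", "f", "g"}

# Walk the two columns in parallel instead of building one row at a time.
new_c = [b in lookup or c in lookup for b, c in zip(col_b, col_c)]
print(new_c)  # [True, True, True, False]
```

The same column-at-a-time shape is what vectorized membership tests in data frame libraries rely on.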

Function application for curried functions in JavaScript and ES6

╄→гoц情女王★ submitted on 2021-02-05 08:35:24
Question: I love that ECMAScript 6 allows you to write curried functions like this:

    var add = x => y => z => x + y + z;

However, I hate that we need to parenthesize every argument of a curried function:

    add(2)(3)(5);

I want to be able to apply curried functions to multiple arguments at once:

    add(2, 3, 5);

What should I do? I don't care about performance.

Answer 1: Currying and the application of curried functions are controversial issues in JavaScript. In simple terms, there are two opposing views, which I
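One possible shape for such a helper, sketched in Python rather than JavaScript: feed the arguments to the curried function one at a time with a fold. The name `apply_curried` is a hypothetical helper, not something taken from the question or its answers.

```python
from functools import reduce


def add(x):
    """Curried three-argument addition, mirroring x => y => z => x + y + z."""
    return lambda y: lambda z: x + y + z


def apply_curried(f, *args):
    """Apply a curried function to several arguments, one call per argument."""
    return reduce(lambda g, arg: g(arg), args, f)


total = apply_curried(add, 2, 3, 5)
print(total)  # 10
```

Passing fewer arguments than the function expects simply returns the remaining curried function, so partial application still works.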

R: converting fractions into decimals in a data frame

萝らか妹 submitted on 2021-02-05 07:14:45
Question: I am trying to convert a data frame of numbers stored as characters in fraction form into numbers stored in decimal form. (There are also some integers, also stored as characters.) I want to keep the current structure of the data frame, i.e. I do not want a list as a result. Example data frame (note: the real data frame has all elements as character; here they are factors, because I couldn't figure out how to replicate a data frame with characters):

    a <- c("1","1/2","2")
    b <- c("5/2","3","7/2")
    c <-
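The conversion itself (parse "1/2" or "3" as an exact fraction, then turn it into a decimal while keeping the column layout) can be sketched in Python with the standard `fractions` module. Only columns `a` and `b` are used because the question's third column is cut off; the dictionary layout is an illustrative stand-in for the data frame.

```python
from fractions import Fraction

# Columns a and b from the question, with every value stored as a string.
columns = {
    "a": ["1", "1/2", "2"],
    "b": ["5/2", "3", "7/2"],
}

# Fraction parses both "1/2" and plain integers such as "3";
# float() then gives the decimal form, column by column.
decimals = {
    name: [float(Fraction(value)) for value in values]
    for name, values in columns.items()
}
print(decimals)  # {'a': [1.0, 0.5, 2.0], 'b': [2.5, 3.0, 3.5]}
```

The result keeps the same column names and row order as the input, so the table structure is preserved.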

Select rows around a marker R [duplicate]

自闭症网瘾萝莉.ら submitted on 2021-01-29 21:30:52
Question: This question already has answers here: Select N rows above and below match (3 answers). Closed 1 year ago.

I'm trying to select 100 rows before and after a marker in a relatively large dataframe. The markers are sparse, and for some reason I haven't been able to figure it out or find a solution. This doesn't seem like it should be that hard, so I'm probably missing something obvious. Here's a very small, simple example of what the data looks like:

    timestamp  talking_yn  transition_yn
    0.01       n           n
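A minimal Python sketch of the windowing idea: collect the indices within n rows of every marker, clamp them to the table bounds, and keep each row once even when windows overlap. The sample `transition` column, the window size of 2, and the helper name `rows_around` are assumptions for illustration; the question uses 100 rows on a much larger table.

```python
def rows_around(markers, n):
    """Return sorted row indices within n rows of any True entry in markers."""
    keep = set()
    for i, is_marker in enumerate(markers):
        if is_marker:
            # Clamp the window so it never runs off either end of the table.
            keep.update(range(max(0, i - n), min(len(markers), i + n + 1)))
    return sorted(keep)


# Hypothetical transition_yn column with a single marker at index 3.
transition = ["n", "n", "n", "y", "n", "n", "n", "n"]
selected = rows_around([t == "y" for t in transition], n=2)
print(selected)  # [1, 2, 3, 4, 5]
```

Using a set means rows shared by two nearby markers are selected only once.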