apply | 易学教程

How do I add random `NA`s into a data frame

Question: I created a data frame with random values

    n <- 50
    df <- data.frame(id = seq(1:n),
                     age = sample(c(20:90), n, rep = TRUE),
                     sex = sample(c("m", "f"), n, rep = TRUE, prob = c(0.55, 0.45))
    )

and would like to introduce a few NA values to simulate real-world data. I am trying to use apply but cannot get there. The line

    apply(subset(df,select=-id), 2, function(x) {x[sample(c(1:n),floor(n/10))]})

will retrieve random values all right, but

    apply(subset(df,select=-id), 2, function(x) {x[sample(c(1:n)

rbind data frames based on a common pattern in data frame name

Question: Say I have multiple data frames which all have identical column names, and I'd like to rbind all those whose names share a common pattern. So for these 3 data frames:

    df.1 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)), speed = runif(10))
    df.2 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)), speed = runif(10))
    df.3 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)), speed = runif(10))

I would like to rbind everything with the

Pandas Rolling Apply custom

Question: I have been following a similar answer here, but I have some questions when using sklearn and rolling apply. I am trying to create z-scores and do PCA with rolling apply, but I keep getting the error 'only length-1 arrays can be converted to Python scalars'. Following the previous example, I create a dataframe:

    from sklearn.preprocessing import StandardScaler
    import pandas as pd
    import numpy as np
    sc=StandardScaler()
    tmp=pd.DataFrame(np.random.randn(2000,2)/10000,index=pd.date_range('2001-01-01'
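The excerpt above is cut off, so the asker's actual function is unknown. The quoted error is typical when the function passed to rolling apply returns an array, because pandas expects one scalar per window. The following is only a sketch under that assumption: `last_zscore` and the sample series are hypothetical, and it computes a rolling z-score with plain NumPy rather than with StandardScaler.

```python
import numpy as np
import pandas as pd


def last_zscore(window: np.ndarray) -> float:
    """Hypothetical helper: z-score of the newest value within its window."""
    # Returning a single float (not an array) is what rolling.apply expects.
    return (window[-1] - window.mean()) / window.std()


s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# raw=True passes each window to the function as a NumPy array.
z = s.rolling(window=3).apply(last_zscore, raw=True)
print(z)
```

The first two entries are NaN because a full window of three values is not yet available.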

Difference between using a spread syntax (...) and push.apply, when dealing with arrays

Question: I have two arrays,

    const pets = ["dog", "cat", "hamster"]
    const wishlist = ["bird", "snake"]

I want to append wishlist to pets, which can be done using two methods.

Method 1:

    pets.push.apply(pets,wishlist)

which results in: [ 'dog', 'cat', 'hamster', 'bird', 'snake' ]

Method 2:

    pets.push(...wishlist)

which also results in: [ 'dog', 'cat', 'hamster', 'bird', 'snake' ]

Is there a difference between these two methods in terms of performance when I deal with larger data?

Answer 1: Both Function

Use Pandas groupby() + apply() with arguments

Question: I would like to use df.groupby() in combination with apply() to apply a function to each row per group. I normally use the following code, which usually works (note that this is without groupby()):

    df.apply(myFunction, args=(arg1,))

With groupby() I tried the following:

    df.groupby('columnName').apply(myFunction, args=(arg1,))

However, I get the following error:

    TypeError: myFunction() got an unexpected keyword argument 'args'

Hence, my question is: How can I use groupby() and apply()
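For the groupby() question above, one likely explanation is that the groupby version of apply has no `args` parameter: it forwards any extra positional and keyword arguments straight to the function, so `args=(arg1,)` arrives as an unexpected keyword. The sketch below illustrates this under stated assumptions: `scaled_sum`, `factor`, and the sample data are hypothetical stand-ins for the asker's `myFunction`, `arg1`, and data frame.

```python
import pandas as pd


def scaled_sum(values: pd.Series, factor: float) -> float:
    """Hypothetical stand-in for myFunction: sum one group's values, then scale."""
    return values.sum() * factor


df = pd.DataFrame({"columnName": ["a", "a", "b"], "value": [1.0, 2.0, 4.0]})

grouped = df.groupby("columnName")["value"]

# Extra arguments are passed directly, not wrapped in args=(...).
by_position = grouped.apply(scaled_sum, 10.0)
by_keyword = grouped.apply(scaled_sum, factor=10.0)

print(by_position)
```

Selecting the `value` column before calling apply keeps the grouping column out of what the function receives.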

Find and replace missing values with row mean

Question: I have a data frame with NAs and I want to replace the NAs with row means

    c1 = c(1,2,3,NA)
    c2 = c(3,1,NA,3)
    c3 = c(2,1,3,1)
    df = data.frame(c1,c2,c3)
    > df
      c1 c2 c3
    1  1  3  2
    2  2  1  1
    3  3 NA  3
    4 NA  3  1

so that

    > df
      c1 c2 c3
    1  1  3  2
    2  2  1  1
    3  3  3  3
    4  2  3  1

Answer 1: Very similar to @baptiste's answer

    > ind <- which(is.na(df), arr.ind=TRUE)
    > df[ind] <- rowMeans(df, na.rm = TRUE)[ind[,1]]

Answer 2: I think this works,

    df[which(is.na(df), arr.ind=TRUE)] <- rowMeans(df[!complete.cases(df), ], na.rm=TRUE)

Answer 3:

Dataframe create new column based on other columns

Question: I have a dataframe:

    df <- data.frame('a'=c(1,2,3,4,5), 'b'=c(1,20,3,4,50))
    df
      a  b
    1 1  1
    2 2 20
    3 3  3
    4 4  4
    5 5 50

and I want to create a new column based on existing columns. Something like this:

    if (df[['a']] == df[['b']]) {
      df[['c']] <- df[['a']] + df[['b']]
    } else {
      df[['c']] <- df[['b']] - df[['a']]
    }

The problem is that the if condition is checked only for the first row... If I create a function from the above if statement and then use apply() (or mapply() ...), the result is the same. In Python

apply() is slow - how to make it faster or what are my alternatives?

Question: I have a quite large data frame, about 10 million rows. It has columns x and y, and what I want is to compute

    hypot <- function(x) {sqrt(x[1]^2 + x[2]^2)}

for each row. Using apply would take a lot of time (about 5 minutes, extrapolating from smaller sizes) and memory. That seems like too much to me, so I've tried different things: compiling the hypot function reduces the time by about 10%; using functions from plyr greatly increases the running time. What's the fastest way to do

Can I use apply() with constructor to pass arbitrary number of parameters

Question: I've got a function which can accept a variable number of parameters with a rest operator. I want to create an object by passing the arguments collected with the rest operator directly to a constructor, without creating an object and then calling an initializing function, and without passing the entire array, but passing the parameters individually as I do with the apply() function. Is it possible? Using apply doesn't work.

    public function myFunc(...arg) {
        // something like "new MyClass.apply(args)"
        return new MyClass();
    }

Answer 1: