imputation

Impute categorical missing values in scikit-learn

谁都会走 submitted on 2019-11-26 11:47:35
Question: I've got pandas data with some columns of text type. There are some NaN values in these text columns. What I'm trying to do is impute those NaNs with sklearn.preprocessing.Imputer (replacing NaN with the most frequent value). The problem is in the implementation. Suppose there is a pandas DataFrame df with 30 columns, 10 of which are categorical. Once I run:

    from sklearn.preprocessing import Imputer
    imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
    imp...
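One way to get most-frequent imputation for text columns is to stay in pandas and fill each column with its own mode. The sketch below is only an illustration under stated assumptions: the helper name `impute_most_frequent` and the columns `color`, `size`, and `price` are hypothetical and do not come from the question. Note also that `sklearn.preprocessing.Imputer` was deprecated and later removed from scikit-learn in favour of `sklearn.impute.SimpleImputer`, so the import in the question will fail on recent releases.

```python
import numpy as np
import pandas as pd


def impute_most_frequent(df, columns=None):
    """Return a copy of df where NaN in the chosen columns is replaced
    by that column's most frequent non-missing value."""
    out = df.copy()
    cols = list(columns) if columns is not None else list(out.columns)
    for col in cols:
        modes = out[col].mode(dropna=True)
        # An all-NaN column has no mode, so it is left untouched.
        if not modes.empty:
            out[col] = out[col].fillna(modes.iloc[0])
    return out


# Hypothetical data: two text columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", np.nan, "red", "blue"],
    "size": ["S", "M", np.nan, "M"],
    "price": [1.0, 2.0, np.nan, 4.0],
})

# Only the categorical columns are imputed; "price" keeps its NaN.
filled = impute_most_frequent(df, columns=["color", "size"])
print(filled)
```

When several values tie for most frequent, `Series.mode` returns them sorted and this sketch takes the first, which may or may not be the tie-breaking rule you want.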

Replace missing values with mean - Spark Dataframe

孤者浪人 submitted on 2019-11-26 11:27:45
Question: I have a Spark DataFrame with some missing values. I would like to perform a simple imputation by replacing the missing values with the mean for that column. I am very new to Spark, so I have been struggling to implement this logic. This is what I have managed to do so far:

a) To do this for a single column (let's say ColA), this line of code seems to work:

    df.withColumn("new_Col",
      when($"ColA".isNull, df.select(mean("ColA")).first()(0).asInstanceOf[Double])
        .otherwise($"ColA"))

b) ...

Replace missing values with column mean

女生的网名这么多〃 submitted on 2019-11-26 04:42:22
Question: I am not sure how to loop over each column to replace the NA values with the column mean. When I replace the values for one column using the following, it works well:

    Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE))

The code for looping over columns is not working:

    for(i in 1:ncol(data)){
      data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE))
    }

The values are not replaced. Can someone please help me with this?

Answer 1: A relatively simple modification of your code should...
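For comparison, the same column-wise idea can be sketched in pandas. This is not the R answer from the thread, which is cut off above. It is a minimal illustration that assumes every column is numeric, and the frame `data` with columns `a` and `b` is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value per column.
data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 6.0],
    "b": [np.nan, 2.0, 4.0, 5.0],
})

# Per-column means that ignore NaN, then rounded, mirroring
# round(mean(x, na.rm = TRUE)) from the question.
col_means = data.mean(skipna=True).round()

# fillna with a Series aligns on column names, so each column
# receives its own mean and no explicit loop is needed.
filled = data.fillna(col_means)
print(filled)
```

Because `fillna` matches the Series index against the column labels, the per-column loop from the question disappears entirely.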

How do I replace NA values with zeros in an R dataframe?

徘徊边缘 submitted on 2019-11-26 01:19:06
Question: I have a data frame and some columns have NA values. How do I replace these NA values with zeroes?

Answer 1: See my comment on @gsk3's answer. A simple example:

    > m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
    > d <- as.data.frame(m)
       V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    1   4  3 NA  3  7  6  6 10  6   5
    2   9  8  9  5 10 NA  2  1  7   2
    3   1  1  6  3  6 NA  1  4  1   6
    4  NA  4 NA  7 10  2 NA  4  1   8
    5   1  2  4 NA  2  6  2  6  7   4
    6  NA  3 NA NA 10  2  1 10  8   4
    7   4  4  9 10  9  8  9  4 10  NA
    8   5  8  3  2  1  4  5  9  4   7
    9   3  9 10  1  9  9 10  5  3   3
    10  4  2  2  5 ...