imputation | 易学教程

Impute categorical missing values in scikit-learn

Question: I have pandas data in which some columns are of text type, and these text columns contain some NaN values. I am trying to impute those NaNs with sklearn.preprocessing.Imputer, replacing each NaN with the most frequent value. The problem is in the implementation. Suppose there is a pandas DataFrame df with 30 columns, 10 of which are categorical. Once I run:

from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='most_frequent', axis=0)
imp
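The excerpt above is cut off before any answer, so what follows is only a minimal sketch of most-frequent imputation for text columns, done with pandas alone rather than with the Imputer class from the question. The helper name impute_most_frequent and the sample data are hypothetical and are not taken from the original thread.

```python
import numpy as np
import pandas as pd


def impute_most_frequent(df, columns):
    """Return a copy of df where NaN in `columns` is replaced by each column's mode."""
    result = df.copy()
    for col in columns:
        # mode() ignores NaN by default and returns the most frequent value(s), sorted
        modes = result[col].mode()
        if not modes.empty:  # an all-NaN column has no mode, so it is left unchanged
            result[col] = result[col].fillna(modes.iloc[0])
    return result


# Hypothetical sample data: two text columns and one numeric column
df = pd.DataFrame({
    "color": ["red", np.nan, "red", "blue"],
    "size": ["S", "M", np.nan, "M"],
    "price": [1.0, np.nan, 3.0, 4.0],
})

# Only the listed text columns are imputed; "price" keeps its NaN
imputed = impute_most_frequent(df, ["color", "size"])
print(imputed)
```

Passing the column names explicitly keeps the numeric columns out of the mode-based fill, so they can be handled separately (for example with a mean or median). When several values tie for most frequent, this sketch takes the first one in sorted order.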

Replace missing values with mean - Spark Dataframe

Question: I have a Spark DataFrame with some missing values. I would like to perform a simple imputation by replacing the missing values with the mean of that column. I am very new to Spark, so I have been struggling to implement this logic. This is what I have managed to do so far:

a) To do this for a single column (let's say ColA), this line of code seems to work:

df.withColumn("new_Col", when($"ColA".isNull, df.select(mean("ColA")).first()(0).asInstanceOf[Double]).otherwise($"ColA"))

b)

Replace missing values with column mean

Question: I am not sure how to loop over each column to replace the NA values with the column mean. When I replace the values for one column using the following, it works well:

Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE))

The code for looping over the columns is not working:

for(i in 1:ncol(data)){
  data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE))
}

The values are not replaced. Can someone please help me with this?

Answer 1: A relatively simple modification of your code should

How do I replace NA values with zeros in an R dataframe?

Question: I have a data frame and some columns have NA values. How do I replace these NA values with zeroes?

Answer 1: See my comment on @gsk3's answer. A simple example:

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5