na | 易学教程

R Filling missing values with NA for a data frame

Question: I am currently trying to create a data frame from the following lists:

location <- list("USA","Singapore","UK")
organization <- list("Microsoft","University of London","Boeing","Apple")
person <- list()
date <- list("1989","2001","2018")
Jobs <- list("CEO","Chairman","VP of sales","General Manager","Director")

When I try to create a data frame, I get the (obvious) error that the lengths of the lists are not equal. I want to find a way to either make the lists the same length, or fill the
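
One possible approach to the question above, sketched in base R. It assumes that padding the shorter lists with NA is acceptable; the helper `pad_to` and the wrapper list `cols` are illustrative names, not part of the original post.

```r
location <- list("USA", "Singapore", "UK")
organization <- list("Microsoft", "University of London", "Boeing", "Apple")
person <- list()
date <- list("1989", "2001", "2018")
Jobs <- list("CEO", "Chairman", "VP of sales", "General Manager", "Director")

# Hypothetical helper: flatten a list to a vector and pad it with NA up to n.
pad_to <- function(x, n) {
  v <- unlist(x)
  c(v, rep(NA, n - length(v)))
}

cols <- list(location = location, organization = organization,
             person = person, date = date, Jobs = Jobs)
n <- max(lengths(cols))  # length of the longest list

# Every column now has n entries, so the data frame can be built.
df <- as.data.frame(lapply(cols, pad_to, n = n), stringsAsFactors = FALSE)
```

The empty `person` list becomes a column made up entirely of NA, which may or may not be what the asker wants.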

How to finish code to replace NA with median in R

Question: I am very new to R, so please please be gentle. I am working on the Kaggle Titanic competition to get into R and work things out. I am working my way through engineering a feature and I am a bit stuck on the logic of what to do next. So, here goes. My goal is to take the Age data and replace every NA with the median age for the person's title. For example, if the person is a Master, I want to get the median of all the Masters and replace the NA with that median. Same for Mr. and
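
A minimal sketch of group-wise median imputation in base R. The toy data frame `train` and its column names `Title` and `Age` are assumptions standing in for the Titanic data, which the excerpt does not show.

```r
# Toy data; the real column names in the asker's data are not known.
train <- data.frame(
  Title = c("Master", "Master", "Master", "Mr", "Mr", "Mr"),
  Age   = c(4, NA, 8, 30, 40, NA),
  stringsAsFactors = FALSE
)

# Median age within each title, ignoring NA, repeated for every row of the group.
title_median <- ave(train$Age, train$Title,
                    FUN = function(a) median(a, na.rm = TRUE))

# Overwrite only the missing ages; observed ages are left untouched.
missing <- is.na(train$Age)
train$Age[missing] <- title_median[missing]
```

If a title had no observed ages at all, its median would itself be NA and those rows would stay missing.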

R - Get number of values per group without counting NAs

Question: I'm trying to count the number of values per group in a column without counting the NAs. I've tried doing it with "length", but I can't figure out how to tell "length" to ignore the NAs when counting values per group. I've found similar problems but couldn't figure out how to apply the solutions to my case: Length of columns excluding NA in r http://r.789695.n4.nabble.com/Length-of-vector-without-NA-s-td2552208.html I've created a minimal working example to illustrate
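
A small base R sketch of one way to count non-missing values per group. The data frame and its column names `group` and `value` are made up for illustration, since the asker's example is cut off.

```r
df <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(1, NA, 3, NA, NA),
  stringsAsFactors = FALSE
)

# !is.na(v) is TRUE for observed values; summing the TRUEs counts them.
counts <- tapply(df$value, df$group, function(v) sum(!is.na(v)))
```

The idea is to replace `length(v)` with `sum(!is.na(v))`; a group with only NAs is reported with a count of 0 rather than being dropped.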

Error in fread{data.table} because it's not reading NAs correctly/as I want it to

Question: This might be a beginner's question with quite a simple fix, but I've been at it for a while and can't seem to figure it out. I have high-frequency data with about 500,000 rows and 62 columns. I want to use fread() to make the reading more efficient, but the problem is that not all rows are the same length. There's quote data and there's trade data, and the trade lines have only 5 columns. Here is my output when I read using read.csv: > df<- read.csv(file = "AUROPHARMA15OCTFUT_20150916_ob.csv

Convert NA to most frequent value based on another column

Question: I have a data frame called df like this:

   Author_ID Country Cited  Name     Title
1:         1   Spain    10  Alex  Whatever
2:         1  France    15   Ale Whatever2
3:         1      NA    10  Alex Whatever3
4:         1   Spain    10  Alex Whatever4
5:         2   Italy    10 Alice Whatever5
6:         2  Greece    10 Alice Whatever6
7:         2  Greece    10 Alice Whatever7
8:         2      NA    10  Alce Whatever8
8:         2      NA    10  Alce Whatever8

And I would like to get something like this, where each NA is replaced by the Country that appears most often for that Author_ID (if there are two countries that
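
A rough base R sketch of filling NA with the most frequent value per group. The excerpt is cut off before the tie rule is stated, so the tie handling here (the alphabetically first country wins) is purely an assumption, as is the helper name `most_common`. Only the two relevant columns are reproduced.

```r
df <- data.frame(
  Author_ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  Country   = c("Spain", "France", NA, "Spain", "Italy", "Greece", "Greece", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-NA value of x.
# table() drops NA by default and sorts names; which.max() takes the first
# maximum, so ties go to the alphabetically first value (an assumption).
most_common <- function(x) {
  counts <- table(x)
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]
}

# Most common country per author, repeated for every row of that author.
fill <- ave(df$Country, df$Author_ID, FUN = most_common)

# Replace only the missing countries.
missing <- is.na(df$Country)
df$Country[missing] <- fill[missing]
```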

Select rows from pandas data frame where specified columns are not all NaN

Question: I have a pandas DataFrame object data with columns 'a', 'b', 'c', ..., 'z'. I want to select all rows that satisfy the following condition: the values in columns 'b', 'c', and 'g' are not all NaN simultaneously. I tried:

new_data = data[not all(np.isnan(value) for value in data[['b', 'c', 'g']])]

but it didn't work; it throws an error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 1, in <genexpr>
TypeError: Not implemented for this type

Answer 1: I want to select

Something weird about returning NAs

Question: This is a lame question, I guess, but I don't understand what is happening. If I run: sum(is.na(census$wd)) it returns 4205. But if I run: sum(census$wd == NA) it returns "NA". I would just like to understand what is happening. If I do str(census), wd shows up as: $ wd : num NA 0.65 0.65 0.65 0.78 0.78 0.78 0.78 0.78 0.78 ... Can anyone explain why the two calls return different outputs? Thank you!

Answer 1: == in R is a comparison. But you cannot compare something to NA in R, as the following
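
The behaviour described in the answer can be illustrated with a short base R sketch. The vector `wd` below is a small made-up stand-in for `census$wd`, which is not available here.

```r
wd <- c(NA, 0.65, 0.65, 0.78)

# Comparing anything with NA yields NA, because the result is unknown.
eq <- wd == NA            # NA NA NA NA

# Summing unknown values is itself unknown.
bad_count <- sum(eq)      # NA

# is.na() tests for missingness directly and always returns TRUE or FALSE.
na_count <- sum(is.na(wd))   # 1

# Ordinary comparisons still propagate NA, so drop it explicitly when summing.
n_065 <- sum(wd == 0.65, na.rm = TRUE)   # 2
```

This is why `is.na(x)` rather than `x == NA` is the usual way to test for missing values.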

Calculate metrics for multiple columns based on subsets defined by other columns

Question: I would like to calculate simple summary metrics for subsets of certain columns in a data frame, where the subsets are based on information in other columns of the same data frame. Let me illustrate:

colA <- c(NA,2,3,NA,NA,3,9,5,6,1)
colB <- c(9,3,NA,2,2,4,6,1,9,9)
colC <- c(NA,NA,5,7,3,9,8,1,2,3)
colAA <- c(NA,NA,6,NA,NA,NA,1,7,9,4)
colBB <- c(NA,2,NA,7,8,NA,2,7,9,4)
colCC <- c(NA,NA,3,7,5,8,9,9,NA,3)
df <- data.frame(colA,colB,colC,colAA,colBB,colCC)

> df colA colB colC colAA colBB colCC 1

Can you use rbind.fill without having it fill in NA's?

Question: I am trying to combine two data frames with different numbers of columns and different column headers. However, after I combine them using rbind.fill(), the resulting file has the empty cells filled with NA. This is very inconvenient, since one of the columns has data that is also represented as "NA" (for North America), so when I import it into a csv, the spreadsheet can't tell them apart. Is there a way for me to: Use the rbind.fill function without having it populate the empty cells with NA or Change
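
One way the two kinds of "NA" can be kept apart on export, sketched in base R. This does not change what rbind.fill() does; it only assumes that the goal is a CSV in which missing cells are blank while the literal string "NA" survives. The data frame and the column names `region` and `sales` are invented for the example.

```r
# "NA" (a string, e.g. North America) versus NA (a missing value).
df <- data.frame(
  region = c("NA", "EU", NA),
  sales  = c(10, NA, 30),
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".csv")

# The na argument sets the text written for missing values; "" leaves them blank.
write.csv(df, out, na = "", row.names = FALSE)

lines <- readLines(out)
```

With this setting the string "NA" is written as a quoted value and the missing cells are written as empty fields, so a spreadsheet can tell them apart.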

R- Random forest predict fails with NAs in predictors

Question: The documentation (if I'm reading it correctly) says that the random forest predict function produces NA predictions if it encounters NA predictors for certain observations. NOTE: If the object inherits from randomForest.formula, then any data with NA are silently omitted from the prediction. The returned value will contain NA correspondingly in the aggregated and individual tree predictions (if requested), but not in the proximity or node matrices. However, if I try to use the predict