na

R Filling missing values with NA for a data frame

做~自己de王妃 posted on 2019-12-11 16:48:42
Question: I am currently trying to create a data frame from the following lists:
    location <- list("USA","Singapore","UK")
    organization <- list("Microsoft","University of London","Boeing","Apple")
    person <- list()
    date <- list("1989","2001","2018")
    Jobs <- list("CEO","Chairman","VP of sales","General Manager","Director")
When I try to create the data frame, I get the (obvious) error that the lengths of the lists are not equal. I want to find a way to either make the lists the same length, or fill the

How to finish code to replace NA with median in R

a 夏天 posted on 2019-12-11 15:56:28
Question: I am very new to R, so please be gentle. I am working on the Kaggle Titanic competition to get into R and work things out. I am engineering a feature and I am a bit stuck on the logic of what to do next. So, here goes. My goal is to take the Age data and replace every NA with the median age for the person's title. For example, if the person is a Master, I want to get the median age of all the Masters and replace the NA with that median. Same for Mr. and

R - Get number of values per group without counting NAs

烈酒焚心 posted on 2019-12-11 10:54:23
Question: I'm trying to count the number of values per group in a column without counting the NAs. I've tried doing it with length(), but I can't figure out how to tell length() to skip the NAs when counting values per group. I've found similar problems but couldn't figure out how to apply the solutions to my case:
    Length of columns excluding NA in r
    http://r.789695.n4.nabble.com/Length-of-vector-without-NA-s-td2552208.html
I've created a minimal working example to illustrate

Error in fread {data.table} because it's not reading NAs correctly/as I want it to

為{幸葍}努か posted on 2019-12-11 10:34:29
Question: This might be a beginner's question with quite a simple fix, but I've been at it for a while and can't seem to figure it out. I have high-frequency data with about 500,000 rows and 62 columns. I want to use fread() to make the reading more efficient, but the problem is that not all rows are the same length. There's quote data and there's trade data; the trade lines have only 5 columns. Here is my output when I read using read.csv: > df<- read.csv(file = "AUROPHARMA15OCTFUT_20150916_ob.csv

Convert NA to the most frequent value based on another column

痞子三分冷 posted on 2019-12-11 10:32:21
Question: I have a data frame called df like this:
   Author_ID Country Cited  Name     Title
1:         1   Spain    10  Alex  Whatever
2:         1  France    15   Ale Whatever2
3:         1      NA    10  Alex Whatever3
4:         1   Spain    10  Alex Whatever4
5:         2   Italy    10 Alice Whatever5
6:         2  Greece    10 Alice Whatever6
7:         2  Greece    10 Alice Whatever7
8:         2      NA    10  Alce Whatever8
8:         2      NA    10  Alce Whatever8
And I would like to get something like this, where the NAs are replaced with the Country that appears most often for that Author_ID (if there are two countries that

Select rows from pandas data frame where specified columns are not all NaN

强颜欢笑 posted on 2019-12-11 10:09:18
Question: I have a pandas DataFrame object data with columns 'a', 'b', 'c', ..., 'z'. I want to select all rows that satisfy the following condition: the data in columns 'b', 'c', 'g' is not NaN simultaneously. I tried:
    new_data = data[not all(np.isnan(value) for value in data[['b', 'c', 'g']])]
but it didn't work. It throws an error:
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "<input>", line 1, in <genexpr>
    TypeError: Not implemented for this type
Answer 1: I want to select
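The excerpt cuts off before the answer, so what follows is only a minimal sketch of one way to do the selection, not the answer's actual text. The attempt in the question most likely fails because iterating over a DataFrame yields its column labels (strings), so np.isnan receives 'b', 'c', 'g' and not the cell values. The sample data below is hypothetical; only the column names 'b', 'c', 'g' come from the question. Both readings of the condition are shown, since "not NaN simultaneously" could mean "not all NaN at once" (as the title suggests) or "all non-NaN".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names 'b', 'c', 'g' come from the question.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, 1.0, np.nan, 5.0],
    "c": [np.nan, np.nan, 2.0, 6.0],
    "g": [np.nan, 3.0, np.nan, 7.0],
})

cols = ["b", "c", "g"]

# Reading 1 (matches the title): keep rows where b, c and g are NOT all NaN,
# i.e. at least one of them holds a value.
new_data = data[data[cols].notna().any(axis=1)]

# The same selection expressed with dropna: drop a row only if all of cols are NaN.
same = data.dropna(subset=cols, how="all")

# Reading 2 (stricter): keep rows where every one of b, c and g holds a value.
strict = data[data[cols].notna().all(axis=1)]

print(new_data)
print(strict)
```

The boolean mask is built row by row with axis=1, which is what the generator expression in the question was trying to do by hand.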

Something weird about returning NAs

浪子不回头ぞ posted on 2019-12-11 09:45:15
Question: This is a lame question, I guess, but I don't understand what is happening. If I run sum(is.na(census$wd)), it returns 4205. But if I run sum(census$wd == NA), it returns "NA". I would just like to understand what is happening. If I do str(census), wd shows up as: $ wd : num NA 0.65 0.65 0.65 0.78 0.78 0.78 0.78 0.78 0.78 ... Can anyone explain why the two calls return different outputs? Thank you!
Answer 1: == in R is a comparison, but you cannot compare something to NA in R, as the following

Calculate metrics for multiple columns based on subsets defined by other columns

谁都会走 posted on 2019-12-11 08:58:39
Question: I would like to calculate simple summary metrics for subsets of certain columns in a data frame, where the subsets are based on information in other columns of the same data frame. Let me illustrate:
    colA <- c(NA,2,3,NA,NA,3,9,5,6,1)
    colB <- c(9,3,NA,2,2,4,6,1,9,9)
    colC <- c(NA,NA,5,7,3,9,8,1,2,3)
    colAA <- c(NA,NA,6,NA,NA,NA,1,7,9,4)
    colBB <- c(NA,2,NA,7,8,NA,2,7,9,4)
    colCC <- c(NA,NA,3,7,5,8,9,9,NA,3)
    df <- data.frame(colA,colB,colC,colAA,colBB,colCC)
    > df
      colA colB colC colAA colBB colCC
    1

Can you use rbind.fill without having it fill in NA's?

时光怂恿深爱的人放手 posted on 2019-12-11 07:39:06
Question: I am trying to combine two data frames with different numbers of columns and different column headers. However, after I combine them using rbind.fill(), the resulting file has the empty cells filled with NA. This is very inconvenient since one of the columns has data that is also represented as "NA" (for North America), so when I import it into a csv, the spreadsheet can't tell them apart. Is there a way for me to: use the rbind.fill function without having it populate the empty cells with NA, or change

R- Random forest predict fails with NAs in predictors

狂风中的少年 posted on 2019-12-11 07:16:03
Question: The documentation (if I'm reading it correctly) says that the random forest predict function produces NA predictions if it encounters NA predictors for certain observations. NOTE: If the object inherits from randomForest.formula, then any data with NA are silently omitted from the prediction. The returned value will contain NA correspondingly in the aggregated and individual tree predictions (if requested), but not in the proximity or node matrices. However, if I try to use the predict