na | 易学教程

How to change na.action for zero-inflated regression model?

Question: I am running a zero-inflated negative binomial regression model using the function zeroinfl from the pscl package. I need to exclude NAs from the model in order to be able to plot the residuals against the dependent variable later in the analysis. Therefore, I want to set na.action="na.exclude". I can do this without any problem for a non-zero-inflated negative binomial regression model (using glm.nb from the MASS package), e.g. fm_nbin <- glm.nb(DV ~ factor(IDV) + contr1 + contr2 + contr3,

Looping through data to set values > or < variable as NA in R

Question: I have a data frame containing columns with integers, characters, and numerics. The actual data set is much larger than the example given below, but what is below is a passable and much smaller imitation. I am trying to loop through the data and change any values greater than the mean + (3 * standard deviation) or less than the mean - (3 * standard deviation) to NA in the numeric columns only. If a column contains an integer or character, the loop should skip it and continue onto the next

dplyr arrange() function sort by missing values

Question: I am attempting to work through Hadley Wickham's R for Data Science and have gotten tripped up on the following question: "How could you use arrange() to sort all missing values to the start? (Hint: use is.na())" I am using the flights dataset included in the nycflights13 package. Given that arrange() sorts all unknown values to the bottom of the data frame, I am not sure how one would do the opposite across the missing values of all variables. I realize that this question can be answered with

Replace NA values with median by group

Question: I have used the tapply call below to get the median of Age by Pclass. How can I now impute those medians into the NA values of Age, by Pclass?
    tapply(titan_train$Age, titan_train$Pclass, median, na.rm = TRUE)
Answer 1: Here is another base R approach that uses replace and ave.
    df1 <- transform(df1, Age = ave(Age, Pclass, FUN = function(x) replace(x, is.na(x), median(x, na.rm = TRUE))))
    df1
    #   Pclass Age
    # 1      A   1
    # 2      A   2
    # 3      A   3
    # 4      B   4
    # 5      B   5
    # 6      B   6
    # 7      C   7
    # 8      C   8
    # 9      C   9
Same idea but using

How to conditionally replace values with NA across multiple columns

Question: I would like to replace outliers in each column of a data frame with NA. If, for example, we define an outlier as any value more than 3 standard deviations from the mean, I can achieve this per variable with the code below. Rather than specify each column individually, I'd like to perform the same operation on all columns of df in one call. Any pointers on how to do this? Thanks!
    library(dplyr)
    data("iris")
    df <- iris %>% select(Sepal.Length, Sepal.Width, Petal.Length) %>% head(10)
    # add a

How to remove NA from facet_wrap in ggplot2?

Question: I am trying to use facet_wrap to make a polygon map in ggplot2. I have two factor levels (soybean, Maize) in my variable "crop". However, I am getting three plots: soybean, maize, and one with NA values. In addition, NA values are not displayed in the first two facets. Here is my code to make the map:
    ggplot(study_area.map, aes(x=long, y=lat, group=group)) +
      geom_polygon(aes(fill=brazil_loss_new2)) +
      geom_path(colour="black") +
      facet_wrap(~crop, ncol=2, drop=T) +
      scale_fill_brewer(na.value="grey

pandas dataframe concat is giving unwanted NA/NaN columns

Question: Unlike the example in "After Pandas Dataframe pd.concat I get NaNs", where the concatenation is horizontal, I am trying to concatenate vertically:
    import pandas
    a = [['Date', 'letters', 'numbers', 'mixed'], ['1/2/2014', 'a', '6', 'z1'], ['1/2/2014', 'a', '3', 'z1'], ['1/3/2014', 'c', '1', 'x3']]
    df = pandas.DataFrame.from_records(a[1:], columns=a[0])
    f = []
    for i in range(0, len(df)):
        f.append(df['Date'][i] + ' ' + df['letters'][i])
    df['new'] = f
    c = [x for x in range(0, 5)]
    b = []
    b += [['NA'] * (5 - len(b))]
    df_a = pandas.DataFrame.from
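
The question above is cut off, so the following is only a minimal sketch of one common cause of extra NaN columns, not a reconstruction of the asker's code. pd.concat aligns frames on their column labels when stacking vertically, so a frame built without matching labels contributes new columns padded with NaN. The frames and the names filler_unlabelled and filler_labelled are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame for illustration; not the asker's full data.
df = pd.DataFrame(
    [["1/2/2014", "a"], ["1/3/2014", "c"]],
    columns=["Date", "letters"],
)

# A filler row built without column names gets the default labels 0 and 1.
filler_unlabelled = pd.DataFrame([["NA", "NA"]])

# Vertical concat aligns on column labels, so mismatched labels produce
# the union of all columns, padded with NaN where a frame had no value.
mismatched = pd.concat([df, filler_unlabelled], ignore_index=True)

# Reusing df's column labels keeps the result at two columns.
filler_labelled = pd.DataFrame([["NA", "NA"]], columns=df.columns)
aligned = pd.concat([df, filler_labelled], ignore_index=True)

print(mismatched.shape)  # four columns: Date, letters, 0, 1
print(aligned.shape)     # two columns: Date, letters
```

Note that the string 'NA' is ordinary text to pandas, not a missing value, so aligned contains no NaN at all.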

Python : reducing memory usage of small integers with missing values

Question: I am in the process of reducing the memory usage of my code. The goal of this code is to handle some big datasets. They are stored in pandas DataFrames, if that is relevant. Among many other data there are some small integers. As they contain some missing values (NA), pandas stores them as float64 by default. I was trying to downcast them to a smaller int format (int8 or int16, for example), but I got an error because of the NA. It seems that there is a new integer type (Int64)
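
As a hedged illustration of the nullable integer types the question mentions: pandas provides capital-I extension dtypes ("Int8", "Int16", "Int32", "Int64") that can hold missing values alongside integers. The sketch below assumes a hypothetical column of small integers with one missing value. The names s_float and s_int8 are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical column of small integers with one missing value.
# The NaN forces the default dtype to float64.
s_float = pd.Series([1, 2, np.nan, 4])

# "Int8" (capital I) is a nullable extension dtype: NaN becomes <NA>
# and the remaining values are stored as 8-bit integers plus a mask.
s_int8 = s_float.astype("Int8")

print(s_float.dtype, s_float.memory_usage(index=False))
print(s_int8.dtype, s_int8.memory_usage(index=False))
```

The conversion only succeeds when every non-missing value is a whole number that fits the target width, so a wider type such as "Int16" may be needed depending on the data.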