missing-data

LOCF and NOCF methods for missing data: how to plot data?

淺唱寂寞╮ posted on 2021-02-20 03:50:21
Question: I'm working on the following dataset and its missing data:

    # A tibble: 27 x 6
          id sex      d8   d10   d12   d14
       <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
     1     1 F      21    20    21.5  23
     2     2 F      21    21.5  24    25.5
     3     3 NA     NA    24    NA    26
     4     4 F      23.5  24.5  25    26.5
     5     5 F      21.5  23    22.5  23.5
     6     6 F      20    21    21    22.5
     7     7 F      21.5  22.5  23    25
     8     8 F      23    23    23.5  24
     9     9 F      NA    21    NA    21.5
    10    10 F      16.5  19    19    19.5
    # ... with 17 more rows

I would like to fill in the missing data via the Last Observation Carried Forward method (LOCF) and the Next
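As a rough illustration of what the two fills do, here is a minimal sketch in pandas rather than R (a swapped-in tool, not the asker's setup). It uses only rows 3 and 9 from the tibble above and assumes the fill should run across the visit columns `d8` to `d14` within each subject; the names `visits`, `locf`, and `nocf` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rows 3 and 9 of the tibble in the question (the two displayed rows with gaps).
df = pd.DataFrame({
    "id": [3, 9],
    "d8": [np.nan, np.nan],
    "d10": [24.0, 21.0],
    "d12": [np.nan, np.nan],
    "d14": [26.0, 21.5],
})
visits = ["d8", "d10", "d12", "d14"]  # assumed time ordering of the columns

# LOCF: carry the last observed value forward along each row.
locf = df.copy()
locf[visits] = df[visits].ffill(axis=1)

# NOCF: carry the next observed value backward along each row.
nocf = df.copy()
nocf[visits] = df[visits].bfill(axis=1)

print(locf)
print(nocf)
```

Note that LOCF cannot fill a gap at the first visit (there is nothing earlier to carry forward), so `d8` stays missing for both rows, while NOCF fills it from `d10`.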

Python Pandas dataframe find missing values

余生颓废 posted on 2021-02-16 20:18:15
Question: I'm trying to find missing values and then drop them. I tried looking online but can't seem to find the answer. Extracted DataFrame: in the df, the values for 1981 and 1982 should be '-', i.e. missing. I would like to find the missing values and then drop them. DataFrame exported using isnull: I used df.isnull(), but 1981 and 1982 come back as 'False', which means there is data, even though the value is '-' and should therefore count as missing. I had
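The behaviour described is expected: `isnull()` only flags real missing markers such as `NaN`, and the string '-' is ordinary data. A minimal sketch of one way around it, assuming a hypothetical frame with `year` and `value` columns standing in for the asker's data (which is not shown in the excerpt):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the extracted DataFrame; column names are assumptions.
df = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983],
    "value": ["5.2", "-", "-", "6.1"],
})

# Turn the '-' placeholder into a real missing value first.
cleaned = df.replace("-", np.nan)

# Now isna() sees the gaps, and dropna() can remove them.
missing_mask = cleaned["value"].isna()
result = cleaned.dropna(subset=["value"])

print(missing_mask.tolist())
print(result)
```

If the data comes from a CSV file, passing `na_values="-"` to `pd.read_csv` does the same conversion at load time.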

R- Select rows with non-NA values in at least one of the four columns

无人久伴 posted on 2021-02-16 15:27:31
Question: I have this code that works fine:

    CompleteCoxObs <- temp[is.na(temp[,8]) == FALSE | is.na(temp[,9]) == FALSE | is.na(temp[,10]) == FALSE,];

What is a better and more efficient way to achieve the same result?

Answer 1: You can try this to check all the columns:

    CompleteCox.df <- temp.df[rowSums(is.na(temp.df)) != ncol(temp.df),]

In your case:

    CompleteCox.df <- temp.df[rowSums(is.na(temp.df[, c(8,9,10)])) != 3,]

Answer 2: You can try one of the following:

    temp[!is.na(rowSums(temp[,8:10])),]

or temp[
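For comparison only, the same "keep rows with at least one non-missing value among the chosen columns" filter can be sketched in pandas. This is not part of the original answers; the frame and the column names `a`, `b`, `c` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing in all three columns of interest.
temp = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 2.0],
    "c": [np.nan, np.nan, np.nan],
})
cols = ["a", "b", "c"]

# notna() marks observed cells; any(axis=1) is True when a row has at least one.
kept = temp[temp[cols].notna().any(axis=1)]

print(kept)
```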

r - copy missing values from other variables

风流意气都作罢 posted on 2021-02-10 05:31:35
Question: Simple question, but I can't figure out how to do the following. This is my data:

    ID Time1 Time2 Time3 Time4
    01    23    23    NA    NA
    02    21    21    21    NA
    03    22    22    25    NA
    04    29    29    20    NA
    05    NA    NA    15    22
    06    NA    NA    11    NA

Now, I want to replace missing values (NA) with the data that is available in the other variables. Importantly, I need R to take the value that is 'closest' to the missing data point. E.g., for ID 5, Time1 and Time2 should be "15" (not "22"). Like this:

    ID Time1 Time2 Time3 Time4
    01    23    23    23    23
    02    21
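A minimal sketch of the idea in pandas (not R, and not taken from the original thread): fill forward along each row, then fill backward for whatever is still missing at the start. This matches "closest" for the data shown, where gaps sit only at the beginning or end of a row; for a gap in the middle of a row it would always take the earlier value, which is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# The data from the question.
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04", "05", "06"],
    "Time1": [23, 21, 22, 29, np.nan, np.nan],
    "Time2": [23, 21, 22, 29, np.nan, np.nan],
    "Time3": [np.nan, 21, 25, 20, 15, 11],
    "Time4": [np.nan, np.nan, np.nan, np.nan, 22, np.nan],
})
times = ["Time1", "Time2", "Time3", "Time4"]

# Forward fill handles trailing gaps; backward fill then handles leading gaps.
filled = df.copy()
filled[times] = df[times].ffill(axis=1).bfill(axis=1)

print(filled)
```

For ID 05 the leading gaps are filled from Time3 (15), not Time4 (22), as the question requires.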

Define multiple values as missing in a data frame

落爺英雄遲暮 posted on 2021-02-09 15:02:53
Question: How do I define multiple values as missing in a data frame in R? Consider a data frame where two values, "888" and "999", represent missing data:

    df <- data.frame(age=c(50,30,27,888),insomnia=c("yes","no","no",999))
    df[df==888] <- NA
    df[df==999] <- NA

This solution takes one line of code per value representing missing data. Is there a simpler solution for situations where many values represent missing data?

Answer 1: Here are three solutions: # 1. Data set df <- data
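The general idea, listing every missing-data code once and blanking all matches in a single step, can be sketched as follows. This is a pandas illustration rather than one of the three R solutions from the answer (which are cut off in this excerpt); `missing_codes` is a name introduced here.

```python
import pandas as pd

# Same toy data as the question; 888 and 999 stand for "missing".
df = pd.DataFrame({
    "age": [50, 30, 27, 888],
    "insomnia": ["yes", "no", "no", 999],
})

# One list holds every code, however many there are.
missing_codes = [888, 999]

# isin() flags matching cells; mask() replaces flagged cells with NaN.
df_clean = df.mask(df.isin(missing_codes))

print(df_clean)
```

Adding another code then means extending the list rather than writing another assignment.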

How to change na.action for zero-inflated regression model?

自古美人都是妖i posted on 2021-02-08 07:37:50
Question: I am running a zero-inflated negative binomial regression model using the function zeroinfl from the pscl package. I need to exclude NAs from the model in order to be able to plot the residuals against the dependent variable later in the analysis. Therefore, I want to set na.action="na.exclude". I can do this without any problem for a non-zero-inflated negative binomial regression model (using glm.nb from the MASS package), e.g. fm_nbin <- glm.nb(DV ~ factor(IDV) + contr1 +contr2 + contr3,

Imputing missing values using sklearn IterativeImputer class for MICE

…衆ロ難τιáo~ posted on 2021-02-08 04:57:29
Question: I'm trying to learn how to implement MICE to impute missing values in my datasets. I've heard about fancyimpute's MICE, but I also read that sklearn's IterativeImputer class can accomplish similar results. From sklearn's docs: Our implementation of IterativeImputer was inspired by the R MICE package (Multivariate Imputation by Chained Equations) [1], but differs from it by returning a single imputation instead of multiple imputations. However, IterativeImputer can also be used for multiple

Missing values while scraping using beautifulsoup in python

假如想象 posted on 2021-01-29 11:18:23
Question: I'm trying web scraping as my first Python project (I'm completely new to programming). I'm almost done, but some values on the web page are missing, so I want to replace each missing value with something like "0" or "Not found". I just want to make a CSV file out of the data, not go on to analyze it. The web page I'm scraping is: https://www.lamudi.com.mx/nuevo-leon/departamento/for-rent/?page=1 I have a loop that collects all of the links of the page,
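One way to get a placeholder into the CSV, sketched under the assumption that each scraped listing is collected as a dict and that a missing value simply means the key is absent. The field names `title` and `price` and the sample rows are hypothetical; the asker's actual scraping loop is not shown in the excerpt. The standard library's `csv.DictWriter` takes a `restval` argument that is written for any missing key.

```python
import csv
import io

# Hypothetical scraped records; the second listing had no price on the page.
rows = [
    {"title": "Listing A", "price": "12000"},
    {"title": "Listing B"},
]

# An in-memory buffer stands in for open("listings.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"], restval="Not found")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

With this approach the scraping loop only needs to skip a field when the page lacks it; the placeholder is applied once, at write time.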

Getting wrong values after merging two dataframe on datetime

人走茶凉 posted on 2021-01-29 08:47:02
Question: I want to merge a time series of % humidity with a range of datetimes created for the purpose, so that missing records (rows) are filled with NaN and I obtain a time series of 15-minute records (the interval the sensor is designed for). Humidity data with the recorded datetimes:

    humdt = pd.DataFrame(data = data["la-salade"][["datetime","humidite"]])

                  datetime  humidite
    0  2019-07-09 08:30:00        87
    1  2019-07-09 11:00:00        87
    2  2019-07-09 17:30:00        82
    3  2019-07-09 23:30:00        80
    4  2019-07-11 06:15:00        79
    5  2019-07-19 14
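The question is cut off before the failing merge is shown, so the cause of the wrong values is not known here. As a minimal sketch of the intended result, assuming both datetime columns are true datetime dtype (mixing strings and datetimes is a common source of mismatched keys), a left merge from the full 15-minute grid onto the readings leaves NaN wherever no record exists. Only the first four readings from the question are used; `grid` and `merged` are names introduced here.

```python
import pandas as pd

# First four readings from the question, parsed as real datetimes.
humdt = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-07-09 08:30:00",
        "2019-07-09 11:00:00",
        "2019-07-09 17:30:00",
        "2019-07-09 23:30:00",
    ]),
    "humidite": [87, 87, 82, 80],
})

# Complete 15-minute grid spanning the recorded period.
grid = pd.DataFrame({
    "datetime": pd.date_range(
        humdt["datetime"].min(), humdt["datetime"].max(), freq="15min"
    )
})

# Left merge keeps every grid timestamp; unmatched rows get NaN humidity.
merged = grid.merge(humdt, on="datetime", how="left")

print(merged.head())
print(len(merged), int(merged["humidite"].notna().sum()))
```

An alternative with the same effect is `humdt.set_index("datetime").reindex(grid["datetime"])`.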