na

R - converting NaN to 0 results in all 0s

感情迁移 submitted on 2019-12-12 05:10:00
Question: I have a data frame containing NaNs that I'd like to convert to 0s. I wrote a function that I think should work: fix_nan <- function(x){ return(x[is.nan(x)] <- 0) } Then I apply it to the data frame: train_e <- structure(list(pack_id = structure(1:10, .Label = c("1", "2", "4", "5", "7", "8", "9", "10", "11", "14"), class = "factor"), item_1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), item_2 = c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), item_3 = c(1.45225232891169, 0.613104472886409, NaN
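The all-zeros result follows from how R evaluates assignment: the expression x[is.nan(x)] <- 0 has the assigned value 0 as its result, so return(...) hands back 0 instead of the modified vector. In R the usual fix is to do the assignment first and then return x on its own. Below is a minimal pandas analogue of the corrected helper (the name fix_nan and the item_* columns are borrowed from the question; the small frame is hypothetical). Note that pandas isna() also catches None/NA, whereas R's is.nan() matches NaN only.

```python
import numpy as np
import pandas as pd


def fix_nan(x: pd.Series) -> pd.Series:
    """Return a copy of x with NaN replaced by 0 (pandas analogue of the R helper)."""
    out = x.copy()
    out.loc[out.isna()] = 0  # modify the copy...
    return out               # ...then return the whole Series, not the assigned scalar


# Hypothetical frame modelled on the question's item_* columns.
train_e = pd.DataFrame({
    "item_1": [0.0, 0.0, 0.0],
    "item_2": [np.nan, np.nan, np.nan],
    "item_3": [1.45, 0.61, np.nan],
})

fixed = train_e.apply(fix_nan)  # applied column by column
print(fixed)
```

In practice train_e.fillna(0) gives the same result in one call; the explicit helper is only there to mirror the structure of the R function.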

R (arules) Convert dataframe into transactions and remove NA

一笑奈何 submitted on 2019-12-12 04:38:21
Question: I have a dataframe. My goal is to convert it into transaction data in order to do market basket analysis with the arules package in R. I did some research online on converting a dataframe to transaction data, e.g. (How to prep transaction data into basket for arules) and (Transform csv into transactions for arules), but the result I got was different. dput(df) structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"), Fruits = c(NA, "Apple",
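arules is an R package, so the following is only a pandas sketch of the preparation step: each row becomes one basket holding its non-missing items, so NA cells never turn into an item. A list of item vectors in that shape is what R users typically coerce with as(..., "transactions"). Transaction_ID and Fruits come from the question; the Vegetables column and the sample values beyond the first two are hypothetical.

```python
import pandas as pd


def to_baskets(df: pd.DataFrame, id_col: str = "Transaction_ID") -> dict:
    """Map each transaction ID to the list of its non-missing items."""
    items = df.set_index(id_col)
    # dropna() removes the missing cells so they never become items such as "NA"
    return {tid: row.dropna().tolist() for tid, row in items.iterrows()}


df = pd.DataFrame({
    "Transaction_ID": ["A001", "A002", "A003"],
    "Fruits": [None, "Apple", "Banana"],        # "Banana" is hypothetical
    "Vegetables": ["Carrot", None, "Potato"],   # hypothetical column
})

baskets = to_baskets(df)
print(baskets)
```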

Variable in CSV File Contains Numbers But Imported as Character

二次信任 submitted on 2019-12-12 03:44:30
Question: I have a variable in a dataset (CSV format) that consists only of numbers, but when the dataset is imported into R it becomes a character variable. Why might that be? When I tried to coerce it to numeric, a lot of NAs were introduced. > df1$Postal_Code<-as.numeric(df1$Postal_Code) Warning message: NAs introduced by coercion sum(is.na(df1$Postal_Code)) ## [1] 2822 sum(is.na(as.numeric(df1$Postal_Code))) ## [1] 2837 Source: https://stackoverflow.com/questions/38449245/variable-in-csv-file-contains
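A single entry that is not a plain number (a code containing a letter, a stray space, a placeholder such as a dash) is enough for the whole column to be read as character, and as.numeric then turns exactly those entries into NA. Listing the entries that fail conversion usually reveals the culprit. The sketch below does this with pandas.to_numeric(errors="coerce"); the equivalent R filter keeps the values where is.na(as.numeric(x)) is TRUE but is.na(x) is not. The sample postal codes are hypothetical.

```python
import pandas as pd


def unparseable(values: pd.Series) -> pd.Series:
    """Count the non-missing entries that cannot be parsed as numbers."""
    as_num = pd.to_numeric(values, errors="coerce")  # failures become NaN
    bad = values[as_num.isna() & values.notna()]     # NaN now, but present before
    return bad.value_counts()


postal = pd.Series(["12345", "67890", "S1234", " ", None, "S1234"])  # hypothetical
result = unparseable(postal)
print(result)
```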

Sort dataframe rows independently by values in another dataframe

情到浓时终转凉″ submitted on 2019-12-12 03:28:49
Question: Suppose two dataframes: import pandas as pd import numpy as np d1 = {} d2 = {} np.random.seed(5) for col in list("ABCDEF"): d1[col] = np.random.randn(12) d2[col+'2'] = np.random.random_integers(0,100, 12) t_index = pd.date_range(start = '2015-01-31', periods = 12, freq = "M") dat1 = pd.DataFrame(d1, index = t_index) dat2 = pd.DataFrame(d2, index = t_index) I want to sort dat1's rows by the rows in dat2 and extract a subset of the ordered data from dat1. Below is an example where the top 5
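Because the question is cut off, the ranking direction and the number of kept columns are assumptions here. The sketch orders each row of dat2 from largest to smallest, pairs its columns positionally with dat1's (A with A2, B with B2, and so on), and keeps the dat1 values of the top k. np.argsort and np.take_along_axis do the per-row reordering without a Python loop.

```python
import numpy as np
import pandas as pd


def top_k_by_other(values: pd.DataFrame, ranks: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """For each row, keep the k entries of values whose matching ranks entries are largest."""
    if values.shape != ranks.shape:
        raise ValueError("values and ranks must have the same shape")
    # Column positions sorted by descending rank, row by row (stable for ties).
    order = np.argsort(-ranks.to_numpy(), axis=1, kind="stable")[:, :k]
    picked = np.take_along_axis(values.to_numpy(), order, axis=1)
    cols = [f"top{i + 1}" for i in range(order.shape[1])]
    return pd.DataFrame(picked, index=values.index, columns=cols)


# Small hypothetical inputs in place of dat1/dat2.
values = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], columns=list("ABC"))
ranks = pd.DataFrame([[10, 30, 20], [5, 1, 9]], columns=["A2", "B2", "C2"])
result = top_k_by_other(values, ranks, k=2)
print(result)
```

With the question's frames the call would be top_k_by_other(dat1, dat2, k=5). If the original column labels of the chosen entries are also needed, the same order array can index np.asarray(dat1.columns).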

Pandas Dataframe with NA values throwing ValueError

☆樱花仙子☆ submitted on 2019-12-12 03:18:46
Question: I have a dataframe in pandas that looks like this:
df.head(2)
Out[25]:
                    CompanyName Region MachineType
recvd_dttm
2014-07-13 12:40:40    Company1     NA    Machine1
2014-07-13 15:31:39    Company2     NA    Machine2
I am first taking data in a certain date range, then trying to get the rows that are in Region NA and have MachineType Machine1. However, I keep getting this error: ValueError: Length mismatch: Expected axis has 4 elements, new values have 3 elements. This code worked until I added the region column and
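The question does not show the code that raises the error, so this is not a diagnosis. The Length mismatch message is what pandas raises when a list of column names of the wrong length is assigned to df.columns, so selecting rows with boolean masks, without renaming columns, avoids that path. A related trap with a region literally called NA: read_csv treats the string NA as missing by default, which keep_default_na=False prevents. The sketch assumes the data comes from a CSV with the columns shown in df.head(2); the file contents (including the third row) are hypothetical.

```python
import io

import pandas as pd

CSV_TEXT = """recvd_dttm,CompanyName,Region,MachineType
2014-07-13 12:40:40,Company1,NA,Machine1
2014-07-13 15:31:39,Company2,NA,Machine2
2014-08-01 09:00:00,Company3,EU,Machine1
"""


def load(text: str) -> pd.DataFrame:
    # keep_default_na=False keeps the literal string "NA"; only empty cells become NaN.
    df = pd.read_csv(io.StringIO(text), keep_default_na=False, na_values=[""],
                     parse_dates=["recvd_dttm"])
    return df.set_index("recvd_dttm").sort_index()


def select(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Rows in [start, end) with Region "NA" and MachineType "Machine1"."""
    in_range = (df.index >= pd.Timestamp(start)) & (df.index < pd.Timestamp(end))
    wanted = ((df["Region"] == "NA") & (df["MachineType"] == "Machine1")).to_numpy()
    return df[in_range & wanted]  # boolean masks only: no column renaming involved


df = load(CSV_TEXT)
print(select(df, "2014-07-01", "2014-08-01"))
```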

How to fill NA in R for quasi-identical rows?

为君一笑 submitted on 2019-12-12 01:23:53
Question: I'm looking for a way to fill NAs in duplicated() rows. Some rows are identical except that one of them contains an NA, so I decided to fill that NA with the value from a complete row, but I don't see how to do it. Using the duplicated() function, I can get a data frame like this: df <- data.frame( Year = rnorm(5), hour = rnorm(5), LOT = rnorm(5), S123_AA = c('ABF4576','ABF4576','ABF4576','ABF4576','ABF4576'), S135_AA = c('ABF5403',NA,'ABF5403','ABF5403','ABF5403'), S13_BB = c('BF50343','BF50343',
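One way to express "take the value from the complete copy of the row" is to group rows by the columns that identify a duplicate and fill each NA with the first non-missing value in its group. The pandas sketch below does this with groupby(...).transform("first"), which skips missing values. Which columns identify a duplicate is an assumption; here S123_AA from the question's example is used as the key.

```python
import numpy as np
import pandas as pd


def fill_from_duplicates(df: pd.DataFrame, key_cols: list, fill_cols: list) -> pd.DataFrame:
    """Fill NAs in fill_cols with the first non-missing value among rows sharing key_cols."""
    out = df.copy()
    for col in fill_cols:
        # "first" skips missing values, so each group contributes its known value
        group_value = out.groupby(key_cols)[col].transform("first")
        out[col] = out[col].fillna(group_value)  # existing values are left untouched
    return out


df = pd.DataFrame({
    "S123_AA": ["ABF4576"] * 5,
    "S135_AA": ["ABF5403", np.nan, "ABF5403", "ABF5403", "ABF5403"],
})
result = fill_from_duplicates(df, key_cols=["S123_AA"], fill_cols=["S135_AA"])
print(result)
```

Using fillna rather than overwriting the column keeps any value that is already present, which matters if "duplicate" rows turn out to differ in more than the NA.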

Weighted average value in the presence of NA values

谁说胖子不能爱 submitted on 2019-12-11 23:45:49
Question: Here's a very simple example of what I'm dealing with: data_stack <- data.table(CompA_value = c(10,20,30,40), CompB_value = c(60,70,80,80), CompC_value = c(NA, NA, NA, 100), CompA_weight = c(0.2, 0.3,0.4,0.4), CompB_weight = c(0.8,0.7,0.6,0.4), CompC_weight = c(NA, NA, NA,0.2))
   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight
1:          10          60          NA          0.2          0.8           NA
2:          20          70          NA          0.3          0.7           NA
3:          30          80          NA          0.4          0.6           NA
4:          40          80         100          0.4          0.4          0.2
What I want to do is calculate the weighted
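The question is truncated before the exact rule, so this sketch assumes a common one: for each row, use only the components whose value and weight are both present, and divide by the sum of those weights (renormalising what remains). It is written with NumPy/pandas rather than data.table; the CompX_value / CompX_weight naming and the data follow the question.

```python
import numpy as np
import pandas as pd


def weighted_mean_rows(df: pd.DataFrame, comps: list) -> pd.Series:
    """Row-wise weighted mean over components whose value and weight are both present."""
    v = df[[f"{c}_value" for c in comps]].to_numpy(dtype=float)
    w = df[[f"{c}_weight" for c in comps]].to_numpy(dtype=float)
    ok = ~np.isnan(v) & ~np.isnan(w)
    num = np.where(ok, v * w, 0.0).sum(axis=1)
    den = np.where(ok, w, 0.0).sum(axis=1)
    # A row with no usable component yields NaN (0/0) without a runtime warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return pd.Series(num / den, index=df.index)


data_stack = pd.DataFrame({
    "CompA_value": [10, 20, 30, 40],
    "CompB_value": [60, 70, 80, 80],
    "CompC_value": [np.nan, np.nan, np.nan, 100],
    "CompA_weight": [0.2, 0.3, 0.4, 0.4],
    "CompB_weight": [0.8, 0.7, 0.6, 0.4],
    "CompC_weight": [np.nan, np.nan, np.nan, 0.2],
})
result = weighted_mean_rows(data_stack, ["CompA", "CompB", "CompC"])
print(result)
```

In this example the available weights already sum to 1 in every row, so renormalising does not change the numbers; it only matters when a missing component's weight is not redistributed in the data.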

Excel, Array Formulas, N/A outside of range, and ROW()

懵懂的女人 submitted on 2019-12-11 22:27:14
Question: I have a problem with ROW() in an array formula in Excel 2013. Example: I make a named range called 'input', say 4 cells wide and 10 high. Then I enter an array formula =ROW(input) in a range one cell wide and 15 cells high. I get 10 numbers (the first is the first row of input, and the rest count up from that), followed by 5 #N/A. This is as it should be. If instead of =ROW(input) I try one of the following: =IFERROR(ROW(input),"x") or =IF(ISNA(ROW(input)),"x",ROW(input)) to catch the #N/As then what

Error in glm: NA/NaN/Inf in 'y'

不羁的心 submitted on 2019-12-11 17:52:13
Question: I am trying to fit a GLM to my data. The data (rope_complete) looks like this:
  rope.X...Sound rope.directional.change rope.Time.of.the.shark.in.the.video
1    5_min_blank                       5                                  23
2     Snorkeling                      11                                  37
3          Fish1                       1                                  17
4          Fish1                       6                                  46
5         Diving                       6                                  37
Then I checked whether I have any NA values: table(is.na(rope_complete)) and saw that I have none: FALSE : 3225 Then I fitted my GLM: directional_turn_fit<-glm(rope_complete$rope.directional.change~ rope_complete$rope.X...Sound +offset( log(rope_complete$rope
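table(is.na(...)) only counts NA and NaN: it reports FALSE for Inf and -Inf, and it cannot see values that only become non-finite inside the model formula, such as log(0) in an offset. A finiteness check is therefore stricter. The pandas sketch below shows the difference; the column values are hypothetical, and whether this explains the asker's error cannot be told from the truncated question.

```python
import numpy as np
import pandas as pd


def missing_vs_nonfinite(s: pd.Series) -> tuple:
    """Return (count of NaN, count of any non-finite value including +/-Inf)."""
    arr = s.to_numpy(dtype=float)
    return int(np.isnan(arr).sum()), int((~np.isfinite(arr)).sum())


time_in_video = pd.Series([23.0, 37.0, 0.0, 46.0])  # hypothetical; note the 0
with np.errstate(divide="ignore"):
    offset = np.log(time_in_video)  # log(0) -> -inf, which is not NaN

print(missing_vs_nonfinite(time_in_video))  # nothing missing, nothing infinite
print(missing_vs_nonfinite(offset))         # still nothing "missing", but one -inf
```

In R the analogous checks are sum(!is.finite(x)) on the response and on log() of the offset column.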

as.Date creates some NAs in dataset

好久不见. submitted on 2019-12-11 16:54:32
Question: I have a simple little dataset:
> str(SFdischg)
'data.frame': 11932 obs. of 4 variables:
 $ date: Factor w/ 11932 levels "1/01/1985","1/01/1986",..: 97 4409 8697 9677 10069 10461 10853 11245 11637 489 ...
 $ ddmm: Factor w/ 366 levels "01-Apr","01-Aug",..: 1 13 25 37 49 61 73 85 97 109 ...
 $ year: int 1984 1984 1984 1984 1984 1984 1984 1984 1984 1984 ...
 $ cfs : int 1500 1430 1500 1850 1810 1830 1850 1880 1970 1980 ...
I would like to have a column of dates so that I can plot temporal data:
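as.Date returns NA for every string that does not match its format, so NAs in only some rows usually point to a format that fits part of the data, for example with day and month swapped. Assuming the date column is day/month/year (which the ddmm column suggests, but the question does not confirm), the R call would be as.Date(as.character(SFdischg$date), format = "%d/%m/%Y"). The pandas sketch below shows the same effect: the wrong order silently yields missing values (NaT) for days above 12, while the right order parses everything. The sample strings are hypothetical.

```python
import pandas as pd


def parse_dates(dates: pd.Series, fmt: str = "%d/%m/%Y") -> pd.Series:
    """Parse date strings with an explicit format; non-matching strings become NaT."""
    return pd.to_datetime(dates, format=fmt, errors="coerce")


raw = pd.Series(["1/01/1985", "13/02/1985", "31/12/1984"])  # hypothetical samples

good = parse_dates(raw)                  # day/month/year: all three parse
bad = parse_dates(raw, fmt="%m/%d/%Y")   # month/day/year: "13" and "31" are not months
print(good)
print(bad)
```

Dates such as 1/01/1985 parse identically under both orders, which is why a swapped format can go unnoticed until the rows with a day above 12 turn into NA.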