r | 易学教程

How to replace NA values in a data.table with na.spline

Question: I'm trying to prepare some demographic data retrieved from Eurostat for further processing, which includes replacing any missing values with approximated ones. At first I used only data.frames, but I became convinced that data.tables might offer some advantages over regular data.frames, so I migrated to data.tables. One thing I observed while doing so was that I got different results when using "na.spline" in combination with "apply" versus "na.spline" as part of the data

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

Question: I have a data frame df.sample like this:
    id <- c("A","A","A","A","A","A","A","A","A","A","A")
    date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12",
              "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14",
              "2018-11-12")
    hour <- c(8,8,9,9,13,13,16,6,7,19,7)
    min <- c(47,59,6,18,22,36,12,32,12,21,47)
    value <- c(70,70,86,86,86,74,81,77,79,83,91)
    df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = FALSE)
    df.sample$date <- as.Date(df.sample$date,format="%Y-%m-

Create separate vectors for each of a data frame's columns (variables)

Question: Goal: take a data frame and create separate vectors for each of its columns (variables). The following code gets me close:
    batting <- read.csv("mlb_2014.csv", header = TRUE, sep = ",")
    hr <- batting[(batting$HR >= 20 & batting$PA >= 100), ]
    var_names <- colnames(hr)
    for(i in var_names) {
        path <- paste("hr$", i, sep = "")
        assign(i, as.vector(path))
    }
It creates a vector for each column in the data frame, as shown by the output below:
    > ls()
    [1] "AB" "Age" "BA" "batting" "BB" "CS"
    [7] "G"
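One thing worth noting about the loop in the question: `as.vector(path)` stores the character string (for example "hr$AB") rather than the column values. A minimal sketch of one possible fix, indexing the data frame with `[[` instead of building a string, might look like this. The small `hr` data frame here is a made-up stand-in for the filtered batting data, which is not available in the excerpt.

```r
# Hypothetical stand-in for the filtered batting data from the question.
hr <- data.frame(HR = c(25, 31), PA = c(450, 600))

# Create one variable per column, holding the column's actual values.
for (i in colnames(hr)) {
  assign(i, hr[[i]])  # hr[[i]] extracts the column as a plain vector
}
```

After the loop, `HR` and `PA` exist as ordinary numeric vectors in the calling environment.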

Using switch statement within dplyr's mutate

Question: I would like to use a switch statement within dplyr's mutate. I have a simple function that performs some operations and assigns alternative values via switch, for example:
    convert_am <- function(x) {
        x <- as.character(x)
        switch(x, "0" = FALSE, "1" = TRUE, NA)
    }
This works as desired when applied to scalars:
    >> convert_am(1)
    [1] TRUE
    >> convert_am(2)
    [1] NA
    >> convert_am(0)
    [1] FALSE
I would like to arrive at equivalent results via a mutate call:
    mtcars %>% mutate(am = convert_am(am))
This
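Because `switch` expects a single value, it cannot be applied directly to a whole column. A minimal sketch of one way around this, assuming it is acceptable to wrap the question's function so it is applied element by element, is shown below. The wrapper name `convert_am_vec` is hypothetical, and this is only one of several possible approaches.

```r
# Scalar function from the question.
convert_am <- function(x) {
  x <- as.character(x)
  switch(x, "0" = FALSE, "1" = TRUE, NA)
}

# Hypothetical wrapper: apply the scalar function to each element in turn.
convert_am_vec <- function(x) {
  vapply(x, convert_am, FUN.VALUE = logical(1), USE.NAMES = FALSE)
}

am_converted <- convert_am_vec(c(1, 0, 2))

# Base R usage on the built-in mtcars data; inside dplyr one would
# presumably write mutate(am = convert_am_vec(am)) instead.
mtcars_flagged <- mtcars
mtcars_flagged$am <- convert_am_vec(mtcars$am)
```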

Weird case with data tables in R, column names are mixed

Question: I have created a variable called mc_split_device inside the data.table called mc_with_devices. However, if I type mc_with_devices$mc_split, I get the values of the column mc_split_device, even though I never created any variable named mc_split.
Answer 1: See Hadley Wickham's Advanced R: $ is a shorthand operator, where x$y is equivalent to x[["y", exact = FALSE]]. It's often used to access variables in a data frame, as in mtcars$cyl or diamonds$carat. So the exact=FALSE is the
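The partial matching described in the answer can be reproduced with a plain data frame. The sketch below uses a made-up one-column data frame (the values are illustrative only) and compares `$` with `[[`, which matches names exactly unless told otherwise.

```r
# Illustrative data frame with a single column named mc_split_device.
df <- data.frame(mc_split_device = c(1, 2))

partial <- df$mc_split                    # `$` partially matches the column name
exact <- df[["mc_split"]]                 # `[[` is exact by default: no such column
loose <- df[["mc_split", exact = FALSE]]  # same behaviour as `$`
```

Using `[[` with the full column name avoids picking up a column by accident when only a prefix is typed.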

How to mutate for loop in dplyr

Question: I want to create multiple lag variables for a column in a data frame for a range of values. I have code below that successfully does what I want but is not scalable for what I need (hundreds of iterations):
    Lake_Lag <- Lake_Champlain_long.term_monitoring_1992_2016 %>%
        group_by(StationID,Test) %>%
        arrange(StationID,Test,VisitDate) %>%
        mutate(lag.Result1 = dplyr::lag(Result, n = 1, default
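The general idea of generating many lag columns in a loop, rather than writing one mutate argument per lag, can be sketched in base R. This is an assumption-laden illustration, not the asker's pipeline: the helpers `lag_vec` and `add_lags` and the `demo` data are hypothetical, and the grouping by StationID and Test from the question is left out for brevity.

```r
# Hypothetical helper: shift a vector down by n positions, padding with NA.
lag_vec <- function(x, n) {
  c(rep(NA, n), x)[seq_along(x)]
}

# Hypothetical helper: add one lag column per value in `lags`,
# named in the same "lag.<column><n>" pattern as the question.
add_lags <- function(df, column, lags) {
  for (n in lags) {
    df[[paste0("lag.", column, n)]] <- lag_vec(df[[column]], n)
  }
  df
}

# Made-up data standing in for the monitoring results.
demo <- data.frame(Result = c(10, 20, 30, 40))
demo_lagged <- add_lags(demo, "Result", 1:3)
```

Passing a longer range such as `1:100` would add that many columns without any further code.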