data.table

define color scheme when plotting line plot with plotly

爱⌒轻易说出口 submitted on 2021-01-28 07:41:27
Question: I have the following data table (its contents vary, and it has a different number of columns each time) and code for plotting a line chart:

    dt <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                     Germany = rnorm(365, 2, 1),
                     Austria = rnorm(365, 3, 4),
                     Czechia = rnorm(365, 2, 3),
                     check.names = FALSE)
    colNames <- names(dt)[-1]  ## assuming date is the first column
    p <- plotly::plot_ly()
    for(trace in colNames){
      p <- p %>% plotly::add_trace(data = dt, x = ~date, y

Use string representation of variable in i for data.table

旧城冷巷雨未停 submitted on 2021-01-28 05:21:31
Question: Apparently I am too stupid to enter the correct search terms, because I think that my question is not unique at all. How do I refer to a variable by string in the i part of data.table? with and ..x are all good for the j part, but what would be the equivalent in the i part? Do I have to use evil eval (pun intended ;)

    library(data.table)
    dt <- data.table(x = 1:4, y = 4:1)
    my_filter_fun <- function(var = names(dt)) {
      var <- match.arg(var)
      dt[eval(parse(text = paste(var, "== 1")))]
    }
    my_filter_fun("x
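One commonly used alternative to eval(parse()) in the i part is get(), which looks a column up by its string name. The sketch below is a minimal illustration, not the accepted answer to the question; the helper name filter_by_name and its data argument are hypothetical.

```r
library(data.table)

dt <- data.table(x = 1:4, y = 4:1)

# Hypothetical helper: keep rows where the column named by `var` equals 1.
filter_by_name <- function(data, var) {
  stopifnot(var %in% names(data))  # guard against unknown column names
  data[get(var) == 1]              # get() resolves the column from its string name
}

res_x <- filter_by_name(dt, "x")
res_y <- filter_by_name(dt, "y")
```

Here res_x holds the row where x is 1 and res_y the row where y is 1.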

fuzzy outer join/merge in R

泄露秘密 submitted on 2021-01-28 03:12:05
Question: I have 2 datasets and want to do a fuzzy join. Here are the two datasets.

    library(data.table)
    # data1
    dt1 <- fread("NAME State type
    ABERCOMBIE TOWNSHIP ND TS
    ABERDEEN TOWNSHIP NJ TS
    ABERDEEN TOWNSHIP SD TS
    ABBOTSFORD CITY WI CI
    ABERDEEN CITY WA CI
    ADA TOWNSHIP MI TS
    ADAMS IL TS", header = T)
    # data2
    dt2 <- fread("NAME State type
    ABERDEEN TWP N J NJ TS
    ABERDEEN WASH WA CI
    ABBOTSFORD WIS WI CI
    ADA TWP MICH MI TS
    ADA OHIO OH CI
    ADAMS MASS MA CI
    ADAMSVILLE ALA AL CI", header = T)

Two datasets have

applying cut() on R dataframe daywise

不想你离开。 submitted on 2021-01-28 01:16:37
Question: I have a data table in R on which I apply cut() and table(). I am able to get the frequency table based on the conditions, but I am getting overall frequencies. I want to get them day-wise. I have a column named timestamp which holds the timestamp, and a section column whose value is either A or B. How can I cut per day and per section?

My current output:

    Var1    Freq
    0-30    1398
    30-60   1051
    60-80   1006
    80-100  36
    100>    2

Expected output:

    Date        Sec  Var1  Freq
    05-01-2020  A    0-30  1398
    05-01
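A per-day, per-section frequency table can be sketched with data.table grouping, as below. The data is made up for illustration, and `value` is an assumed name for the column being binned (the excerpt does not name it); the bin edges are inferred from the labels in the question's output.

```r
library(data.table)

# Illustrative data; `value` is an assumed column name.
dt <- data.table(
  timestamp = as.POSIXct(c("2020-01-05 08:00:00", "2020-01-05 09:30:00",
                           "2020-01-05 10:15:00", "2020-01-06 11:00:00"),
                         tz = "UTC"),
  section = c("A", "A", "B", "A"),
  value = c(12, 45, 70, 120)
)

bin_breaks <- c(0, 30, 60, 80, 100, Inf)
bin_labels <- c("0-30", "30-60", "60-80", "80-100", "100>")

# Count rows per calendar day, section, and bin.
freq <- dt[, .N, by = .(
  Date = as.IDate(timestamp),
  Sec  = section,
  Var1 = cut(value, breaks = bin_breaks, labels = bin_labels,
             include.lowest = TRUE)
)]
```

Grouping on the cut() result directly avoids calling table() once per day; note that bins with zero rows simply do not appear in freq.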

Get the local time from a UTC time

懵懂的女人 submitted on 2021-01-27 23:46:12
Question: Let's say I have a data set with the date, latitude, and longitude.

    dt = data.table(date = c("2017-10-24 05:01:05", "2017-10-24 05:01:57", "2017-10-24 05:02:54"),
                    lat = c(-6.2704925537109375, -6.2704925537109375, -6.2704925537109375),
                    long = c(106.5803680419922, 106.5803680419922, 106.5803680419922))

The time is UTC. Is it possible to convert that UTC time to the local time using the lat and long?

Answer 1: I found a good answer on converting longitude and latitude to timezones here, so here is how we

Conditionally sum dynamic columns in r

女生的网名这么多〃 submitted on 2021-01-27 19:50:34
Question: I am trying to conditionally sum across many columns depending on whether they are greater than or less than 0. I am surprised I cannot find a dplyr or data.table workaround for this. I want to calculate 4 new columns for a large data.frame (columns to calculate are at bottom of post).

    dat2=matrix(nrow=10,rnorm(100));colnames(dat2)=paste0('V',rep(1:10))
    dat2 %>% as.data.frame() %>% rowwise() %>%
      select_if(function(col){mean(col)>0}) %>%
      mutate(sum_pos=rowSums(.)) ##Obviously doesn't work

These
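Row-wise conditional sums can be sketched in base R by masking the matrix with a logical comparison before calling rowSums(). The four column names below (sum_pos, sum_neg, n_pos, n_neg) are assumptions, since the excerpt cuts off before the requested columns are listed.

```r
set.seed(1)
dat2 <- matrix(rnorm(100), nrow = 10)
colnames(dat2) <- paste0("V", 1:10)

res <- as.data.frame(dat2)

# Multiplying by a logical mask zeroes out the entries that fail the condition.
res$sum_pos <- rowSums(dat2 * (dat2 > 0))  # sum of positive values per row
res$sum_neg <- rowSums(dat2 * (dat2 < 0))  # sum of negative values per row
res$n_pos   <- rowSums(dat2 > 0)           # count of positive values per row
res$n_neg   <- rowSums(dat2 < 0)           # count of negative values per row
```

Because the mask is computed on the whole matrix at once, this avoids rowwise() entirely and should scale well to many columns.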

Compare groups with each other

谁说胖子不能爱 submitted on 2021-01-27 19:21:01
Question: Is there a way in dplyr to compare groups with each other? Here is a concrete example: I would like to apply a t-test to the following combinations: a vs b, a vs c and b vs c

    set.seed(1)
    tibble(value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10,1)),
           group=c(rep("a", 1000), rep("b", 1000), rep("c", 1000))) %>%
      nest(value)

    # A tibble: 3 x 2
      group data
      <chr> <list>
    1 a     <tibble [1,000 × 1]>
    2 b     <tibble [1,000 × 1]>
    3 c     <tibble [1,000 × 1]>

If dplyr provides no solution, I would also be
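Outside dplyr, the pairwise comparisons can be sketched in base R by enumerating the group pairs with combn() and running t.test() on each. This is an illustrative alternative rather than a dplyr solution; the data mirrors the question's example but is stored in a plain data.frame.

```r
set.seed(1)
df <- data.frame(
  value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
  group = rep(c("a", "b", "c"), each = 1000)
)

# All unordered pairs of groups: (a, b), (a, c), (b, c).
pairs <- combn(unique(df$group), 2, simplify = FALSE)

# One Welch t-test per pair.
tests <- lapply(pairs, function(p) {
  t.test(df$value[df$group == p[1]], df$value[df$group == p[2]])
})
names(tests) <- vapply(pairs, paste, character(1), collapse = " vs ")

p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))
```

If only adjusted p-values are needed, stats::pairwise.t.test(df$value, df$group) covers the same comparisons in a single call.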

Error reading Field with Double Quotes and Commas using Fread

廉价感情. submitted on 2021-01-27 13:44:45
Question: I have a large csv file with 19 columns of character/numeric data. Upon running fread, I got an error message saying one of my numeric columns was being converted to character because the field had the value "". I then opened my data in a text editor and found the source of my problem. On one line, a character column read:

    """PARENTS"", ""Y.M."", AND ""EXPECTING"""

which corresponds to the string:

    "PARENTS", "Y.M.", AND "EXPECTING"

As:
The first quote is a string protector
The 2nd to 6th

R Data.table divide values in column based on another column

萝らか妹 submitted on 2021-01-27 13:42:42
Question: I have a main data.table which has 364 rows and 3 columns:

    Date        Weekday  Weight
    2012-01-01  Monday   100
    2013-01-02  Tuesday  200
    ...

and a help data.table with 7 rows and 2 columns:

    Weekday    Coefficient
    Monday     0.91
    Tuesday    0.84
    Wednesday  0.99
    ...

Now I would like to create a 4th column in the main data.table with the "weight/Coefficient" based on the Weekday.

    Weight_divided <- main[, Weight * help[Weekday==main$Weekday]$Coefficient]

The result is the following:

    Date  Weekday  Weight  Weight_divided
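One way to look up the coefficient per weekday is a data.table update join, which matches rows on Weekday and adds the new column by reference. The sketch below uses small illustrative tables, and the lookup table is named help_dt here (an assumed name) to avoid masking base R's help().

```r
library(data.table)

# Illustrative data; the real tables have 364 and 7 rows.
main <- data.table(
  Date    = as.Date(c("2012-01-01", "2012-01-02")),
  Weekday = c("Monday", "Tuesday"),
  Weight  = c(100, 200)
)
help_dt <- data.table(
  Weekday     = c("Monday", "Tuesday", "Wednesday"),
  Coefficient = c(0.91, 0.84, 0.99)
)

# Update join: for each matching Weekday, divide Weight by that day's
# Coefficient. The i. prefix refers to columns of help_dt.
main[help_dt, on = "Weekday", Weight_divided := Weight / i.Coefficient]
```

Unlike the `Weekday==main$Weekday` comparison, which recycles the shorter vector element by element, the join matches each row of main to its own weekday.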