data.table | 易学教程

define color scheme when plotting line plot with plotly

Question: I have the following data table (its contents vary, and it has a different number of columns each time) and code for plotting a line chart:

    dt <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                     Germany = rnorm(365, 2, 1),
                     Austria = rnorm(365, 3, 4),
                     Czechia = rnorm(365, 2, 3),
                     check.names = FALSE)
    colNames <- names(dt)[-1] ## assuming date is the first column
    p <- plotly::plot_ly()
    for(trace in colNames){
      p <- p %>% plotly::add_trace(data = dt, x = ~date, y

Use string representation of variable in i for data.table

Question: Apparently I am too stupid to enter the correct search terms, because I think that my question is not unique at all. How do I refer to a variable by string in the i part of data.table? with and ..x are all good for the j part, but what would be the equivalent in the i part? Do I have to use evil eval (pun intended ;))?

    library(data.table)
    dt <- data.table(x = 1:4, y = 4:1)
    my_filter_fun <- function(var = names(dt)) {
      var <- match.arg(var)
      dt[eval(parse(text = paste(var, "== 1")))]
    }
    my_filter_fun("x
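One possible way to avoid eval(parse()) here is to look the column up by name with get() inside i. The following is only a minimal sketch under that assumption, not the accepted answer to the question; filter_by_name is a hypothetical name introduced for illustration.

```r
library(data.table)

dt <- data.table(x = 1:4, y = 4:1)

# Hypothetical variant of my_filter_fun: get() resolves the string in `var`
# to the column of that name when evaluated inside the i part of dt[...].
filter_by_name <- function(var = names(dt)) {
  var <- match.arg(var)
  dt[get(var) == 1]
}

res_x <- filter_by_name("x")  # rows where column x equals 1
res_y <- filter_by_name("y")  # rows where column y equals 1
```

This relies on `var` not being a column name of dt itself; if it were, the column would shadow the function argument inside i.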

fuzzy outer join/merge in R

Question: I have two datasets and want to do a fuzzy join. Here are the two datasets.

    library(data.table)
    # data1
    dt1 <- fread("NAME State type
    ABERCOMBIE TOWNSHIP ND TS
    ABERDEEN TOWNSHIP NJ TS
    ABERDEEN TOWNSHIP SD TS
    ABBOTSFORD CITY WI CI
    ABERDEEN CITY WA CI
    ADA TOWNSHIP MI TS
    ADAMS IL TS", header = TRUE)
    # data2
    dt2 <- fread("NAME State type
    ABERDEEN TWP N J NJ TS
    ABERDEEN WASH WA CI
    ABBOTSFORD WIS WI CI
    ADA TWP MICH MI TS
    ADA OHIO OH CI
    ADAMS MASS MA CI
    ADAMSVILLE ALA AL CI", header = TRUE)

Two datasets have

applying cut() on R dataframe daywise

Question: I have a data table in R to which I apply cut() and table(). I can get the frequency table for the given conditions, but it contains overall frequencies, and I want them per day. I have a column named timestamp that holds timestamps, and a section column whose value is either A or B. How do I cut by day and by section?

My current output:

    Var1    Freq
    0-30    1398
    30-60   1051
    60-80   1006
    80-100  36
    100>    2

Expected output:

    Date        Sec  Var1  Freq
    05-01-2020  A    0-30  1398
    05-01
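A grouped count is one way to get per-day, per-section frequencies. The sketch below is an illustration only: the column being binned is not named in the excerpt, so `value` is a hypothetical column, the sample rows are made up, and the bin edges are inferred from the labels in the current output.

```r
library(data.table)

# Made-up sample data; `value` is a hypothetical name for the binned column.
dt <- data.table(
  timestamp = as.POSIXct(c("2020-01-05 08:00:00", "2020-01-05 09:00:00",
                           "2020-01-05 10:00:00", "2020-01-06 08:00:00"),
                         tz = "UTC"),
  section   = c("A", "A", "B", "A"),
  value     = c(10, 45, 70, 20)
)

# Assign each row to a bin, then count rows per day, section, and bin.
dt[, bin := cut(value,
                breaks = c(0, 30, 60, 80, 100, Inf),
                labels = c("0-30", "30-60", "60-80", "80-100", "100>"))]
freq <- dt[, .N, by = .(Date = as.Date(timestamp, tz = "UTC"),
                        Sec  = section,
                        Var1 = bin)]
```

Unlike table(), this only lists combinations that actually occur; empty bins would need to be added separately if they should appear with a count of zero.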

Get the local time from a UTC time

Question: Let's say I have a data set with the date, latitude, and longitude.

    dt = data.table(date = c("2017-10-24 05:01:05", "2017-10-24 05:01:57", "2017-10-24 05:02:54"),
                    lat = c(-6.2704925537109375, -6.2704925537109375, -6.2704925537109375),
                    long = c(106.5803680419922, 106.5803680419922, 106.5803680419922))

The time is UTC. Is it possible to convert that UTC time to the local time using the lat and long?

Answer 1: I found a good answer on converting longitude and latitude to timezones here, so here is how we

Conditionally sum dynamic columns in r

Question: I am trying to conditionally sum across many columns depending on whether they are greater than or less than 0. I am surprised I cannot find a dplyr or data.table workaround for this. I want to calculate 4 new columns for a large data.frame (columns to calculate are at bottom of post).

    dat2=matrix(nrow=10,rnorm(100));colnames(dat2)=paste0('V',rep(1:10))
    dat2 %>% as.data.frame() %>% rowwise() %>% select_if(function(col){mean(col)>0}) %>% mutate(sum_pos=rowSums(.)) ##Obviously doesn't work

These

Compare groups with each other

Question: Is there a way in dplyr to compare groups with each other? Here is a concrete example: I would like to apply a t-test to the following combinations: a vs b, a vs c and b vs c.

    set.seed(1)
    tibble(value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10,1)),
           group=c(rep("a", 1000), rep("b", 1000), rep("c", 1000))) %>% nest(value)
    # A tibble: 3 x 2
      group data
      <chr> <list>
    1 a     <tibble [1,000 × 1]>
    2 b     <tibble [1,000 × 1]>
    3 c     <tibble [1,000 × 1]>

If dplyr provides no solution, I would also be
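For reference, the three comparisons can be enumerated in base R with combn(). This is a minimal sketch that does not use dplyr, so it may not be the kind of solution the question is after; the names df, pairs, and results are introduced here for illustration.

```r
set.seed(1)
df <- data.frame(
  value = c(rnorm(1000, 1, 1), rnorm(1000, 5, 1), rnorm(1000, 10, 1)),
  group = rep(c("a", "b", "c"), each = 1000)
)

# All unordered pairs of group labels: (a, b), (a, c), (b, c).
pairs <- combn(unique(df$group), 2, simplify = FALSE)

# Run a two-sample (Welch) t-test for each pair of groups.
results <- lapply(pairs, function(p) {
  t.test(df$value[df$group == p[1]], df$value[df$group == p[2]])
})
names(results) <- vapply(pairs, paste, character(1), collapse = " vs ")
```

Each element of results is an ordinary htest object, so results[["a vs b"]]$p.value gives the p-value for that comparison. No multiple-comparison correction is applied in this sketch.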

Error reading Field with Double Quotes and Commas using Fread

Question: I have a large CSV file with 19 columns of character/numeric data. When I ran fread, I got an error message saying one of my numeric columns was being converted to character because a field had the value "". I then opened my data in a text editor and found the source of my problem. On one line, a character column read:

    """PARENTS"", ""Y.M."", AND ""EXPECTING"""

which corresponded to the string:

    "PARENTS", "Y.M.", AND "EXPECTING"

As:
The first quote is a string protector
The 2nd to 6th

R Data.table divide values in column based on another column

Question: I have a main data.table which has 364 rows and 3 columns:

    Date        Weekday  Weight
    2012-01-01  Monday   100
    2013-01-02  Tuesday  200
    ...

and a help data.table with 7 rows and 2 columns:

    Weekday    Coefficient
    Monday     0.91
    Tuesday    0.84
    Wednesday  0.99
    ...

Now I would like to create a 4th column in the main data.table with the "weight/Coefficient" based on the Weekday.

    Weight_divided <- main[, Weight * help[Weekday==main$Weekday]$Coefficient]

The result is the following:

    Date Weekday Weight Weight_divided
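A common data.table idiom for this kind of lookup is an update join, which matches rows on Weekday and adds the new column by reference. The sketch below assumes the goal is Weight divided by Coefficient, as the text describes; the sample rows are made up, and help_dt is a hypothetical name used in place of help to avoid shadowing the base R function.

```r
library(data.table)

# Made-up sample rows in the shape described in the question.
main <- data.table(
  Date    = as.Date(c("2012-01-01", "2012-01-02", "2012-01-03")),
  Weekday = c("Monday", "Tuesday", "Monday"),
  Weight  = c(100, 200, 300)
)
help_dt <- data.table(
  Weekday     = c("Monday", "Tuesday"),
  Coefficient = c(0.91, 0.84)
)

# Update join: for each row of main with a matching Weekday in help_dt,
# divide Weight by that weekday's Coefficient (i. refers to help_dt columns).
main[help_dt, on = "Weekday", Weight_divided := Weight / i.Coefficient]
```

Rows of main whose Weekday has no match in help_dt would be left as NA in Weight_divided.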