r

How to divide a specific column by the rest of the columns

拈花ヽ惹草 submitted on 2021-02-11 12:17:29
Question: I have a matrix like this (the first column holds names, the rest are values; the separator is a tab):

name1 A1 B1 C1 D1
name2 A2 B2 C2 D2

The matrix could be huge (meaning hundreds of rows and columns). It is always the same size. I can expect zero values. I need output like this:

name1 A1 B1 C1 D1 A1/B1 A1/C1 A1/D1
name2 A2 B2 C2 D2 A2/B2 A2/C2 A2/D2

This combination is saved to a new file. Then I make another combination:

name1 A1 B1 C1 D1 B1/A1 B1/C1 B1/D1
name2 A2 B2 C2 D2 B2/A2 B2/C2 B2/D2

and so on => divide
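One way the task described above could be approached in base R is sketched below. The sample values, the column names `A` to `D`, the helper `divide_by_rest`, and the output file names are all assumptions made for illustration; they are not from the original post.

```r
# Hypothetical sample data: first column holds names, the rest are values.
m <- data.frame(name = c("name1", "name2"),
                A = c(10, 20), B = c(2, 5), C = c(5, 0), D = c(1, 4))

# Divide one value column by every other value column and append the ratios.
divide_by_rest <- function(df, col) {
  values <- df[, -1, drop = FALSE]             # drop the name column
  others <- setdiff(names(values), col)        # the remaining value columns
  ratios <- values[[col]] / values[others]     # column-wise division
  names(ratios) <- paste(col, others, sep = "/")
  cbind(df, ratios)
}

# One result table (and one tab-separated file) per numerator column.
results <- list()
for (col in names(m)[-1]) {
  results[[col]] <- divide_by_rest(m, col)
  out_file <- file.path(tempdir(), paste0("divided_", col, ".tsv"))  # hypothetical file name
  write.table(results[[col]], out_file, sep = "\t", quote = FALSE, row.names = FALSE)
}
```

Since zero values are expected, note that R returns `Inf` for a non-zero value divided by zero and `NaN` for `0/0`, so those cells may need separate handling afterwards.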

How to calculate the area of a polygon in R, if I only have X,Y coordinates?

江枫思渺然 submitted on 2021-02-11 12:17:22
Question: I have a data frame with X, Y coordinates, and I need to calculate the area that covers all the points in the scatterplot. Is there a way to draw a polygon that surrounds all the points and calculate its area?

Answer 1: This is a convex hull problem. Other questions cover similar territory. We can gather the plotting and area calculation problems here for good measure. To plot the convex hull of a point cloud, we can use the chull function:

    library(tidyverse)
    data <- tibble(x = runif(50), y = runif
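For the area part, a minimal base R sketch is to take the hull vertices from `chull` and apply the shoelace formula to them. The helper name `hull_area` and the sample points are assumptions for illustration, and this is not necessarily the approach the truncated answer goes on to use.

```r
# Area of the convex hull of a point cloud, using chull() plus the shoelace formula.
hull_area <- function(x, y) {
  idx <- chull(x, y)                 # indices of the hull vertices, in order around the hull
  hx <- x[idx]
  hy <- y[idx]
  nx <- c(hx[-1], hx[1])             # next vertex, wrapping around to the first
  ny <- c(hy[-1], hy[1])
  abs(sum(hx * ny - nx * hy)) / 2    # abs() makes the result independent of orientation
}

# Hypothetical example: the corners of a unit square plus one interior point.
px <- c(0, 1, 1, 0, 0.5)
py <- c(0, 0, 1, 1, 0.5)
square_area <- hull_area(px, py)
```

The interior point does not affect the result, because only the hull vertices enter the formula.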

R user-defined/dynamic summary function within dplyr::summarise

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-11 12:14:29
Question: Somewhat hard to define this question without sounding like lots of similar questions! I have a function for which I want one of the parameters to be a function name that will be passed to dplyr::summarise, e.g. "mean" or "sum":

    data(mtcars)
    f <- function(x = mtcars, groupcol = "cyl", zCol = "disp", zFun = "mean") {
      zColquo = quo_name(zCol)
      cellSummaries <- x %>%
        group_by(gear, !!sym(groupcol)) %>% # 1 preset grouper, 1 user-defined
        summarise(Count = n(), # 1 preset summary, 1 user defined
                  !
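The core idea, turning a function name such as "mean" or "sum" into a callable, can be sketched with `match.fun`. To keep the example free of package dependencies it uses base `aggregate` instead of dplyr, which is a deliberate swap and not the asker's code. The helper name `summarise_by` and the output column naming are assumptions.

```r
# Group by gear plus one user-chosen column, then summarise a user-chosen
# column with a function that is passed in by name.
summarise_by <- function(x = mtcars, groupcol = "cyl", zCol = "disp", zFun = "mean") {
  fun <- match.fun(zFun)                       # look up the function from its name
  out <- aggregate(x[[zCol]],
                   by = list(x$gear, x[[groupcol]]),
                   FUN = fun)
  names(out) <- c("gear", groupcol, paste(zFun, zCol, sep = "_"))
  out
}

means <- summarise_by()                        # mean of disp by gear and cyl
sums  <- summarise_by(zFun = "sum")            # sum of disp by gear and cyl
```

Inside a dplyr pipeline, the same lookup (`match.fun(zFun)`) could presumably be applied to the target column within `summarise`, though that variant is not shown or tested here.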

Grepl group of strings and count frequency of all using R

一个人想着一个人 submitted on 2021-02-11 12:08:56
Question: I have a column named text with 50k rows of tweets from a CSV file (the tweets consist of sentences, phrases, etc.). I'm trying to count the frequency of several words in that column. Is there an easier way to do it than what I'm doing below?

    # Reading my file
    tweets <- read.csv('coffee.csv', header=TRUE)
    # Doing a grepl per word (this is hard because I need to look for many words one by one)
    coffee <- grepl("coffee", tweets$text, ignore.case=TRUE)
    mugs <- grepl("mugs", tweets$text, ignore.case=TRUE)
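A minimal sketch of one way to avoid a separate `grepl` call per word is to loop over a vector of words with `sapply`. The sample tweets and the word list below are made up for illustration, and the counts are the number of tweets containing each word, not the total number of occurrences.

```r
# Hypothetical stand-in for the data read from coffee.csv.
tweets <- data.frame(
  text = c("I love Coffee", "Two mugs of coffee", "tea time", "new MUGS"),
  stringsAsFactors = FALSE
)

# Words to look for (assumed list).
words <- c("coffee", "mugs", "tea")

# For each word, count the tweets that contain it, ignoring case.
counts <- sapply(words, function(w) sum(grepl(w, tweets$text, ignore.case = TRUE)))
```

`counts` is a named vector, so `counts["coffee"]` gives the number of matching tweets for that word. Adding `fixed = TRUE` is not possible together with `ignore.case`, so words containing regex metacharacters would need escaping.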

Add missing months for a range of date in R

左心房为你撑大大i submitted on 2021-02-11 12:01:22
Question: Say I have a data.frame as follows, where each month has one entry of data:

    df <- read.table(text="date,gmsl
    2009-01-17,58.4
    2009-02-17,59.1
    2009-04-16,60.9
    2009-06-16,62.3
    2009-09-16,64.6
    2009-12-16,68.3",sep=",",header=TRUE)

    ## > df
    ##         date gmsl
    ## 1 2009-01-17 58.4
    ## 2 2009-02-17 59.1
    ## 3 2009-04-16 60.9
    ## 4 2009-06-16 62.3
    ## 5 2009-09-16 64.6
    ## 6 2009-12-16 68.3

Just wondering how I could fill in the missing months with gmsl as NaN for the date range from 2009-01 to 2009-12? I have extracted year and
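One possible base R sketch: because the day of the month varies between rows, derive a year-month key, build the full sequence of months, and left-join the data onto it with `merge`. The column name `month` and the object names `all_months` and `filled` are assumptions for illustration.

```r
df <- read.table(text = "date,gmsl
2009-01-17,58.4
2009-02-17,59.1
2009-04-16,60.9
2009-06-16,62.3
2009-09-16,64.6
2009-12-16,68.3", sep = ",", header = TRUE)

# Year-month key for the existing rows.
df$month <- format(as.Date(df$date), "%Y-%m")

# Every month in the requested range, as the same kind of key.
all_months <- data.frame(
  month = format(seq(as.Date("2009-01-01"), as.Date("2009-12-01"), by = "month"), "%Y-%m")
)

# Left join: months without data get NA in date and gmsl.
filled <- merge(all_months, df, by = "month", all.x = TRUE)
```

`merge` fills the gaps with `NA`, which is R's usual missing-value marker. If a literal `NaN` is required, `filled$gmsl[is.na(filled$gmsl)] <- NaN` would convert them.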

R: How to separate multiple choice, multiple answers questionnaire data that Google Forms put in one variable? [duplicate]

大城市里の小女人 submitted on 2021-02-11 11:56:10
Question: This question already has answers here: Split a column of concatenated comma-delimited data and recode output as factors (2 answers). Closed 3 years ago.

I have run a survey using Google Forms. I downloaded the response dataset as a spreadsheet, but unfortunately, when it comes to multiple-choice, multiple-answer responses, the data looks something like this:

Q1  Q2         Q3
1   "A, B ,C"  S
2   "C, D"     T
1   "A, C, E"  U
3   "D"        V
2   "B, E"     Z

I would like to have it in a form similar to the below:

Q1 Q2 Q2A

How to add timestamp in R?

天大地大妈咪最大 submitted on 2021-02-11 11:53:05
Question: I have data containing the difference between the start and end time of an event. Now I want to add up the differences. The problem is that the difference time is in this format:

difference_time
_______________
00:10:00
00:30:12
01:09:09
00:09:03
01:09:30
01:09:03
00:09:08
01:00:09
09:00:01

But if I do sum(df$difference_time), it throws an "invalid type of argument" error. I want the result in a format like this: 51975 seconds. Any help is appreciated. UPDATE: I tried period_to_seconds(hms
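A minimal base R sketch, assuming the column holds "HH:MM:SS" strings: split each value on the colon, convert the parts to seconds, and sum. The helper name `to_seconds` is an assumption, and this is a base R alternative to the lubridate call the update starts to describe.

```r
# The values from the question, as character strings.
difference_time <- c("00:10:00", "00:30:12", "01:09:09", "00:09:03", "01:09:30",
                     "01:09:03", "00:09:08", "01:00:09", "09:00:01")

# Convert "HH:MM:SS" strings to seconds.
to_seconds <- function(hms) {
  parts <- strsplit(as.character(hms), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

total_seconds <- sum(to_seconds(difference_time))
total_seconds
# 51975
```

The `as.character` call is there in case the column was read in as a factor, which is one plausible cause of the "invalid type" error from `sum`.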

variance-covariance matrix in R

馋奶兔 submitted on 2021-02-11 11:51:58
Question: I have the data frame below, and from it I've calculated the matrix b from the coefficient estimates (betas) of my linear regression model. How do I create the variance-covariance matrix in R, or s^2_b?

    y <- c(42, 33, 75, 28, 91, 55)
    int <- c(1, 1, 1, 1, 1, 1)
    x1 <- c(7, 4, 16, 3, 21, 8)
    x2 <- c(33, 41, 7, 49, 5, 31)
    df <- data.frame(y, x1, x2)
    mod1 <- lm(y ~ x1 + x2, data = df)
    # b
    iint <- summary(mod1)$coefficients[[1]]
    xx1 <- summary(mod1)$coefficients[[2]]
    xx2 <- summary(mod1)$coefficients[[3]
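As a hedged sketch, the variance-covariance matrix of the coefficient estimates is available from `vcov`, and it can be checked against the textbook formula s^2 (X'X)^-1, where s^2 is the residual sum of squares divided by the residual degrees of freedom. The object names `vc_builtin`, `vc_manual`, `X`, and `s2` are assumptions for illustration.

```r
y  <- c(42, 33, 75, 28, 91, 55)
x1 <- c(7, 4, 16, 3, 21, 8)
x2 <- c(33, 41, 7, 49, 5, 31)
df <- data.frame(y, x1, x2)
mod1 <- lm(y ~ x1 + x2, data = df)

# Built-in: variance-covariance matrix of the estimated coefficients.
vc_builtin <- vcov(mod1)

# Manual: s^2 * (X'X)^-1 with the model's design matrix (includes the intercept column).
X  <- model.matrix(mod1)
s2 <- sum(residuals(mod1)^2) / df.residual(mod1)
vc_manual <- s2 * solve(t(X) %*% X)
```

The square roots of the diagonal, `sqrt(diag(vc_builtin))`, are the standard errors reported in the second column of `summary(mod1)$coefficients`.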

How can I change the legend geometry in ggplot2

霸气de小男生 submitted on 2021-02-11 11:50:36
Question: Hello, say I plot this boxplot:

    library(ggplot2)
    DT <- data.frame(
      y = runif(400, max = 2),
      grp = sample(c('M', 'F'), size = 400, replace = T),
      x = rep(as.Date(1:10, origin='2011-01-01'), each = 40)
    )
    p <- ggplot(DT) + geom_boxplot() + aes(x = x, y = y, group=interaction(x,grp), fill=grp)
    p

The question is: how can I replace those little boxes in the legend with lines (like I would get using graphics)?

Answer 1: The easiest option might be to make the lines invisible,

    p + guides(fill = guide_legend(override