r | 易学教程

How to divide specific column with rest of columns

Question: I have a matrix like this (the first column holds names, the rest are values; the separator is a tab):

    name1 A1 B1 C1 D1
    name2 A2 B2 C2 D2

The matrix could be huge (hundreds of rows and columns). It is always the same size, and I can expect zero values. I need output like this:

    name1 A1 B1 C1 D1 A1/B1 A1/C1 A1/D1
    name2 A2 B2 C2 D2 A2/B2 A2/C2 A2/D2

This combination is saved to a new file. Then I make another combination:

    name1 A1 B1 C1 D1 B1/A1 B1/C1 B1/D1
    name2 A2 B2 C2 D2 B2/A2 B2/C2 B2/D2

and so on => divide
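One way the requested output could be produced in base R is sketched below. The toy matrix `m`, the helper name `ratios_for`, and the output file names are assumptions made for illustration; they do not come from the question. Note that R does not raise an error on division by zero: it returns `Inf` or `NaN`.

```r
# Hypothetical toy matrix standing in for the tab-separated input file
m <- matrix(c(2, 4, 8, 16,
              3, 0, 6, 12),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("name1", "name2"), c("A", "B", "C", "D")))

# Divide column i by every other column and append the ratios to the matrix.
# The vector m[, i] is recycled down each column, so rows stay aligned.
ratios_for <- function(m, i) {
  others <- setdiff(seq_len(ncol(m)), i)
  out <- m[, i] / m[, others, drop = FALSE]
  colnames(out) <- paste(colnames(m)[i], colnames(m)[others], sep = "/")
  cbind(m, out)
}

res_A <- ratios_for(m, 1)

# One output file per numerator column (written to a temporary directory here)
for (i in seq_len(ncol(m))) {
  out_file <- file.path(tempdir(), paste0("ratios_", colnames(m)[i], ".tsv"))
  write.table(ratios_for(m, i), out_file, sep = "\t", quote = FALSE, col.names = NA)
}
```

For a real file, `m` would presumably be read with something like `as.matrix(read.table(path, sep = "\t", row.names = 1))`, where `path` is a placeholder.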

How calculate the area of a polygon in R, if I only have X,Y coordinates?

Question: I have a data frame with X, Y coordinates, and I need to calculate the area that covers all the points in the scatterplot. Is there a way to draw a polygon that surrounds all the points and calculate this area?

Answer 1: This is a convex hull problem. Other questions cover similar territory. We can gather the plotting and area calculation problems here for good measure. To plot the convex hull of a point cloud, we can use the chull function:

    library(tidyverse)
    data <- tibble(x = runif(50), y = runif
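The answer above is cut off, so the following is an independent base R sketch of the convex hull idea rather than the answerer's code. It uses `chull` from `grDevices` to find the hull vertices and the shoelace formula for the area; the names `pts` and `polygon_area` are assumptions for illustration.

```r
set.seed(1)
# Hypothetical point cloud in the unit square
pts <- cbind(x = runif(50), y = runif(50))

# chull() returns the indices of the hull vertices in clockwise order
hull_idx <- chull(pts)
hull <- pts[hull_idx, ]

# Shoelace formula: area of a simple polygon from its ordered vertices
polygon_area <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  abs(sum(x * y_next - x_next * y)) / 2
}

hull_area <- polygon_area(hull)
```

The absolute value makes the result independent of whether the vertices are listed clockwise or counterclockwise.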

R user-defined/dynamic summary function within dplyr::summarise

Question: Somewhat hard to define this question without sounding like lots of similar questions! I have a function for which I want one of the parameters to be a function name that will be passed to dplyr::summarise, e.g. "mean" or "sum":

    data(mtcars)
    f <- function(x = mtcars, groupcol = "cyl", zCol = "disp", zFun = "mean") {
      zColquo = quo_name(zCol)
      cellSummaries <- x %>%
        group_by(gear, !!sym(groupcol)) %>% # 1 preset grouper, 1 user-defined
        summarise(Count = n(), # 1 preset summary, 1 user defined
                  !

Grepl group of strings and count frequency of all using R

Question: I have a column of 50k rows of tweets named text from a CSV file (the tweets consist of sentences, phrases, etc.). I'm trying to count the frequency of several words in that column. Is there an easier way to do it than what I'm doing below?

    # Reading my file
    tweets <- read.csv('coffee.csv', header=TRUE)
    # Doing a grepl per word (This is hard because I need to look for many words one by one)
    coffee <- grepl("coffee", tweets$text, ignore.case=TRUE)
    mugs <- grepl("mugs", tweets$text, ignore.case=TRUE)
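A minimal sketch of one way to avoid a separate `grepl` call per word: loop over a vector of words with `sapply`. It assumes the data frame is called `tweets` and the column `text`; the sample tweets and the `words` vector are made up for illustration.

```r
# Hypothetical stand-in for the data read from the CSV file
tweets <- data.frame(
  text = c("I love Coffee", "new mugs for coffee", "tea time", "Mugs!"),
  stringsAsFactors = FALSE
)

# Words to look for (assumed list)
words <- c("coffee", "mugs", "tea")

# For each word, count how many tweets contain it (case-insensitive)
counts <- sapply(words, function(w) sum(grepl(w, tweets$text, ignore.case = TRUE)))
```

This counts tweets that contain each word at least once, not the total number of occurrences; a word repeated inside one tweet is still counted once.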

Add missing months for a range of date in R

Question: Say I have a data.frame as follows, where each month has one entry of data:

    df <- read.table(text="date,gmsl
    2009-01-17,58.4
    2009-02-17,59.1
    2009-04-16,60.9
    2009-06-16,62.3
    2009-09-16,64.6
    2009-12-16,68.3",sep=",",header=TRUE)
    ## > df
    ##         date gmsl
    ## 1 2009-01-17 58.4
    ## 2 2009-02-17 59.1
    ## 3 2009-04-16 60.9
    ## 4 2009-06-16 62.3
    ## 5 2009-09-16 64.6
    ## 6 2009-12-16 68.3

Just wondering how I could fill the missing months with gmsl as NaN for the date range from 2009-01 to 2009-12? I have extracted year and
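One possible base R approach, sketched under the assumption that a year-month key is enough to identify each row: build the full sequence of months and left-join the data onto it with `merge`. The names `month`, `all_months`, and `filled` are introduced here for illustration.

```r
df <- read.table(text = "date,gmsl
2009-01-17,58.4
2009-02-17,59.1
2009-04-16,60.9
2009-06-16,62.3
2009-09-16,64.6
2009-12-16,68.3", sep = ",", header = TRUE)

# Year-month key for the existing rows
df$month <- format(as.Date(df$date), "%Y-%m")

# Every month in the requested range
all_months <- data.frame(
  month = format(seq(as.Date("2009-01-01"), as.Date("2009-12-01"), by = "month"), "%Y-%m")
)

# Left join: months without data get NA in date and gmsl
filled <- merge(all_months, df, by = "month", all.x = TRUE)
```

`merge` fills the gaps with `NA` rather than `NaN`; if `NaN` is specifically required, `filled$gmsl[is.na(filled$gmsl)] <- NaN` converts them afterwards.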

R: How to separate multiple choice, multiple answers questionnaire data that Google Forms put in one variable? [duplicate]

Question: This question already has answers here: Split a column of concatenated comma-delimited data and recode output as factors (2 answers). Closed 3 years ago.

I have run a survey using Google Forms. I downloaded the response dataset as a spreadsheet, but unfortunately, when it comes to multiple-choice, multiple-answer responses, the data looks something like this:

    Q1  Q2         Q3
    1   "A, B ,C"  S
    2   "C, D"     T
    1   "A, C, E"  U
    3   "D"        V
    2   "B, E"     Z

I would like to have it in a form similar to the below: Q1 Q2 Q2A

How to add timestamp in R?

Question: I have data containing the difference between the start and end time of an event. Now I want to add up the differences. The problem is that the difference time is in this format:

    difference_time
    _______________
    00:10:00
    00:30:12
    01:09:09
    00:09:03
    01:09:30
    01:09:03
    00:09:08
    01:00:09
    09:00:01

But if I do sum(df$difference_time) it throws an error about an invalid type of argument. I want the result to be in a format like this: 51975 seconds. Any help is appreciated.

UPDATE: I tried period_to_seconds(hms
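A base R sketch of one way to get the total, assuming the column holds character strings in `HH:MM:SS` form: split each string on `:` and weight the parts by 3600, 60, and 1. The helper name `to_seconds` is an assumption for illustration.

```r
# Values copied from the question
difference_time <- c("00:10:00", "00:30:12", "01:09:09", "00:09:03", "01:09:30",
                     "01:09:03", "00:09:08", "01:00:09", "09:00:01")

# Convert "HH:MM:SS" strings to seconds
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

total_seconds <- sum(to_seconds(difference_time))
total_seconds  # 51975
```

If the column was read as a factor, wrapping it in `as.character()` before calling `to_seconds` should be enough.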

variance-covariance matrix in R

Question: I have the data frame below, and from it I've calculated the matrix b from the beta coefficients of my linear regression model. How do I create the variance-covariance matrix in R, or s^2_b?

    y <- c(42, 33, 75, 28, 91, 55)
    int <- c(1, 1, 1, 1, 1, 1)
    x1 <- c(7, 4, 16, 3, 21, 8)
    x2 <- c(33, 41, 7, 49, 5, 31)
    df <- data.frame(y, x1, x2)
    mod1 <- lm(y ~ x1 + x2, data = df)
    # b
    iint <- summary(mod1)$coefficients[[1]]
    xx1 <- summary(mod1)$coefficients[[2]]
    xx2 <- summary(mod1)$coefficients[[3]
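For reference, a short sketch of two routes to the coefficient variance-covariance matrix: the built-in `vcov()` extractor, and the textbook formula s^2 (X'X)^-1 computed by hand. The names `vc`, `X`, `s2`, and `vc_manual` are introduced here for illustration.

```r
y  <- c(42, 33, 75, 28, 91, 55)
x1 <- c(7, 4, 16, 3, 21, 8)
x2 <- c(33, 41, 7, 49, 5, 31)
df <- data.frame(y, x1, x2)
mod1 <- lm(y ~ x1 + x2, data = df)

# Built-in extractor for the coefficient variance-covariance matrix
vc <- vcov(mod1)

# Manual version: s^2 * (X'X)^-1, with s^2 = RSS / residual degrees of freedom
X  <- model.matrix(mod1)          # design matrix including the intercept column
s2 <- sum(residuals(mod1)^2) / df.residual(mod1)
vc_manual <- s2 * solve(t(X) %*% X)
```

The square roots of the diagonal, `sqrt(diag(vc))`, are the standard errors reported by `summary(mod1)`.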

How can I change the legend geometry in ggplot2

Question: Hello, say I plot this boxplot:

    library(ggplot2)
    DT <- data.frame(
      y = runif(400, max = 2),
      grp = sample(c('M', 'F'), size = 400, replace = TRUE),
      x = rep(as.Date(1:10, origin = '2011-01-01'), each = 40)
    )
    p <- ggplot(DT) + geom_boxplot() + aes(x = x, y = y, group = interaction(x, grp), fill = grp)
    p

The question is how I can replace those little boxes in the legend with lines (like I would have using graphics).

Answer 1: The easiest option might be to make the lines invisible, p + guides(fill = guide_legend(override