tidyverse

repeated mutate in tidyverse

左心房为你撑大大i posted on 2019-12-11 03:12:29
Question: Consider the following tibble and the following vector:

    library(tidyverse)
    a <- tibble(val1 = 10:15, val2 = 20:25)
    params <- 1:3

I also have a function myfun which takes a vector of arbitrary length and an integer as input and returns a vector of the same length. For demonstration purposes you can think of

    myfun <- function(x, k) dplyr::lag(x, k)

I want to create the following: for each column in a and for each element in params, I want to create a new column given by myfun(col, params[i]). In
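The goal stated in the question (one new column for every column of a and every element of params) could be sketched as follows. This is only one possible approach and is not taken from the original thread; the grid and new_cols objects and the val1_2-style column names are assumptions made for illustration.

```r
library(dplyr)
library(tibble)

a <- tibble(val1 = 10:15, val2 = 20:25)
params <- 1:3
myfun <- function(x, k) dplyr::lag(x, k)

# Hypothetical helper table: every (column, parameter) pair.
grid <- expand.grid(col = names(a), k = params, stringsAsFactors = FALSE)

# Apply myfun to each pair; every result has the same length as its input column.
new_cols <- Map(function(col, k) myfun(a[[col]], k), grid$col, grid$k)

# Assumed naming scheme: <column>_<parameter>, e.g. val1_2.
names(new_cols) <- paste0(grid$col, "_", grid$k)

# Attach the six new columns to the original two.
result <- bind_cols(a, as_tibble(new_cols))
```

Building the pairs with expand.grid() keeps the sketch independent of any particular dplyr version; an across()-based variant would also be possible.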

transmute new columns based on exact match of multiple words in string

拈花ヽ惹草 posted on 2019-12-11 02:35:50
Question: I have a data frame:

    df <- data.frame(
      Otherspp = c("suck SD", "BT", "SD RS", "RSS"),
      Dominantspp = c("OM", "OM", "RSS", "CH"),
      Commonspp = c(" ", " ", " ", "OM"),
      Rarespp = c(" ", " ", "SD", "NP"),
      NP = rep("northern pikeminnow|NORTHERN PIKEMINNOW|np|NP|npm|NPM", 4),
      OM = rep("steelhead|STEELHEAD|rainbow trout|RAINBOW TROUT|st|ST|rb|RB|om|OM", 4),
      RSS = rep("redside shiner|REDSIDE SHINER|rs|RS|rss|RSS", 4),
      suck = rep("suckers|SUCKERS|sucker|SUCKER|suck|SUCK|su|SU|ss|SS", 4)
    )

I need to use

Select rows before a filtered row using dplyr

て烟熏妆下的殇ゞ posted on 2019-12-11 02:03:39
Question: I'm working on a study where we used a camera placed inside a nest box to determine when our study species laid its first egg. Some of the cameras weren't very reliable, and I'd like to see whether there were continuous photos before the date on which the first egg was laid. This way I can know for sure that this is the first egg date. There are >165,000 photos and >200 nests, so I grouped by nest box ID, filtered the rows down to those that have at least 1 egg, and then used the slice function to

(R, dplyr) select multiple columns starts with same string and summarise mean (90% CI) by group

断了今生、忘了曾经 posted on 2019-12-11 01:40:20
Question: I am new to the tidyverse. Conceptually, I would like to calculate the mean and 90% CI of all columns whose names start with "ab", grouped by "case". I have tried many ways but none seem to work. My actual data has many columns, so explicitly listing them is not an option. Test data below:

    library(tidyverse)
    dat <- tibble(case = c("case1", "case1", "case2", "case2", "case3"),
                  abc = c(1, 2, 3, 1, 2),
                  abe = c(1, 3, 2, 3, 4),
                  bca = c(1, 6, 3, 8, 9))

The code below is what I would like to do conceptually, but it doesn't work,
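A grouped summary over every column whose name starts with "ab" might look like the sketch below. It assumes dplyr 1.0 or later (for across()), and it assumes that "90% CI" means the empirical 5th to 95th percentile range; the question does not say which interval is intended, so a t-based interval of the mean may be what is actually wanted. The lower and upper labels are illustrative names.

```r
library(dplyr)
library(tibble)

dat <- tibble(case = c("case1", "case1", "case2", "case2", "case3"),
              abc = c(1, 2, 3, 1, 2),
              abe = c(1, 3, 2, 3, 4),
              bca = c(1, 6, 3, 8, 9))

ci_summary <- dat %>%
  group_by(case) %>%
  summarise(
    across(
      starts_with("ab"),                 # selects abc and abe, skips bca
      list(
        mean  = ~ mean(.x),
        # Assumption: percentile interval, not a t-based CI of the mean.
        lower = ~ quantile(.x, 0.05, names = FALSE),
        upper = ~ quantile(.x, 0.95, names = FALSE)
      ),
      .names = "{.col}_{.fn}"            # e.g. abc_mean, abc_lower, abc_upper
    )
  )
```

Because the columns are chosen with starts_with(), nothing has to be listed by hand when the real data has many "ab" columns.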

lapply() output as a dataframe of multiple functions - R

China☆狼群 posted on 2019-12-11 00:47:08
Question: I have been trying to create a new dataframe from several computations with lapply(). After reading several questions (1, 2, 3), I have got this far:

    lapply(mtcars, function(x) c(colnames(x),
                                 NROW(unique(x)),
                                 sum(is.na(x)),
                                 round(sum(is.na(x))/NROW(x),2)
                                 )
    )

However, colnames(x) doesn't give the column name, because x is a vector. Second, I can't figure out a way to transform this output into a dataframe:

    lapply(mtcars, function(x) data.frame(NROW(unique(x)), # if I put colnames(x) here it gives an
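Both problems described here (the missing column name and the conversion to a data frame) can be approached by iterating over names(mtcars) instead of over the columns themselves. The following is a minimal sketch under that assumption, not the thread's accepted answer; summarise_column and its output column names are hypothetical.

```r
# Hypothetical helper: one row of summary statistics for the column named nm.
# Receiving the name (rather than the vector) keeps it available for the output,
# which colnames(x) cannot provide for a bare vector.
summarise_column <- function(nm, data) {
  x <- data[[nm]]
  data.frame(
    column   = nm,
    n_unique = length(unique(x)),
    n_na     = sum(is.na(x)),
    prop_na  = round(mean(is.na(x)), 2)
  )
}

# lapply() returns a list of one-row data frames; rbind stacks them.
summary_df <- do.call(rbind, lapply(names(mtcars), summarise_column, data = mtcars))
```

The result has one row per column of mtcars, so it can be filtered or sorted like any other data frame.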

How to use ggplot to plot T-SNE clustering

百般思念 posted on 2019-12-11 00:44:04
Question: Here is the t-SNE code using the iris data:

    library(Rtsne)
    iris_unique <- unique(iris) # Remove duplicates
    iris_matrix <- as.matrix(iris_unique[,1:4])
    set.seed(42) # Set a seed if you want reproducible results
    tsne_out <- Rtsne(iris_matrix) # Run TSNE
    # Show the objects in the 2D tsne representation
    plot(tsne_out$Y,col=iris_unique$Species)

This produces a base-graphics scatter plot. How can I use ggplot to make that figure?

Answer 1: I think the easiest/cleanest ggplot way would be to store all the info you need in a

Modify certain values in a data frame by indirect reference to the columns

*爱你&永不变心* posted on 2019-12-11 00:34:46
Question: I'm wrangling some data where we sort fails into bins and compute limited yields for each sort bin by lot. I have a meta table that describes the sort bins. The rows are arranged in ascending test order, and some of the sort labels come in with non-syntactic names.

    sort_tbl <- tibble::tribble(~weight, ~label,
                                0, "fail A",
                                0, "fail B",
                                0, "fail C",
                                100, "pass")
    > sort_tbl
    # A tibble: 4 x 2
      weight label
       <dbl> <chr>
    1      0 fail A
    2      0 fail B
    3      0 fail C
    4    100 pass

I have a data table of limited yield

extract unique combinations of subset of parameters from tidy data

跟風遠走 posted on 2019-12-10 23:38:11
Question: Consider the following dummy data set:

    library(plyr)
    dummy_model <- function(...){
      data.frame(x = rnorm(100), y = rnorm(100))
    }
    params <- expand.grid(a=1:10, b=letters[1:4])
    d <- mdply(params, dummy_model)
    str(d)
    # 'data.frame': 4000 obs. of 4 variables:
    # $ a: int 1 1 1 1 1 1 1 1 1 1 ...
    # $ b: chr "a" "a" "a" "a" ...
    # $ x: num 0.812 1.183 2.839 -0.928 -1.427 ...
    # $ y: num -0.796 0.137 0.976 1.118 0.4 ...

Given the data d, how can I get back the original params? My current strategy would
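One way to recover the parameter grid is to keep only the parameter columns and drop duplicate rows. The sketch below uses base R and, to stay self-contained, replaces the mdply() call with an equivalent hand-built data frame; that stand-in, the stringsAsFactors = FALSE setting, and the name recovered are assumptions for illustration.

```r
set.seed(1)
params <- expand.grid(a = 1:10, b = letters[1:4], stringsAsFactors = FALSE)

# Stand-in for the mdply() result: 100 simulated rows per parameter combination.
d <- params[rep(seq_len(nrow(params)), each = 100), ]
d$x <- rnorm(nrow(d))
d$y <- rnorm(nrow(d))

# Keep the parameter columns only, then drop repeated combinations.
recovered <- unique(d[, c("a", "b")])
rownames(recovered) <- NULL
```

With dplyr loaded, distinct(d, a, b) expresses the same idea.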

getting constant text size while using atop function in r

蓝咒 posted on 2019-12-10 16:46:51
Question: Below is a much simpler example of a complicated custom function I have written. In the full-length form of this function, "layer1" corresponds to the caption entered by the user, "layer2" corresponds to results from a statistical test, and "layer3" corresponds to details about the statistical test carried out. But when all three layers are included in the caption, it looks something like this:

    library(ggplot2)
    ggplot(iris, aes(Species, Sepal.Length)) + geom_boxplot() + labs(caption = substitute

Use Dplyr::Bind_Rows and Purrr to Selectively Bind Different Dataframes In a List of Dataframes

旧时模样 posted on 2019-12-10 16:14:59
Question:

    library(tidyverse)

I'm attempting to use tidyverse tools to selectively bind a list of dataframes using dplyr::bind_rows(). I'll split the mtcars dataset to create a basic reproduction of my real data.

    Df<-mtcars%>% split(.$carb)%>% head()

I can bind it together with bind_rows()...

    Df<-Df%>% bind_rows()

But how do I selectively bind elements of the list? What I want to do is create two lists: the first binds list elements 1,3,6 while the second binds 2,4,8. I'm thinking something like...

    Df<
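Selective binding can be sketched by subsetting the list before passing it to bind_rows(). Splitting mtcars by carb yields six elements named "1", "2", "3", "4", "6" and "8", so the sketch assumes that "1,3,6" and "2,4,8" refer to those carb labels rather than to list positions (there is no eighth position). The names first_group and second_group are illustrative.

```r
library(dplyr)

# Named list of data frames, one per carb value: "1", "2", "3", "4", "6", "8".
Df <- split(mtcars, mtcars$carb)

# Assumption: the numbers in the question are element names, so index by character.
first_group  <- bind_rows(Df[c("1", "3", "6")])
second_group <- bind_rows(Df[c("2", "4", "8")])
```

If positions were intended instead, the same pattern works with an integer index such as Df[c(1, 3, 6)].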