subset

How can I select a row by row name in a subsetted data frame in R?

那年仲夏 submitted on 2019-12-01 23:08:42
I want to select rows by name in a data frame that is a subset of a larger one. The subsetted data frame appears to have retained the row names of the original data frame, such that:

    > DFsubset[1:3,]
        x1 x2 x3
    271  3  5  2
    553  2  4  1
    563  2  5  3

while using the printed row name returns the following:

    > DFsubset[271,]
    Error in xj[i, , drop = FALSE] : subscript out of bounds

How can I select these rows based on the row names from the original DF, i.e. 271, 553, 563?

You need to reference the rownames of your data.frame:

    dfsub[rownames(dfsub) == 271,]  # where dfsub is your subsetted data.frame

EDIT: as
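A minimal sketch of the row-name lookup discussed in this entry, using made-up data (DF and its columns are hypothetical stand-ins, not the asker's data). Row names are stored as character strings, so indexing with a quoted name is one way to reach a row by its original label, whereas a bare number is treated as a position.

```r
# Hypothetical stand-in for the original, larger data frame
DF <- data.frame(x1 = 1:600, x2 = 601:1200)

# Subsetting keeps the original row names ("271", "553", "563")
DFsubset <- DF[c(271, 553, 563), ]

# Index with a character vector to select by row name, not by position
by_name <- DFsubset[c("271", "563"), ]

# Equivalent lookup through rownames(), in the spirit of the answer above
by_match <- DFsubset[rownames(DFsubset) %in% c("271", "563"), ]

print(by_name)
```

The character form is usually the shorter of the two; the rownames() form may be handier when the names to keep are computed elsewhere.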

Subset data with dynamic conditions in R

纵然是瞬间 submitted on 2019-12-01 23:07:36
I have a dataset of 2500 rows, all of which are bank loans. Each bank loan has an outstanding amount and a collateral type (real estate, machine tools, etc.). I need to draw a random selection from this dataset where, for example, the sum of the outstanding amounts is 2.5 million ± 5% and at most 25% of the loans share the same asset class. I found the function optim, but it asks for a function and looks to be built for optimizing a portfolio of stocks, which is much more complex. I would guess there is an easier way of achieving this? I created a sample data set which could illustrate my question

How to subtract a complete character vector with repeated characters from the other vector in R

时光毁灭记忆、已成空白 submitted on 2019-12-01 21:36:56
Question
I want to subtract y from x, which means removing one "A", three "B" and one "E" from x, so xNew will be c("A", "C", "A", "B", "D"). It also means length(xNew) = length(x) - length(y).

    x <- c("A","A","C","A","B","B","B","B","D","E")
    y <- c("A","B","B","B","E")

setdiff doesn't work because

    xNew <- setdiff(x,y)
    xNew
    [1] "C" "D"

match also doesn't work

    xNew <- x[-match(y,x)]
    xNew
    [1] "A" "C" "A" "B" "B" "B" "D"

It removes the "B" in the fifth position 3 times, so there are still three "B" left. Is anyone

Wrapper for a function relying on non-standard evaluation in R

馋奶兔 submitted on 2019-12-01 21:28:24
I wrote a wrapper around ftable because I need to compute flat tables with frequency and percentage for many variables:

    mytable <- function(...) {
      tab <- ftable(..., exclude = NULL)
      prop <- prop.table(x = tab, margin = 2) * 100
      bind <- cbind(as.matrix(x = tab), as.matrix(x = prop))
      margin <- addmargins(A = bind, margin = 1)
      round(x = margin, digits = 1)
    }
    mytable(formula = wool + tension ~ breaks, data = warpbreaks)
       A_L A_M A_H B_L B_M B_H A_L  A_M  A_H B_L B_M  B_H
    10   0   0   1   0   0   0 0.0  0.0 11.1 0.0 0.0  0.0
    12   0   1   0   0   0   0 0.0 11.1  0.0 0.0 0.0  0.0
    13   0   0   0   0   0   1 0.0  0.0  0.0 0.0 0.0 11.1
    14   0   0   0   1   0

Iterate through different subset of size k

耗尽温柔 submitted on 2019-12-01 21:19:52
I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k. However, I'd like to exclude all duplicate subsets. E.g. array = {1,2,2,3,3,3,3}, n = 7, k = 2; then the subsets I want to iterate over (each once) are: {1,2}, {1,3}, {2,2}, {2,3}, {3,3}. What is an efficient algorithm for doing this? Is a recursive approach the most efficient/elegant? In case you have a language-specific answer, I'm using C++.

The same (or almost the same) algorithm which is used to generate combinations of a set of unique values in lexicographical order can be used to
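For illustration only, here is a brute-force sketch of the example in this entry. It is written in R to match most of the other entries on this page, not in the asker's C++, and it is not the lexicographic algorithm the answer refers to: it simply generates every k-combination of the sorted values and drops repeated rows, which is fine for small inputs but wasteful for large ones.

```r
values <- c(1, 2, 2, 3, 3, 3, 3)
k <- 2

# Every k-combination of the sorted values, one per row.
# Sorting first means equal subsets always appear as identical rows.
all_combos <- t(combn(sort(values), k))

# Drop repeated rows so each distinct subset is kept once
distinct_subsets <- unique(all_combos)

# Iterate over the distinct subsets
for (i in seq_len(nrow(distinct_subsets))) {
  print(distinct_subsets[i, ])
}
```

For this input the loop visits {1,2}, {1,3}, {2,2}, {2,3}, {3,3}, matching the list in the question.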

How to subset data for a specific column with ddply?

别说谁变了你拦得住时间么 submitted on 2019-12-01 21:15:39
I would like to know if there is a simple way to achieve what I describe below using ddply. My data frame describes an experiment with two conditions. Participants had to select between options A and B, and we recorded how long they took to decide and whether their responses were accurate or not. I use ddply to create averages by condition. The column nAccurate summarizes the number of accurate responses in each condition. I also want to know how much time they took to decide and express it in the column RT. However, I want to calculate average response times only when participants got the

Why use \0 to include highEndPoint as part of the subList

元气小坏坏 submitted on 2019-12-01 21:10:34
I saw the code below in the Oracle Java tutorial. In order to count the number of words between doorbell (inclusive) and pickle (inclusive), the author added \0 after the word pickle. I understand that the effect of adding \0 after pickle is that the word pickle is now included as part of the subset. But my question is, why use \0? Could someone please help me out? Thanks in advance for any help!

    SortedSet<String> dictionary = new TreeSet<>(entire collection of words from a dictionary);
    int count = dictionary.subSet("doorbell", "pickle\0").size();
    System.out.println(count);

Edit: Also, what

How to subtract a complete character vector with repeated characters from the other vector in R

血红的双手。 submitted on 2019-12-01 20:40:31
I want to subtract y from x, which means removing one "A", three "B" and one "E" from x, so xNew will be c("A", "C", "A", "B", "D"). It also means length(xNew) = length(x) - length(y).

    x <- c("A","A","C","A","B","B","B","B","D","E")
    y <- c("A","B","B","B","E")

setdiff doesn't work because

    xNew <- setdiff(x,y)
    xNew
    [1] "C" "D"

match also doesn't work

    xNew <- x[-match(y,x)]
    xNew
    [1] "A" "C" "A" "B" "B" "B" "D"

It removes the "B" in the fifth position 3 times, so there are still three "B" left. Does anyone know how to do this? Is there a function available in R, or should we write our own function? Thanks a
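One possible approach, sketched under assumptions: make.unique() numbers repeated values ("B", "B.1", "B.2", ...), so the n-th "B" in y can be paired with the n-th "B" in x before match() is applied. The helper name subtract_multiset is hypothetical, and the sketch assumes the vectors do not already contain strings such as "B.1" that would collide with the generated suffixes.

```r
# Hypothetical helper: remove each element of y from x once per occurrence
subtract_multiset <- function(x, y) {
  # Number the duplicates so every occurrence has its own label
  idx <- match(make.unique(as.character(y)), make.unique(as.character(x)))
  # Ignore elements of y that have no remaining counterpart in x
  idx <- idx[!is.na(idx)]
  # x[-integer(0)] would return an empty vector, so handle that case
  if (length(idx) == 0) return(x)
  x[-idx]
}

x <- c("A","A","C","A","B","B","B","B","D","E")
y <- c("A","B","B","B","E")

xNew <- subtract_multiset(x, y)
print(xNew)  # "A" "C" "A" "B" "D"
```

Note that this removes the first matching occurrences, so the surviving elements keep their original relative order.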

Remove rows based on factor-levels

血红的双手。 submitted on 2019-12-01 20:18:20
Question
I have a data.frame df in "long" format.

    df <- data.frame(site = rep(c("A","B","C"), 1, 7),
                     time = c(11,11,11,22,22,22,33),
                     value = ceiling(rnorm(7)*10))
    df <- df[order(df$site), ]
    df
      site time value
    1    A   11    12
    2    A   22   -24
    3    A   33   -30
    4    B   11     3
    5    B   22    16
    6    C   11     3
    7    C   22     9

Question
How do I remove the rows where a unique element of df$time is not present for each of the levels of df$site? In this case I want to remove df[3,], because for df$time the timestamp 33 is only present for site A and
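A base-R sketch of one way to read this request: count how many distinct sites each timestamp occurs in, and keep only the timestamps seen at every site. The data frame is typed in by hand from the printed output above (the original used rnorm, so the values are not reproducible), and the intermediate names are hypothetical.

```r
# Data copied from the printed output in the question
df <- data.frame(site  = c("A", "A", "A", "B", "B", "C", "C"),
                 time  = c(11, 22, 33, 11, 22, 11, 22),
                 value = c(12, -24, -30, 3, 16, 3, 9))

# Number of distinct sites overall
n_sites <- length(unique(df$site))

# For each timestamp, the number of distinct sites where it occurs
sites_per_time <- tapply(df$site, df$time, function(s) length(unique(s)))

# Timestamps that occur at every site
keep_times <- as.numeric(names(sites_per_time)[sites_per_time == n_sites])

# Keep only rows whose timestamp is shared by all sites
df_complete <- df[df$time %in% keep_times, ]
print(df_complete)
```

With this data only the row with time 33 is dropped, since 33 occurs at site A alone.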

Why subset doesn't mind missing subset argument for dataframes?

淺唱寂寞╮ submitted on 2019-12-01 20:15:22
Normally I wonder where mysterious errors come from, but now my question is where a mysterious lack of error comes from. Let

    numbers <- c(1, 2, 3)
    frame <- as.data.frame(numbers)

If I type subset(numbers, ) (so I want to take some subset but forget to specify the subset argument of the subset function) then R reminds me (as it should):

    Error in subset.default(numbers, ) : argument "subset" is missing, with no default

However, when I type subset(frame,) (so the same thing with a data.frame instead of a vector), it doesn't give an error but instead just returns the (full) dataframe. What is going
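The difference can be reproduced in a few lines. As far as I can tell, the data frame method checks missing(subset) and falls back to keeping every row, while the default method evaluates the argument directly and so fails. The keep_rows function below is a hypothetical stand-in that imitates that pattern; it is not the actual source of subset.data.frame.

```r
numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

# The vector method raises an error when the argument is missing
vec_error <- tryCatch({
  subset(numbers, )
  FALSE
}, error = function(e) TRUE)

# The data frame method returns all rows instead
df_result <- subset(frame, )

# Hypothetical function imitating the missing() fallback
keep_rows <- function(x, keep) {
  if (missing(keep)) keep <- rep_len(TRUE, nrow(x))  # default: keep all rows
  x[keep, , drop = FALSE]
}

print(vec_error)          # TRUE: subset(numbers, ) failed
print(nrow(df_result))    # 3: every row of frame was returned
print(keep_rows(frame))
```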