subset | 易学教程

How can I select a row by row name in a subsetted data frame in R?

I want to select rows by name in a data frame that is a subset of a larger one. The subsetted data frame appears to have retained the row names of the original data frame:

> DFsubset[1:3,]
    x1 x2 x3
271  3  5  2
553  2  4  1
563  2  5  3

However, using one of the printed row names returns this error:

> DFsubset[271,]
Error in xj[i, , drop = FALSE] : subscript out of bounds

How can I select these rows based on the row names from the original DF, i.e. 271, 553 and 563?

Answer: a plain number inside [ ] is treated as a row position, not a row name, so you need to reference the row names of your data.frame:

dfsub[rownames(dfsub) == 271,]  # where dfsub is your subsetted data.frame

EDIT: as

Subset data with dynamic conditions in R

I have a dataset of 2,500 rows, each of which is a bank loan. Each loan has an outstanding amount and a collateral type (real estate, machine tools, etc.). I need to draw a random selection from this dataset such that, for example, the total outstanding amount is 2.5 million ± 5% and at most 25% of the selected loans share the same asset class. I found the function optim, but it requires an objective function and seems to be designed for problems like optimizing a stock portfolio, which is much more complex. Is there an easier way to achieve this? I created a sample data set which could illustrate my question
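
A simpler alternative to optim is a randomized search: shuffle the loans, add them until the total falls inside the ±5% band, and reject the draw if any asset class exceeds the 25% cap. The Python sketch below illustrates that idea. The function name draw_sample, the (amount, asset class) tuple layout, and the reading of "maximum 25%" as a share of the number of selected loans are assumptions, and the loan data is randomly generated rather than taken from the asker's sample set. It is a heuristic: it gives up after max_tries attempts and does not guarantee finding a selection even when one exists.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Loan = Tuple[float, str]  # (outstanding amount, asset class)


def draw_sample(
    loans: Sequence[Loan],
    target: float,
    tol: float = 0.05,
    max_share: float = 0.25,
    max_tries: int = 10_000,
    seed: Optional[int] = None,
) -> Optional[List[Loan]]:
    """Randomized greedy selection with restarts (a rejection-sampling heuristic).

    Each attempt shuffles the loans and adds them one by one, skipping any
    loan that would push the total above target * (1 + tol), until the total
    reaches target * (1 - tol). The attempt is accepted only if no asset class
    accounts for more than max_share of the selected loans (by count).
    Returns None if no attempt succeeds within max_tries.
    """
    rng = random.Random(seed)
    low, high = target * (1 - tol), target * (1 + tol)
    for _ in range(max_tries):
        order = list(loans)
        rng.shuffle(order)
        chosen: List[Loan] = []
        total = 0.0
        for loan in order:
            if total + loan[0] > high:
                continue  # this loan would overshoot the upper bound
            chosen.append(loan)
            total += loan[0]
            if total >= low:
                break  # the total is now inside the tolerance band
        if not chosen or not (low <= total <= high):
            continue  # this shuffle never reached the band; try again
        counts = Counter(asset_class for _, asset_class in chosen)
        if max(counts.values()) <= max_share * len(chosen):
            return chosen
    return None


# Made-up portfolio for illustration: 2,500 loans with random amounts and classes.
data_rng = random.Random(1)
classes = ["real estate", "machine tools", "vehicles", "inventory", "receivables"]
loans = [(data_rng.uniform(10_000, 200_000), data_rng.choice(classes)) for _ in range(2500)]

sample = draw_sample(loans, target=2_500_000, seed=42)
if sample is not None:
    print(len(sample), round(sum(amount for amount, _ in sample)))
```

Passing a different seed (or none) yields a different random selection. If the asset-class cap is rarely met on real data, enforcing it while the selection is being built, instead of checking it afterwards, would waste fewer attempts.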

Wrapper for a function relying on non-standard evaluation in R

I wrote a wrapper around ftable because I need to compute flat tables with frequency and percentage for many variables:

mytable <- function(...) {
  tab <- ftable(..., exclude = NULL)
  prop <- prop.table(x = tab, margin = 2) * 100
  bind <- cbind(as.matrix(x = tab), as.matrix(x = prop))
  margin <- addmargins(A = bind, margin = 1)
  round(x = margin, digits = 1)
}

mytable(formula = wool + tension ~ breaks, data = warpbreaks)

   A_L A_M A_H B_L B_M B_H  A_L  A_M  A_H  B_L  B_M  B_H
10   0   0   1   0   0   0  0.0  0.0 11.1  0.0  0.0  0.0
12   0   1   0   0   0   0  0.0 11.1  0.0  0.0  0.0  0.0
13   0   0   0   0   0   1  0.0  0.0  0.0  0.0  0.0 11.1
14   0   0   0   1   0

Iterate through different subsets of size k

I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k, excluding duplicate subsets. For example, with array = {1,2,2,3,3,3,3}, n = 7 and k = 2, the subsets I want to iterate over (each once) are {1,2}, {1,3}, {2,2}, {2,3} and {3,3}. What is an efficient algorithm for doing this? Is a recursive approach the most efficient/elegant? In case you have a language-specific answer, I'm using C++.

Answer: The same (or almost the same) algorithm that is used to generate combinations of a set of unique values in lexicographical order can be used to
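
The answer's lexicographic algorithm is cut off in this excerpt, so here is a sketch of a closely related, commonly used technique instead: sort the values, then backtrack, skipping any value equal to the previous candidate at the same recursion depth. It is written in Python for illustration; the function name unique_k_subsets is hypothetical, and the same structure should carry over to C++ with a sorted std::vector and a recursive helper. Each distinct subset is produced once, in lexicographic order.

```python
from typing import Iterator, List, Sequence


def unique_k_subsets(values: Sequence[int], k: int) -> Iterator[List[int]]:
    """Yield each distinct size-k sub-multiset of values exactly once, in lexicographic order."""
    items = sorted(values)  # equal values become adjacent
    n = len(items)
    current: List[int] = []

    def backtrack(start: int) -> Iterator[List[int]]:
        if len(current) == k:
            yield list(current)
            return
        # Stop early once too few items remain to fill the subset.
        for i in range(start, n - (k - len(current)) + 1):
            # A value equal to the previous candidate at this depth would
            # only reproduce subsets already generated, so skip it.
            if i > start and items[i] == items[i - 1]:
                continue
            current.append(items[i])
            yield from backtrack(i + 1)
            current.pop()

    yield from backtrack(0)


print(list(unique_k_subsets([1, 2, 2, 3, 3, 3, 3], 2)))
# [[1, 2], [1, 3], [2, 2], [2, 3], [3, 3]]
```

Sorting is what makes the skip rule sound: because equal values are adjacent, choosing the second of two equal values at the same position could only repeat a subset already built from the first.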

How to subset data for a specific column with ddply?

I would like to know whether there is a simple way to achieve what I describe below using ddply. My data frame describes an experiment with two conditions. Participants had to choose between options A and B, and we recorded how long they took to decide and whether their responses were accurate. I use ddply to compute averages by condition. The column nAccurate summarizes the number of accurate responses in each condition. I also want to know how much time they took to decide and express it in the column RT. However, I want to calculate average response times only when participants got the
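
The excerpt is cut off, but the preceding sentences suggest averaging RT over accurate trials only. Assuming that reading, the Python sketch below groups trials by condition, counts the accurate ones, and averages response time over the accurate subset. The function name summarize_by_condition, the keys condition, accurate and rt, and the trial values are hypothetical stand-ins for the asker's columns and data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional


def summarize_by_condition(trials: Iterable[dict]) -> Dict[str, dict]:
    """Per condition: count of accurate trials and mean RT over accurate trials only."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for trial in trials:
        groups[trial["condition"]].append(trial)

    summary: Dict[str, dict] = {}
    for condition, rows in groups.items():
        accurate_rts = [row["rt"] for row in rows if row["accurate"]]
        # statistics.mean() raises on empty input, so report None instead.
        rt: Optional[float] = mean(accurate_rts) if accurate_rts else None
        summary[condition] = {"nAccurate": len(accurate_rts), "RT": rt}
    return summary


# Made-up trials for illustration (hypothetical condition names and values).
trials = [
    {"condition": "cond1", "accurate": True, "rt": 1.0},
    {"condition": "cond1", "accurate": False, "rt": 9.0},
    {"condition": "cond1", "accurate": True, "rt": 3.0},
    {"condition": "cond2", "accurate": False, "rt": 5.0},
]
print(summarize_by_condition(trials))
# {'cond1': {'nAccurate': 2, 'RT': 2.0}, 'cond2': {'nAccurate': 0, 'RT': None}}
```

In plyr terms, the same idea is to index the RT column by a logical accuracy column inside the summary expression (for example mean(RT[accurate]) within summarise), again assuming such a column exists in the data frame.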

Why use \0 to include highEndPoint as part of the sublist?

I saw the code below in Oracle's Java Tutorials. To count the number of words between doorbell (inclusive) and pickle (inclusive), the author appended \0 to the word pickle. I understand that the effect of appending \0 to pickle is that pickle itself is now included in the subset. But my question is: why use \0? Could someone please help me out? Thanks in advance for any help!

SortedSet<String> dictionary = new TreeSet<>(entire collection of words from a dictionary);
int count = dictionary.subSet("doorbell", "pickle\0").size();
System.out.println(count);

Edit: Also, what
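
The trick works because SortedSet.subSet(fromElement, toElement) is half-open: it includes fromElement but excludes toElement. In String's natural ordering, "pickle\0" is the smallest string greater than "pickle": any larger string either differs from "pickle" at an earlier character or extends it by at least one character, and \0 is the smallest possible character. So the range [doorbell, pickle\0) contains exactly the words from doorbell through pickle. The Python sketch below mimics subSet with bisect on a sorted list; the helper count_half_open and the word list are illustrative assumptions, and Python compares str by code point, which matches Java's ordering for these ASCII words.

```python
from bisect import bisect_left
from typing import List


def count_half_open(sorted_words: List[str], low: str, high: str) -> int:
    """Count words w with low <= w < high, like SortedSet.subSet(low, high).size()."""
    return bisect_left(sorted_words, high) - bisect_left(sorted_words, low)


words = sorted(["zebra", "pickles", "doorbell", "apple", "pickle", "lamp"])

# The upper bound is exclusive, so "pickle" itself is not counted.
print(count_half_open(words, "doorbell", "pickle"))  # 2 (doorbell, lamp)

# "pickle\0" sorts right after "pickle" and before any other word that
# starts with "pickle", so "pickle" is now counted but "pickles" is not.
print(count_half_open(words, "doorbell", "pickle\0"))  # 3 (doorbell, lamp, pickle)
```

On Java 6 and later, NavigableSet (which TreeSet implements) also offers subSet(fromElement, true, toElement, true), which states inclusive bounds directly instead of relying on the \0 successor.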

How to subtract a complete character vector with repeated characters from the other vector in R

I want to subtract y from x, i.e. remove one "A", three "B"s and one "E" from x, so that xNew is c("A", "C", "A", "B", "D"). It also means that length(xNew) == length(x) - length(y).

x <- c("A","A","C","A","B","B","B","B","D","E")
y <- c("A","B","B","B","E")

setdiff doesn't work, because it treats x and y as sets:

xNew <- setdiff(x, y)
xNew
[1] "C" "D"

match doesn't work either:

xNew <- x[-match(y, x)]
xNew
[1] "A" "C" "A" "B" "B" "B" "D"

match returns the fifth position for all three "B"s, so only that one element is removed, and three "B"s are left. Does anyone know how to do this? Is there a function available in R, or should we write our own? Thanks a
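
What is being asked for is multiset (bag) subtraction: each element of y cancels one matching occurrence in x. The Python sketch below uses collections.Counter to track how many copies of each value still need removing; the function name multiset_subtract is hypothetical. It removes the earliest remaining occurrences, which reproduces the expected xNew from the question.

```python
from collections import Counter
from typing import List, Sequence


def multiset_subtract(x: Sequence[str], y: Sequence[str]) -> List[str]:
    """Remove one occurrence from x for each element of y, keeping x's order."""
    pending = Counter(y)  # how many copies of each value still need removing
    result: List[str] = []
    for item in x:
        if pending[item] > 0:
            pending[item] -= 1  # drop this occurrence and consume one removal
        else:
            result.append(item)
    return result


x = ["A", "A", "C", "A", "B", "B", "B", "B", "D", "E"]
y = ["A", "B", "B", "B", "E"]
x_new = multiset_subtract(x, y)
print(x_new)  # ['A', 'C', 'A', 'B', 'D']
```

Values in y that do not occur in x are simply ignored, so the identity length(xNew) == length(x) - length(y) holds only when y is contained in x as a multiset.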

Remove rows based on factor-levels

I have a data.frame df in "long" format:

df <- data.frame(site = rep(c("A","B","C"), 1, 7),
                 time = c(11,11,11,22,22,22,33),
                 value = ceiling(rnorm(7)*10))
df <- df[order(df$site), ]
df
  site time value
1    A   11    12
2    A   22   -24
3    A   33   -30
4    B   11     3
5    B   22    16
6    C   11     3
7    C   22     9

Question: How do I remove the rows where a unique value of df$time is not present for every level of df$site? In this case I want to remove df[3,], because the timestamp 33 in df$time is only present for site A and
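
One way to state the rule: a time value is kept only if the set of sites observed at that time equals the set of all sites. The Python sketch below applies that rule to rows copied from the printed df; the function name keep_complete_times and the dict-per-row representation are assumptions for illustration, not the asker's R objects.

```python
from collections import defaultdict
from typing import Dict, List, Set


def keep_complete_times(rows: List[dict]) -> List[dict]:
    """Keep only rows whose 'time' value is observed for every 'site'."""
    all_sites: Set[str] = {row["site"] for row in rows}
    sites_per_time: Dict[int, Set[str]] = defaultdict(set)
    for row in rows:
        sites_per_time[row["time"]].add(row["site"])
    return [row for row in rows if sites_per_time[row["time"]] == all_sites]


# Rows copied from the printed df in the question.
df = [
    {"site": "A", "time": 11, "value": 12},
    {"site": "A", "time": 22, "value": -24},
    {"site": "A", "time": 33, "value": -30},
    {"site": "B", "time": 11, "value": 3},
    {"site": "B", "time": 22, "value": 16},
    {"site": "C", "time": 11, "value": 3},
    {"site": "C", "time": 22, "value": 9},
]
kept = keep_complete_times(df)
print([(r["site"], r["time"]) for r in kept])
# [('A', 11), ('A', 22), ('B', 11), ('B', 22), ('C', 11), ('C', 22)]
```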

Why doesn't subset mind a missing subset argument for data frames?

Normally I wonder where mysterious errors come from, but this time my question is where a mysterious lack of an error comes from. Let

numbers <- c(1, 2, 3)
frame <- as.data.frame(numbers)

If I type subset(numbers, ) (that is, I want to take a subset but forget to specify the subset argument of the subset function), R reminds me, as it should:

Error in subset.default(numbers, ) : argument "subset" is missing, with no default

However, when I type subset(frame, ) (the same thing with a data.frame instead of a vector), it doesn't give an error but simply returns the full data frame. What is going