subset | 易学教程

Making a list of dataframes which are a subset of one dataframe using R

Question: I have one data frame, and I would like to make a list of data frames, each of which is a subset of the original data, based on the value of one variable. I found this answer on Cross Validated: https://stats.stackexchange.com/questions/161414/creating-subsets-of-dataframes-from-a-single-dataframe-based-on-the-distinct-val It looks like the author applies a different function to create the new data frames than the one I want to use, and I'm not sure how to adapt it to what I would like to do.
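A minimal sketch of one common base R approach to this kind of task, assuming a hypothetical data frame `df` with a grouping column `group` (both names and all values are illustrative, not taken from the question). `split()` returns a named list with one data frame per distinct value of the grouping variable.

```r
# Hypothetical example data; the column names and values are assumptions.
df <- data.frame(
  group = c("a", "b", "a", "c", "b"),
  value = c(1, 2, 3, 4, 5)
)

# split() partitions the rows by the values of df$group and returns
# a named list of data frames, one per distinct value.
df_list <- split(df, df$group)

names(df_list)   # one name per distinct group value
df_list[["a"]]   # the rows of df where group == "a"
```

Each element of the list can then be processed with `lapply()`, so a different function can be applied per subset without changing how the list is built.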

Subset rows based on “start and stop” strings

Question: I am looking to write an R script that will search a column for a specific value and begin subsetting rows until a specific text value is reached. Example:

     X1  X2
[1,] "a" "1"
[2,] "b" "2"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
[6,] "f" "6"
[7,] "c" "7"
[8,] "k" "8"

What I'd like to do is search through X1 until the letter 'c' is found, and begin to subset rows until another letter 'c' is found, at which point the subsetting would stop. Using the above example, the result should be a vector
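A rough sketch of the "start and stop" idea in base R, using the example matrix from the question. The excerpt is cut off before the expected result, so this assumes that both delimiter rows are kept (inclusive bounds); that choice is an assumption, not something the question confirms.

```r
# The example matrix from the question.
m <- cbind(X1 = c("a", "b", "c", "d", "e", "f", "c", "k"),
           X2 = as.character(1:8))

# Positions of the delimiter value in column X1.
hits <- which(m[, "X1"] == "c")

# Fail early if there is no start/stop pair to work with.
stopifnot(length(hits) >= 2)

# Assumption: keep everything from the first "c" to the second "c",
# inclusive. Adjust the two bounds if the delimiter rows should be dropped.
between <- m[hits[1]:hits[2], , drop = FALSE]

between[, "X2"]
```

`drop = FALSE` keeps the result a matrix even if the range happens to cover a single row.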

Extract all rows containing first value for each unique value of another column

Question: I am looking for something similar to "Select only the first rows for each unique value of a column in R", but I need to keep ALL rows containing the first value of year per ID. In other words, I need to subset the dataset on the first year listed, by individual ID. IDs can have their first year in 1, 2, or 3, and all of the rows in the first year should be retained. For example: ID <- c("54V", "54V", "54V", "54V", "56V", "56V", "56V", "59V", "59V", "59V") yr <- c(1, 1, 1, 2, 2, 2, 3, 1, 2,
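One way this could be sketched in base R is with `ave()`, which computes a per-group value for every row. The data below is hypothetical (the question's own vectors are truncated in this excerpt), and it assumes the "first" year of an ID is its smallest year.

```r
# Hypothetical data in the spirit of the question; the values are assumptions.
df <- data.frame(
  ID = c("54V", "54V", "54V", "56V", "56V", "59V", "59V"),
  yr = c(1, 1, 2, 2, 3, 1, 2)
)

# ave() returns, for every row, the minimum yr within that row's ID group.
first_yr <- ave(df$yr, df$ID, FUN = min)

# Keep every row whose year equals the first year of its ID.
first_rows <- df[df$yr == first_yr, ]
first_rows
```

Because the comparison is done row by row, an ID with several rows in its first year keeps all of them.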

Selecting rows with same result in different columns in R

Question: I would like to select from my data frame (catch) only the rows for which my "tspp.name" variable is the same as my "elasmo.name" variable. For example, rows #74807 and #74809 in this case would be selected, but not row #74823, because the elasmo.name is "skate" and the tspp.name is "Northern shrimp". I am sure there is an easy answer for this, but I have not found it yet. Any hints would be appreciated. > catch[4:6,] gear tripID obsID sortie setID date time NAFO lat long dur depth bodymesh 74807
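A minimal sketch of comparing two columns row by row in base R. The `catch` data frame below is a hypothetical stand-in with only the two columns named in the question; its values are assumptions chosen to mirror the example described.

```r
# Hypothetical stand-in for `catch`; only the two named columns are included.
catch <- data.frame(
  tspp.name   = c("skate", "skate", "Northern shrimp"),
  elasmo.name = c("skate", "skate", "skate"),
  stringsAsFactors = FALSE
)

# as.character() guards against the columns being factors with
# different level sets, which `==` cannot compare directly.
same <- as.character(catch$tspp.name) == as.character(catch$elasmo.name)

# which() drops NA comparisons instead of returning all-NA rows.
matched <- catch[which(same), ]
matched
```

If both columns are already character vectors, the `as.character()` calls are harmless and can be removed.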

R subset functions, including '[' not working on middle range of large dataframe/matrix

Question: I'm having a strange issue where I am looping over a large data frame to create a 3D barplot from the data in 2 columns, where the Z axis is the frequency. The original data frame looks like this (please excuse excess columns):

> head(MergedBH)
                    Row.names           V1.x            V2.x V3.x  V4.x V5.x
RFL_Contig1       RFL_Contig1    RFL_Contig1 Scaffold3494078 1.00 1.000  470
RFL_Contig100   RFL_Contig100  RFL_Contig100 Scaffold2661063 0.61 0.975  236
RFL_Contig1000 RFL_Contig1000 RFL_Contig1000  Scaffold861300 0.96 0.995  451

Passing empty index in R

Question: Say I want to subset a vector a. I can pass the indices to subset in a variable, e.g. a[idx]. What value should I set idx to in order to get the whole of a (i.e. the equivalent of a[])? Basically, I have a function with idx as an argument and would like to pass a value that processes the whole dataset. I'm assuming there should be something better than 1:length(a).

Answer 1: The index argument in subsetting is allowed to be "missing" (see ?"["): ff1 = function(x, i) x[i] ff2 = function(x,
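As a small illustration of one value that can stand in for "everything", here is a sketch using a logical `TRUE` index, which R recycles to the length of the vector. The helper name `take` is hypothetical and is not taken from the answer quoted above.

```r
a <- c(10, 20, 30, 40)

# A logical TRUE is recycled to the length of `a`, so it selects everything.
idx <- TRUE
a[idx]

# Hypothetical helper: default the index to TRUE so callers can omit it.
take <- function(x, idx = TRUE) x[idx]

take(a)            # all elements
take(a, c(1, 3))   # only the first and third elements
```

When an explicit integer index is needed, `seq_along(a)` is generally safer than `1:length(a)`, because the latter yields `c(1, 0)` for an empty vector.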

Subsetting a string based on pre- and suffix

Question: I have a column with names of this type:

sp_O00168_PLM_HUMAM
sp_Q8N1D5_CA158_HUMAN
sp_Q15818_NPTX1_HUMAN
tr_Q6FGH5_Q6FGH5_HUMAN
sp_Q9UJ99_CAD22_HUMAN

I want to remove everything before, and including, the second _ and everything after, and including, the third _. I do not wish to remove based on the number of characters, since this is not a fixed number. The output should be: PLM CA158 NPTX1 Q6FGH5 CAD22. I have played around with these, but don't quite get it right: library(stringr) str_sub(x
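Two possible base R sketches for extracting the text between the second and third underscore, using the names listed in the question. The variable names are illustrative, and this is not the approach the asker started with (which is truncated in the excerpt).

```r
x <- c("sp_O00168_PLM_HUMAM", "sp_Q8N1D5_CA158_HUMAN",
       "sp_Q15818_NPTX1_HUMAN", "tr_Q6FGH5_Q6FGH5_HUMAN",
       "sp_Q9UJ99_CAD22_HUMAN")

# Option 1: split each name on "_" and take the third piece.
pieces <- strsplit(x, "_", fixed = TRUE)
third_field <- vapply(pieces, function(p) p[3], character(1))

# Option 2: a regular expression that captures the text between
# the second and third underscore.
third_regex <- sub("^[^_]*_[^_]*_([^_]*)_.*$", "\\1", x)

third_field
# [1] "PLM"    "CA158"  "NPTX1"  "Q6FGH5" "CAD22"
```

Neither option depends on character counts, so fields of varying length are handled the same way.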

Python: subset elements in one list based on substring in another list, retain only one element per substring

Question: I have two lists:

list1 = ['abc-21-6/7', 'abc-56-9/10', 'def-89-7/3', 'hij-2-4/9', 'hij-75-1/7']
list2 = ['abc', 'hij']

I would like to subset list1 such that: 1) only those elements with substrings matching an element in list2 are retained, and 2) for duplicated elements that meet the first requirement, only one of the duplicates is retained, chosen at random. For this specific example, I would like to produce a result such as: ['abc-21-6/7', 'hij-75-1/7'] I have worked out code to meet my first

Average a subset of a matrix in a loop in MATLAB

Question: I work with an image that I treat as a matrix. I want to turn an 800 x 800 matrix (A) into a 400 x 400 matrix (B), where each cell of B is the mean of 4 cells of A (I know this is not valid code): B[1,1] = mean2(A[1,1 + 1,2 + 2,1 + 2,2]), B[1,2] = mean2(A[1,3 + 1,4 + 2,3 + 2,4]), and so on for the whole matrix. My plan was to: 1) Reshape the A matrix into a 2 x 320 000 matrix, so that the four cells I need to average are next to each other and it is easier to deal with the

Group-wise conditional subsetting where feasible

Question: I would like to subset rows of my data

library(data.table); set.seed(333); n <- 100
dat <- data.table(id=1:n, group=rep(1:2,each=n/2), x=runif(n,100,120), y=runif(n,200,220), z=runif(n,300,320))
> head(dat)
   id group        x        y        z
1:  1     1 109.3400 208.6732 308.7595
2:  2     1 101.6920 201.0989 310.1080
3:  3     1 119.4697 217.8550 313.9384
4:  4     1 111.4261 205.2945 317.3651
5:  5     1 100.4024 212.2826 305.1375
6:  6     1 114.4711 203.6988 319.4913

in several stages, unless it results in an empty subset. In this case,