subset | 易学教程

Subset matrix with arrays in R

Question: It is probably fairly basic, but I have not found an easy solution. Assume I have a three-dimensional matrix:

    m <- array(seq_len(18), dim=c(3,3,2))

and I would like to subset the matrix with the arrays of indexes:

    idxrows <- c(1,2,3)
    idxcols <- c(1,1,2)

obtaining the arrays in position (1,1), (2,1) and (3,2), that is:

         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]   10   14   18

I have tried m[idxrows,idxcols,] but without any luck. Is there any way to do it (without, obviously, using a for loop)?

Answer 1: Not sure if
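A minimal base-R sketch of one way to approach the question above, not necessarily the approach the truncated answer goes on to describe. It assumes the goal is to pick the paired elements m[1,1,], m[2,1,] and m[3,2,] exactly as the index vectors are written. The names `pairs` and `res` are illustrative only.

```r
m <- array(seq_len(18), dim = c(3, 3, 2))
idxrows <- c(1, 2, 3)
idxcols <- c(1, 1, 2)

# A two-column matrix used as an index selects one element per row
# of the index matrix, i.e. one (row, col) pair at a time.
pairs <- cbind(idxrows, idxcols)

# Apply the paired index to each slice along the third dimension,
# then transpose so that each slice becomes one row of the result.
res <- t(apply(m, 3, function(slice) slice[pairs]))
print(res)
```

By contrast, m[idxrows, idxcols, ] takes every combination of the listed rows and columns, which is why it does not return the paired elements.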

How to select range of columns in a dataframe based on their name and not their indexes?

Question: In a pandas dataframe created like this:

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                      columns=['c' + str(i) for i in range(6)],
                      index=["r" + str(i) for i in range(6)])

which could look as follows:

        c0  c1  c2  c3  c4  c5
    r0   2   7   3   3   2   8
    r1   6   9   6   7   9   1
    r2   4   0   9   8   4   2
    r3   9   0   4   3   5   4
    r4   7   6   8   8   0   8
    r5   0   6   1   8   2   2

I can easily select certain rows and/or a range of columns using .loc:

    print(df.loc[['r1', 'r5'], 'c1':'c4'])

That would return:

        c1  c2  c3  c4
    r1

How to extract the previous n rows where a certain column value cannot be a particular value?

Question: I've been searching for quite some time now with no luck. Essentially, I'm trying to figure out a way in R to extract the previous n rows where the "LTO" column is 0, starting from a row where the "LTO" column is 1. Data table:

    Week      Price  LTO
    1/1/2019  11     0
    2/1/2019  12     0
    3/1/2019  11     0
    4/1/2019  11     0
    5/1/2019  9.5    1
    6/1/2019  10     0
    7/1/2019  8      1

For example, if n = 3, starting from 5/1/2019 where LTO = 1, I want to be able to pull the rows 4/1/2019, 3/1/2019, 2/1/2019. But then

R - Replace values starting in selected column by row

Question: I want to replace with zero the monthly values that are after a specific month by row. I have tried adapting "Replace NA values in dataframe starting in varying columns" without success. Given data:

    df <- structure(list(Mth1 = c(1L, 3L, 4L, 1L, 2L),
                         Mth2 = c(2L, 3L, 2L, 2L, 2L),
                         Mth3 = c(1L, 2L, 1L, 2L, 3L),
                         Mth4 = c(3L, 1L, 3L, 4L, 2L),
                         ZeroMth = c(1L, 3L, 2L, 4L, 3L)),
                    .Names = c("Mth1", "Mth2", "Mth3", "Mth4", "ZeroMth"),
                    class = "data.frame",
                    row.names = c("1", "2", "3", "4", "5"))
    > df
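A hedged sketch of one possible reading of this question. It assumes that "after a specific month" means columns strictly to the right of the month number stored in ZeroMth, so the ZeroMth month itself is kept. The excerpt is truncated, so that interpretation is an assumption, and `month_cols`, `months` and `result` are illustrative names.

```r
df <- data.frame(
  Mth1 = c(1L, 3L, 4L, 1L, 2L),
  Mth2 = c(2L, 3L, 2L, 2L, 2L),
  Mth3 = c(1L, 2L, 1L, 2L, 3L),
  Mth4 = c(3L, 1L, 3L, 4L, 2L),
  ZeroMth = c(1L, 3L, 2L, 4L, 3L)
)

month_cols <- c("Mth1", "Mth2", "Mth3", "Mth4")
months <- as.matrix(df[month_cols])

# col(months) gives each cell's column index. ZeroMth has one value per
# row and is recycled down each column, so every cell is compared with
# the cutoff of its own row.
months[col(months) > df$ZeroMth] <- 0L

# Assumption: strictly later months are zeroed, the cutoff month is kept.
result <- df
result[month_cols] <- as.data.frame(months)
print(result)
```

If the cutoff month itself should also be zeroed, the comparison would become `>=` instead of `>`.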

(R) [] / subset() returns an empty data frame

Question: I have a large dataset that looks something like this, with a few hundred thousand more entries, saved as data:

      Group1  dtm_Flight_Date        Departure  Arrival  str_Fare_Category_Ident
    1 8P104   06/11/2010 9:05        YYJ        YVR      B
    2 8P104   06/11/2010 9:05        YYJ        YVR      K
    3 8P104   06/11/2010 9:05        YYJ        YVR      L
    4 8P104   06/11/2010 9:05        YYJ        YVR      N
    5 8P104   06/11/2010 9:05        YYJ        YVR      Q
    6 8P104   06/11/2010 9:05        YYJ        YVR      Y
    7 8P104   6/14/2010 9:05:00 AM   YYJ        YVR      B
    8 8P104   6/14/2010 9:05:00 AM   YYJ        YVR      K
    9 8P104   6/14/2010 9:05:00 AM   YYJ        YVR      L

square brackets multiple columns R

Question: I am flummoxed. I am trying to isolate certain rows of df according to values in two columns. As always, I try this on practice data first. My code works fine:

    data1 <- df2[df2$fruit=="kiwi" | df2$fruit=="orange" | df2$fruit=="apple" & (df2$dates>= "2010-04-01" & df2$dates< "2010-10-01"), ]

When I try the same code on my real data, it doesn't work. It collects the "fruits" I need, but ignores my date range request.

    data1 <- lti_first[lti_first$hai_atc=="C10AA01" | lti_first$hai_atc=="C10AA03" |
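One thing worth checking in a condition shaped like the one above is operator precedence: in R, `&` is evaluated before `|`, so an unparenthesized date test attaches only to the last comparison. The sketch below illustrates this on hypothetical practice data. The rows and the names `in_range`, `unparenthesized` and `grouped` are made up for illustration, and the excerpt does not confirm that this is the cause in the asker's real data.

```r
# Hypothetical practice data; column names follow the question.
df2 <- data.frame(
  fruit = c("kiwi", "orange", "apple", "apple", "pear"),
  dates = as.Date(c("2010-01-15", "2010-05-01", "2010-06-01",
                    "2010-12-01", "2010-06-01"))
)

in_range <- df2$dates >= as.Date("2010-04-01") &
  df2$dates < as.Date("2010-10-01")

# Shaped like the question's condition: `&` is evaluated before `|`,
# so the date test only restricts the "apple" rows.
unparenthesized <- df2[df2$fruit == "kiwi" | df2$fruit == "orange" |
                         df2$fruit == "apple" & in_range, ]

# Grouping the fruit test first applies the date range to every fruit.
grouped <- df2[df2$fruit %in% c("kiwi", "orange", "apple") & in_range, ]

print(unparenthesized)
print(grouped)
```

The sketch also converts the dates with as.Date() before comparing, so the range test is a date comparison rather than a string comparison.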

subsetting from an R object of Gene IDs from GRanges file

Question: I have a GRanges file called "P.obj" from which I want to extract/subset specific Gene IDs contained in the column "name". The specific Gene IDs that I want to extract are contained in the R object "plus", where the column is also called "name". I understand how to subset by overlaps and find overlaps, but I cannot work out how to subset by gene name.

    > P.obj
    GRangesList of length 4:
    $exons
    GRanges with 604591 ranges and 2 metadata columns:
        seqnames    ranges strand | score name
           <Rle> <IRanges>

Subsetting data frame using variable with same name as column

Question: I have a data frame and I'm trying to run a subset on it. In my data frame, I have a column called "start" and I'm trying to do this:

    sub <- subset(data, data$start==14)

and I correctly get a subset of all the rows where start=14. But when I do this:

    for(start in seq(1,20,by=1)) {
      sub <- subset(data, data$start==start)
      print(sub)
    }

it does not correctly find the subsets. It just prints the entire data frame. Why is this and how do I fix it?

Answer 1: You can also specify the environment you're
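The behaviour described in the question can be reproduced with a small sketch. subset() evaluates its condition inside the data frame first, so a loop variable named `start` is shadowed by the column of the same name. The data frame below and the names `masked`, `by_bracket`, `by_rename` and `start_value` are hypothetical. The two workarounds shown are common alternatives, not the environment-based approach the truncated answer begins to describe.

```r
# Hypothetical data frame with a column named `start`, as in the question.
data <- data.frame(start = c(1, 14, 14, 20),
                   value = c("a", "b", "c", "d"))

start <- 14

# Inside subset(), `start` resolves to the column, so this compares the
# column with itself and keeps every row.
masked <- subset(data, data$start == start)

# Plain bracket indexing evaluates `start` in the calling environment,
# so the variable (14) is used.
by_bracket <- data[data$start == start, ]

# Alternatively, give the variable a name that is not a column name.
start_value <- 14
by_rename <- subset(data, start == start_value)

print(nrow(masked))
print(nrow(by_bracket))
print(nrow(by_rename))
```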

Splitting data and fitting distributions efficiently

Question: For a project, I have received a large amount of confidential patient-level data that I need to fit a distribution to, so as to use it in a simulation model. I am using R. The problem is that I need to fit the distribution to get the shape/rate data for at least 288 separate distributions (at least 48 subsets of 6 variables). The process will vary slightly between variables (depending on how that variable is distributed), but I want to be able to set up a function or loop for each variable

How do you subset a data frame in R based on a minimum sample size

Question: Let's say you have a data frame with two levels of factors that looks like this:

    Factor1  Factor2  Value
    A        1        0.75
    A        1        0.34
    A        2        1.21
    A        2        0.75
    A        2        0.53
    B        1        0.42
    B        2        0.21
    B        2        0.18
    B        2        1.42
    etc.

How do I subset this data frame ("df", if you will) based on the condition that the combination of Factor1 and Factor2 (Fact1*Fact2) has more than, say, 2 observations? Can you use the length argument in subset to do this?

Answer 1: Assuming your data.frame is called mydf, you can use ave to create a logical
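The answer is cut off, so the following is only a sketch of how ave could be used along the lines it starts to describe. It computes the size of each Factor1 and Factor2 group per row and keeps the rows whose group has more than 2 observations. The data are the nine rows shown in the question, and `group_size` and `big_groups` are illustrative names.

```r
mydf <- data.frame(
  Factor1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Factor2 = c(1, 1, 2, 2, 2, 1, 2, 2, 2),
  Value   = c(0.75, 0.34, 1.21, 0.75, 0.53, 0.42, 0.21, 0.18, 1.42)
)

# ave() applies FUN within each Factor1 x Factor2 group and returns a
# vector aligned with the rows, here the size of each row's group.
group_size <- ave(mydf$Value, mydf$Factor1, mydf$Factor2, FUN = length)

# Keep only rows belonging to groups with more than 2 observations.
big_groups <- mydf[group_size > 2, ]
print(big_groups)
```

On these rows, only the A/2 and B/2 combinations have more than 2 observations, so six rows remain.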