subset

Error when subsetting based on adjusted values of different data frame in R

对着背影说爱祢 posted on 2019-12-25 17:08:32
Question: I am asking a side question about the method I learned here from @redmode: "Subsetting based on values of a different data frame in R". When I try to dynamically adjust the level I want to subset by:

    N <- nrow(A)
    cond <- sapply(3:N, function(i) sum(A[i,] > 0.95*B[i,])==2)
    rbind(A[1:2,], subset(A[3:N,], cond))

I get an error: Error in FUN(left, right) : non-numeric argument to binary operator. Can you think of a way I can get rows pertaining to values in A that are greater than 95% of the value
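
The question's data is not shown, so the cause of the error can only be guessed. One common cause, assumed here, is that A or B contains a non-numeric (for example character) column, so that `0.95*B[i,]` fails on it. The sketch below uses hypothetical data frames with an `id` column and restricts the comparison to the numeric columns; the column names and values are made up for illustration.

```r
# Hypothetical data (not from the question): `id` is a character column,
# which is enough to trigger "non-numeric argument to binary operator".
A <- data.frame(id = c("r1", "r2", "r3", "r4"),
                x = c(1, 2, 10, 4),
                y = c(1, 2, 10, 1),
                stringsAsFactors = FALSE)
B <- data.frame(id = c("r1", "r2", "r3", "r4"),
                x = c(1, 2, 10, 10),
                y = c(1, 2, 10, 10),
                stringsAsFactors = FALSE)

# Compare only the columns that are numeric in both data frames
num_cols <- sapply(A, is.numeric) & sapply(B, is.numeric)

N <- nrow(A)
# TRUE for a row when both numeric values in A exceed 95% of those in B
cond <- sapply(3:N, function(i) sum(A[i, num_cols] > 0.95 * B[i, num_cols]) == 2)

# Keep the first two rows unconditionally, as in the question
result <- rbind(A[1:2, ], subset(A[3:N, ], cond))
print(result)
```

If the columns are numbers stored as text, converting them with `as.numeric()` before the comparison would be the alternative fix.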

How can I specify columns in R to be used in matches (without listing each individually)?

删除回忆录丶 posted on 2019-12-25 16:42:38
Question: Suppose I have three columns of data (sample1, sample2, and sample3). I want all of the rows in which the letter b or h appears in any one of the columns. This works fine:

    data <- data.frame(row_name=c("s1_100","s1_200", "s2_300", "s1_400", "s1_500"),
                       sample1=rep("a",5),
                       sample2=c(rep("b",2),rep("a",3)),
                       sample3=c(rep("a",4),"h")
                       )
    data
    # row_name sample1 sample2 sample3
    # s1_100   a       b       a
    # s1_200   a       b       a
    # s2_300   a       a       a
    # s1_400   a       a       a
    # s1_500   a       a       h
    bh <- c('b','h')
    bh_data <- subset(data, (
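
A minimal sketch of one way to avoid listing each column, assuming the columns of interest can be picked out by the shared name prefix "sample". It reuses the question's `data` and `bh`; the helper names `sample_cols` and `hit` are introduced here for illustration.

```r
data <- data.frame(row_name = c("s1_100", "s1_200", "s2_300", "s1_400", "s1_500"),
                   sample1 = rep("a", 5),
                   sample2 = c(rep("b", 2), rep("a", 3)),
                   sample3 = c(rep("a", 4), "h"))
bh <- c("b", "h")

# Select the columns to search by name pattern instead of naming each one
sample_cols <- grep("^sample", names(data))

# For every row, check whether any selected column holds one of the letters
hit <- apply(data[, sample_cols], 1, function(r) any(r %in% bh))

bh_data <- data[hit, ]
print(bh_data)
```

Selecting by position (for example `data[, 2:4]`) would work the same way if the names share no pattern.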

Function to create a new dataframe from data subsets

久未见 posted on 2019-12-25 11:53:22
Question: I have a large data frame, data, with a number of vehicles and their geospatial locations. I was able to run a loop to subset the data for each vehicle id using the following code:

    uniq <- unique(unlist(data$vehicleid))
    for (i in 1:length(uniq)){
      data_1 <- subset(data, vehicleid == uniq[i])
      #your desired function
    }

I need to write a function so that I can extract the first row of each subset and get all the extracted rows in a new separate data frame. How do I do that?

Answer 1: Consider the often
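
As a loop-free sketch (not necessarily the approach the truncated answer goes on to describe), base R's `duplicated()` can keep the first row seen for each vehicle id. The data below is hypothetical; only the `vehicleid` column name comes from the question.

```r
# Hypothetical stand-in for the question's `data`
data <- data.frame(vehicleid = c("v1", "v1", "v2", "v2", "v2", "v3"),
                   lat = c(10.1, 10.2, 20.1, 20.2, 20.3, 30.1),
                   lon = c(1.1, 1.2, 2.1, 2.2, 2.3, 3.1))

# duplicated() flags every repeat of an id, so negating it keeps first occurrences
first_rows <- data[!duplicated(data$vehicleid), ]
print(first_rows)
```

This relies on the rows already being in the desired order within each vehicle; sort first if "first" means earliest by some timestamp.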

Unix/Linux: Convert fixed-width file to csv and subset on two columns

独自空忆成欢 posted on 2019-12-25 09:59:32
Question: I have this fixed-width file with the widths being 34, 2, 3, 2, 1, 1, 3, 1, 2, 1, 2, 2 and 75, which I want to (a) convert to delimited (csv) format and then (b) subset according to V2="03" and V5="1". I have figured out the first step:

    awk -v FIELDWIDTHS='34 2 3 2 1 1 3 1 2 1 2 2 75' -v OFS=',' '{ $1=$1 ""; print }' </filepath/Parse.txt > /filepath/Parse.csv

But I am stumped at step 2.

Answer 1: Try with:

    awk -v FIELDWIDTHS='...' -v OFS=',' '($2=="03") && ($5=="1"){ $1=$1 ""; print }'

Source: https:/

Dynamic data frame creation in R with custom names

≡放荡痞女 posted on 2019-12-25 02:27:08
Question: I'd like to create data frames dynamically and assign custom names to them. I have a master data set like this:

    ID grp val1 val2
    1  a   32   9
    1  b   21   31
    1  c   43   76
    2  a   23   67
    2  b   5    45
    2  c   65   76
    3  a   43   34
    3  b   43   7
    3  c   12   87
    4  a   43   35
    4  b   65   87
    4  c   21   55

I'd like to create data frames like data1:

    ID grp val1 val2
    1  a   32   9
    1  b   21   31
    1  c   43   76

data2:

    ID grp val1 val2
    2  a   23   67
    2  b   5    45
    2  c   65   76

and so on. I have tried some things like:

    myID<-1:4
    df <- paste('data',myID, sep ='')
    ll <- sapply(df,
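
One possible sketch, using base R's `split()` to cut the master data by ID and then naming the pieces data1, data2, and so on. The name `master` is an assumption for the question's unnamed master data set; the values are the ones listed above.

```r
# The question's master data set, rebuilt from the listing above
master <- data.frame(ID = rep(1:4, each = 3),
                     grp = rep(c("a", "b", "c"), times = 4),
                     val1 = c(32, 21, 43, 23, 5, 65, 43, 43, 12, 43, 65, 21),
                     val2 = c(9, 31, 76, 67, 45, 76, 34, 7, 87, 35, 87, 55))

# One data frame per ID, collected in a named list
ll <- split(master, master$ID)
names(ll) <- paste0("data", names(ll))

print(ll$data2)

# Optional: turn the list elements into separate objects data1 ... data4
# list2env(ll, envir = globalenv())
```

Keeping the pieces in a named list is usually easier to loop over than separate objects, which is why the `list2env()` step is left commented out.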

Subset Dataframe and plot with ggplot? [duplicate]

江枫思渺然 posted on 2019-12-25 02:24:06
Question: This question already has answers here: How to sort a dataframe by multiple column(s) (19 answers). Closed last year.

I created a Shiny app and need some help with subsetting my data. I added a dateRangeInput so that the client can filter between a start and an end date. This filter is included in my ggplot code, so that the plot changes automatically whenever a different date is selected. My problem is that the data of partC is not filtered by the selected date. The problem is
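
The app's code is not shown, so the following is only a sketch of the filtering step on its own, outside Shiny. It assumes `partC` has a `Date` column called `date`; that column name, the sample values, and the helper `filter_by_date` are all hypothetical. A `dateRangeInput()` value is a length-2 Date vector, which is what `date_range` stands in for.

```r
# Hypothetical stand-in for the question's partC; column names are assumptions
partC <- data.frame(date = as.Date(c("2019-01-05", "2019-02-10",
                                     "2019-03-15", "2019-04-20")),
                    value = c(3, 7, 5, 9))

# Keep rows whose date lies within [start, end], both ends inclusive
filter_by_date <- function(df, date_range) {
  df[df$date >= date_range[1] & df$date <= date_range[2], ]
}

filtered <- filter_by_date(partC, as.Date(c("2019-02-01", "2019-03-31")))
print(filtered)
```

Inside an app, a call like this would typically sit in a `reactive()` or directly in `renderPlot()`, with the input value passed as `date_range`, so that the plot is redrawn from the filtered data whenever the dates change.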

Subsetting dataframe with multiple conditions

旧城冷巷雨未停 posted on 2019-12-25 02:17:02
Question: Say I have a dataframe ARAP with columns called CoCd and VendorNo. I want to subset into another dataframe called EMIU_EMIJ all lines for the combinations CoCd="EMIJ" & VendorNo = "100010", or CoCd="EMIU" & VendorNo = "2000001", or CoCd="EMIU" & VendorNo = "2000006". How do I combine & and | to select the lines where any of these combinations is met? That is, each CoCd must be paired with its own VendorNo. I tried:

    EMIU_EMIJ<-subset(ARAP,CoCd=="EMIJ"&VendorNo=="100010"| CoCd=="EMIU"
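
A minimal sketch of the pairing, with each CoCd/VendorNo combination wrapped in parentheses and the combinations joined by `|`. In R, `&` already binds more tightly than `|`, so the parentheses mainly make the pairing explicit. The rows and the `Amount` column below are hypothetical; only the column names CoCd and VendorNo and the three target combinations come from the question.

```r
# Hypothetical ARAP data; VendorNo is kept as character to match the question
ARAP <- data.frame(CoCd = c("EMIJ", "EMIJ", "EMIU", "EMIU", "EMIU"),
                   VendorNo = c("100010", "2000001", "2000001", "2000006", "100010"),
                   Amount = c(10, 20, 30, 40, 50),
                   stringsAsFactors = FALSE)

# Each parenthesised group pairs one company code with its vendor number(s)
EMIU_EMIJ <- subset(ARAP,
                    (CoCd == "EMIJ" & VendorNo == "100010") |
                    (CoCd == "EMIU" & VendorNo %in% c("2000001", "2000006")))
print(EMIU_EMIJ)
```

Rows such as EMIJ with 2000001 or EMIU with 100010 are dropped because neither pairing matches as a whole.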

Improving data.table subsetting performance

时光怂恿深爱的人放手 posted on 2019-12-25 01:08:52
Question: I am running a large Monte Carlo simulation, and I discovered that subsetting/searching my data is the slowest part of my code. In order to test some alternatives, I benchmarked performance with data frames, data.table, and a matrix. Here is the benchmark code:

    library(data.table)
    #install.packages('profvis')
    library(profvis)
    x.df = data.frame(a=sample(1:10,10000,replace=T), b=sample(1:10,10000,replace=T)) # set up a dataframe
    x.dt = as.data.table(x.df) # a data.table
    setkey(x.dt,a) # set

How to remove all cells which contain supersets of other cells?

荒凉一梦 posted on 2019-12-24 22:25:22
Question: I am working in text mining. I have 23 sentences that I have extracted from a text file, along with 6 frequent words extracted from the same text file. For the frequent words, I created a 1D array which shows the words and the sentences in which they occur. After that I took the intersection to show which word occurs together with each of the other remaining words in a sentence:

    OccursTogether = cell(length(Out1));
    for ii=1:length(Out1)
        for jj=ii+1:length(Out1)
            OccursTogether{ii,jj} = intersect(Out1{ii},Out1{jj});