subset | 易学教程

R functions calling lm with subsets

Question: I was working on some code and noticed something peculiar. When I run lm on a subset of some panel data I have, it works fine, something like this:

library('plm')
data(Cigar)
lm(log(price) ~ log(pop) + log(ndi), data=Cigar, subset=Cigar$state==1)

Call:
lm(formula = log(price) ~ log(pop) + log(ndi), data = Cigar, subset = Cigar$state == 1)

Coefficients:
(Intercept)     log(pop)     log(ndi)
   -26.4919       3.2749       0.4265

but when I try to wrap this in a function I get:

myfunction <- function(formula, data, …
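The question is cut off before the failing function, so the exact error is not shown. A common cause of this kind of failure is that lm() evaluates its subset argument non-standardly (inside data, then in the calling environment) rather than as an ordinary argument of the wrapper. One widely used workaround, similar to the idiom lm() itself uses to build its model frame, is to capture the wrapper's call with match.call() and re-evaluate it as a call to lm(). This is a sketch under that assumption; the wrapper name fit_lm and the toy data frame df are hypothetical, and simulated data replaces the plm Cigar data so it runs without extra packages.

```r
# Hypothetical wrapper: rebuild the caller's call as a call to stats::lm(),
# so 'subset' is evaluated inside 'data' exactly as in a direct lm() call.
fit_lm <- function(formula, data, subset) {
  cl <- match.call()             # e.g. fit_lm(formula = y ~ x, data = df, subset = g == 1)
  cl[[1L]] <- quote(stats::lm)   # swap in lm() as the function being called
  eval(cl, parent.frame())       # evaluate where the wrapper was called from
}

# Toy data (hypothetical): y = 1 + 2 * x exactly, two groups of five rows
df <- data.frame(x = 1:10, y = 1 + 2 * (1:10), g = rep(1:2, each = 5))

fit <- fit_lm(y ~ x, data = df, subset = g == 1)
coef(fit)
```

Because the rebuilt call is evaluated in parent.frame(), subset = g == 1 refers to the column g of df, just as it would in a direct lm() call. A simpler alternative is to filter the rows before fitting, e.g. lm(formula, data = data[keep, ]), which avoids non-standard evaluation entirely.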

Subsetting a dataset by selecting variables based on keywords in their names in SAS

Question: I hope someone can help. I have a large dataset imported into SAS with thousands of variables. I want to create a new dataset by extracting the variables that have a specific keyword in their name. For example, the following variables are in my dataset:

AAYAN_KK_Equity_Ask
AAYAN_KK_Equity_Bid
AAYAN_KK_Equity_Close
AAYAN_KK_Equity_Date
AAYAN_KK_Equity_Volume
AAYANRE_KK_Equity_Ask
AAYANRE_KK_Equity_Bid
AAYANRE_KK_Equity_Close
AAYANRE_KK_Equity_Date

I want to extract variables that end with _Ask and …

R: selecting a subset without copying

Question: Is there a way to select a subset from objects (data frames, matrices, vectors) without making a copy of the selected data? I work with quite large data sets, but never change them. However, for convenience I often select subsets of the data to operate on. Making a copy of a large subset each time is very memory-inefficient, but both normal indexing and subset() (and thus the xapply() family of functions) create copies of the selected data. So I'm looking for functions or data structures that can overcome …

R: Why is the [[ ]] approach for subsetting a list faster than using $?

Question: I've been working on a few projects that have required me to do a lot of list subsetting, and while profiling code I realised that the object[["nameHere"]] approach to subsetting lists was usually faster than the object$nameHere approach. As an example, if we create a list with named components:

a.long.list <- as.list(rep(1:1000))
names(a.long.list) <- paste0("something",1:1000)

Why is this:

system.time (
  for (i in 1:10000) {
    a.long.list[["something997"]]
  }
)

   user  system elapsed
   0.15    0.00    0.16
…
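The timing comparison in the question is cut off, so no explanation of the speed difference is offered here. What can be shown deterministically is how the two operators behave differently on lists: `$` takes a literal name and allows partial matching, while `[[` evaluates its index (so the name can come from a variable) and matches exactly by default. The list lst below is a hypothetical example.

```r
lst <- list(alpha_value = 1, beta = 2)

# `$` partially matches a unique prefix of a name (no warning by default)
by_dollar <- lst$alpha                                  # 1

# `[[` matches exactly unless exact = FALSE is given
by_brackets <- lst[["alpha"]]                           # NULL
by_brackets_partial <- lst[["alpha", exact = FALSE]]    # 1

# `[[` evaluates its index, so the name can be stored in a variable;
# `$` takes what follows it literally, here the name "nm"
nm <- "beta"
from_var <- lst[[nm]]                                   # 2
literal <- lst$nm                                       # NULL (no element named "nm")
```

To compare the speed of the two on a particular machine, each access can be wrapped in system.time() as in the question.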

Undefined columns selected when subsetting a data frame

Question: I have a data frame; running str(data) to show more about it gives the following:

> str(data)
'data.frame':	153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

However, for example, when I want to subset the amounts of Ozone …
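The str() output appears to match R's built-in airquality data set, so the sketch below uses it. The question is cut off before the failing code, so the cause is only a guess: a frequent source of this error is indexing with a single logical vector and no comma, which `[` then treats as a column selector. The threshold 31 is a hypothetical value.

```r
data <- airquality   # built-in data set whose str() matches the question

# Without a comma, the logical vector selects *columns*: 153 values
# (some NA) against only 6 columns, which triggers the error.
err <- tryCatch(data[data$Ozone > 31],
                error = function(e) conditionMessage(e))
err

# With a comma, the condition selects rows; NA comparisons are excluded,
# otherwise they would produce rows filled with NA.
high_ozone <- data[!is.na(data$Ozone) & data$Ozone > 31, ]

# subset() drops rows where the condition is NA automatically
high_ozone2 <- subset(data, Ozone > 31)
```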

Update subset of data.table based on join

Question: I have two data tables, DT1 and DT2:

set.seed(1)
DT1<-data.table(id1=rep(1:3,2),id2=sample(letters,6), v1=rnorm(6), key="id2")
DT1
##    id1 id2         v1
## 1:   2   e  0.7383247
## 2:   1   g  1.5952808
## 3:   2   j  0.3295078
## 4:   3   n -0.8204684
## 5:   3   s  0.5757814
## 6:   1   u  0.4874291
DT2<-data.table(id2=c("n","u"), v1=0, key="id2")
DT2
##    id2 v1
## 1:   n  0
## 2:   u  0

I would like to update DT1 based on a join with DT2, but only for a subset of DT1. For example, for DT1[id1==3], I would expect the value of v1 in …
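The question is cut off, so the intended result is an assumption: here, v1 in DT1 is replaced by DT2's v1 only for rows that both match DT2 on id2 and satisfy id1 == 3. To stay runnable without extra packages, this sketch swaps in base R data frames and match() instead of data.table's join syntax; the values are copied from the printed DT1 and DT2 above, and the names df1, df2, pos and upd are hypothetical.

```r
# Values copied from the printed DT1 / DT2 in the question, as plain data frames
df1 <- data.frame(id1 = c(2, 1, 2, 3, 3, 1),
                  id2 = c("e", "g", "j", "n", "s", "u"),
                  v1  = c(0.7383247, 1.5952808, 0.3295078,
                          -0.8204684, 0.5757814, 0.4874291),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id2 = c("n", "u"), v1 = 0, stringsAsFactors = FALSE)

# For each row of df1, the position of its id2 in df2 (NA if no match)
pos <- match(df1$id2, df2$id2)

# Update only rows that join successfully AND belong to the subset id1 == 3
upd <- !is.na(pos) & df1$id1 == 3
df1$v1[upd] <- df2$v1[pos[upd]]
df1
```

Under this reading, only the "n" row changes; the "u" row also matches DT2 but has id1 == 1, so it keeps its value. data.table itself can perform this kind of update by reference inside a join using :=.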

Subsetting and Merging from 2 Related Data Frames in R

Question: I have searched through the archives to no avail for this problem, which involves subsetting 2 related data frames: one data frame is a key, the other is an annual list. I'd like to use the key to create a subset and an index. I have tried using the subset formulas, but my code is not meeting my criteria. Here is the data:

players <- c('Albert Belle','Reggie Jackson', 'Reggie Jackson')
contract_start_season <- c(1999,1977,1982)
contract_end_season <- c(2003, 1981, …
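The question's data is truncated (the last contract end season is missing), so this sketch uses only the two complete contracts as the key, plus a small hypothetical annual table with a generic value column. It shows one possible reading of "a subset and an index": keep each annual row whose season falls inside a contract, and number the seasons within that contract. Because merge() pairs every annual row with every contract of the same player, players with several contracts are handled by the season filter.

```r
# Key table: the two complete contracts from the question
contracts <- data.frame(player = c("Albert Belle", "Reggie Jackson"),
                        contract_start_season = c(1999, 1977),
                        contract_end_season   = c(2003, 1981),
                        stringsAsFactors = FALSE)

# Hypothetical annual table (seasons and values are made up for illustration)
annual <- data.frame(player = c(rep("Albert Belle", 3), rep("Reggie Jackson", 4)),
                     season = c(1998, 2000, 2003, 1976, 1977, 1981, 1985),
                     value  = 1:7,
                     stringsAsFactors = FALSE)

# Pair each annual row with every contract of the same player ...
both <- merge(annual, contracts, by = "player")

# ... then keep only seasons that fall inside the contract window
in_contract <- both[both$season >= both$contract_start_season &
                    both$season <= both$contract_end_season, ]

# Index: 1 for the first season of the contract, 2 for the second, ...
in_contract$contract_year <- in_contract$season - in_contract$contract_start_season + 1
in_contract
```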

How to create a new variable with values from different variables if another variable equals a set value in R?

Question: I have a complicated question that I will try to simplify by using a reduced version of my dataset. Say I have 5 variables:

df$Id <- c(1:12)
df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a)
df$va <- c(1.1, 1.4, 2.5, ...) # 12 random values
df$vb <- c(5.9, 2.3, 4.7, ...) # 12 other random values
df$vc <- c(3.0, 3.3, 3.7, ...) # 12 more random values

Then I want to create a new variable that takes the value from va, vb, or vc if the date is equal to a, b, or c, respectively. I tried a nested if-else, which did not work. I also …
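A sketch, assuming Date holds the strings "a", "b" and "c" (the question writes them unquoted) and using hypothetical values for va, vb and vc, since the originals are elided. Instead of a nested ifelse(), it uses matrix indexing: match() turns each date into a column number, and a two-column (row, column) index matrix picks one value per row; rows whose date is NA get NA.

```r
df <- data.frame(Id   = 1:12,
                 Date = c(NA, NA, "a", "a", "b", NA, NA, "b", "c", "c", "b", "a"),
                 va   = 101:112,   # hypothetical values
                 vb   = 201:212,
                 vc   = 301:312,
                 stringsAsFactors = FALSE)

# Column to read from: "a" -> 1 (va), "b" -> 2 (vb), "c" -> 3 (vc), otherwise NA
col <- match(df$Date, c("a", "b", "c"))

# One (row, column) pair per row; a row with an NA column index yields NA
vals <- as.matrix(df[, c("va", "vb", "vc")])
df$new <- vals[cbind(seq_len(nrow(df)), col)]
df$new   # NA NA 103 104 205 NA NA 208 309 310 211 112
```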

How to subset randomly repeating X and Y addresses with their values

Question: I have a data frame with 3 columns and more than 200,000 rows. The first 2 columns are the x and y addresses of the 3rd column (values), and each address repeats 365 times with different values. I have to extract each x, y address with its 365 values separately.

             X      Y Value
3297 33.625184 70.875  0.04
3298 33.875184 70.875  0.02
3299 34.125184 70.875  0.01
3300 34.375184 70.875  0.03
3301 34.625184 70.875  0.09
3302 34.875184 70.875  0.14
3303 35.125184 70.875  0.17
3304 35.375184 70.875  0.12
3305 35.625184 70…
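One way to get each x, y address together with its own values is split(), which cuts the data frame into one piece per unique (X, Y) pair. The small data frame below is hypothetical (two addresses with two values each) rather than the full 365-value series.

```r
# Hypothetical data: two addresses, each repeated with different values
df <- data.frame(X = c(33.625184, 33.875184, 33.625184, 33.875184),
                 Y = c(70.875, 70.875, 70.875, 70.875),
                 Value = c(0.04, 0.02, 0.05, 0.01))

# One data frame per unique (X, Y) combination; drop = TRUE skips
# combinations that never occur in the data
by_address <- split(df, list(df$X, df$Y), drop = TRUE)

length(by_address)                    # number of distinct addresses
lapply(by_address, `[[`, "Value")     # the values belonging to each address
```

With the real data, the same call should give one element per address, each holding that address's rows; by_address[[1]] extracts the first one.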