lapply | 易学教程

Is there an easy way to tell if many data frames stored in one list contain the same columns?

Question: I have a list containing many data frames:

    df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
    df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
    df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
    my_list <- list(df1, df2, df3)

I want to know if every data frame in this list contains the same columns (i.e., the same number of columns, all having the same names and in the same order). I know that you can easily find the column names of data frames in a list using lapply:

    lapply(my_list, colnames)
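
One possible check, sketched under the assumption that "same columns" means identical names in identical order: compare each data frame's column names against those of the first. The helper name `same_columns` is hypothetical and not part of the original question.

```r
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# Hypothetical helper: TRUE only if every data frame has exactly the same
# column names, in the same order, as the first one in the list.
same_columns <- function(dfs) {
  reference <- colnames(dfs[[1]])
  all(vapply(dfs, function(d) identical(colnames(d), reference), logical(1)))
}

same_columns(my_list)         # FALSE, because df3 lacks column B
same_columns(list(df1, df2))  # TRUE
```

`identical()` is used rather than `==` so that differing lengths count as a mismatch instead of triggering recycling.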

Fastest way to convert a list of character vectors to numeric in R

Question: In R, what is the fastest way to convert a list of character strings, each holding a sequence of space-separated numbers, into numeric? With the following dummy data:

    set.seed(2)
    N = 1e7
    ncol = 10
    myT = formatC(matrix(runif(N), ncol = ncol)) # A matrix converted to characters
    # Each row is collapsed into a single string of characters:
    myT = apply(myT, 1, function(x) paste(x, collapse=' ') )
    head(myT)

Producing:

    [1] "0.1849 0.855 0.8272 0.5403 0.3891 0.5184 0.7776 0.5533 0.1566 0.01591"
    [2] "0.7024 0.1008
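
A minimal baseline for this kind of conversion, assuming each string holds numbers separated by single spaces. The input values below are made up for illustration, and no claim is made that this is the fastest approach; it has not been benchmarked here.

```r
# Made-up stand-in data: each element is one string of space-separated numbers.
strings <- c("1.5 2 3.25", "4 5.5 6")

# Split every string on single spaces, then convert each piece to numeric.
# fixed = TRUE treats the separator literally instead of as a regex.
as_numeric_list <- lapply(strsplit(strings, " ", fixed = TRUE), as.numeric)

as_numeric_list[[1]]  # 1.50 2.00 3.25
```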

Creating a dataframe from an lapply function with different numbers of rows

Question: I have a list of dates (df2) and a separate data frame with weekly dates and a measurement on each of those days (df1). What I need is a data frame containing the rows of df1 that fall within the year prior to each sample date in df2, together with their measurements.

    eg1 <- data.frame(Date=seq(as.Date("2008-12-30"), as.Date("2012-01-04"), by="weeks"))
    eg2 <- as.data.frame(matrix(sample(0:1000, 79*2, replace=TRUE), ncol=1))
    df1 <- cbind(eg1,eg2)
    df2 <- as.Date(c("2011-07-04","2010-07-28"))

A similar question I have previously asked
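
A rough sketch of one way to combine lapply results that have different numbers of rows, assuming "a year prior" means the preceding 365 days. The data and the names `weekly`, `sample_dates`, and `SampleDate` are invented for illustration and are not from the question.

```r
# Made-up stand-in data: weekly dates with one measurement per date.
dates <- seq(as.Date("2010-01-05"), as.Date("2011-12-27"), by = "weeks")
weekly <- data.frame(Date = dates, Value = seq_along(dates))
sample_dates <- as.Date(c("2011-07-04", "2010-07-28"))

# For each sample date, keep the rows from the preceding 365 days and tag
# them with the sample date they belong to.
pieces <- lapply(seq_along(sample_dates), function(i) {
  d <- sample_dates[i]
  in_window <- weekly$Date > d - 365 & weekly$Date <= d
  out <- weekly[in_window, ]
  out$SampleDate <- rep(d, nrow(out))
  out
})

# The pieces have different row counts; rbind() stacks them regardless,
# because they share the same columns.
result <- do.call(rbind, pieces)
```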

Combining lapply, svyby, svyratio to calculate many ratios with confidence intervals

Question: I am using the survey package in R to work with the U.S. Census PUMS dataset for population. I've created a Boolean for each broad industry and a character variable MigrationStatus with three values (Stayed, Left, Entered). I'd like to examine the ratios of workers in each industry by migration status. This works fine:

    AGR_ratio=svyby(~JobAGR, by=~MigrationStatus, denominator=~EmployedAtWork, design=subset(pums_design,EmployedAtWork==1), svyratio, vartype='ci')

But this produces an error

Data table - apply the same function on several columns to create new data table columns

Question: I am working with the data.table package. I have a data table which represents user actions on a website. Let's say that every user can visit a website and perform multiple actions on it. My original data table is of actions (every row is an action), and I want to aggregate this information into a new data table, grouped by user visits (every visit has a unique ID). There are some fields which are shared by the actions of the same visit - for example - the user name, the user status, the visit

Collapsing factor level for all the factor variable in dataframe based on the count

Question: I would like to keep only the top 2 factor levels based on frequency and group all other levels into Other. I tried this, but it doesn't help.

    df=data.frame(a=as.factor(c(rep('D',3),rep('B',5),rep('C',2))),
                  b=as.factor(c(rep('A',5),rep('B',5))),
                  c=as.factor(c(rep('A',3),rep('B',5),rep('C',2))))
    myfun=function(x){
      if(is.factor(x)){
        levels(x)[!levels(x) %in% names(sort(table(x),decreasing = T)[1:2])]='Others'
      }
    }
    df=as.data.frame(lapply(df, myfun))

Expected Output

    a b c
    D A A
    D A A
    D A A
    B A
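
A likely cause of the problem is that the function modifies `x` but never returns it, so lapply receives the value of the `if` expression instead of the column. A hedged sketch of a corrected version follows; the name `keep_top2` is hypothetical, and the label "Others" follows the question's code rather than its prose.

```r
df <- data.frame(
  a = as.factor(c(rep("D", 3), rep("B", 5), rep("C", 2))),
  b = as.factor(c(rep("A", 5), rep("B", 5))),
  c = as.factor(c(rep("A", 3), rep("B", 5), rep("C", 2)))
)

# Hypothetical helper: relabel every level outside the two most frequent
# ones as "Others", and always return the column itself.
keep_top2 <- function(x) {
  if (is.factor(x)) {
    top <- names(sort(table(x), decreasing = TRUE))[1:2]
    levels(x)[!levels(x) %in% top] <- "Others"
  }
  x  # return the (possibly modified) column
}

df <- as.data.frame(lapply(df, keep_top2))
```

Non-factor columns pass through unchanged because `x` is returned outside the `if` block.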

how to get name of data.frame from list passed to function using lapply

Question: I have a function which I want to extend with the ability to save results to a csv file. The name of the csv file should be generated from the name of the data.frame passed to this function:

    my.func1 <- function(dframe, ...){
      # PART OF CODE RESPONSIBLE FOR COMPUTATION
      # ...
      # PART OF CODE WHERE I WANT TO STORE RESULTS AS CSV
      csv <- deparse(substitute(dframe))
      csv
    }

When I call this function in the following way, the name of the dataset passed to it is interpreted correctly:

    > my.func1(mtcars)
    [1] "mtcars"
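
When a function like this is called through lapply, `substitute()` typically sees the internal expression lapply uses to index its input rather than the original variable name. One common workaround, sketched here under the assumption that the data frames live in a named list, is to iterate over the names and pass each name explicitly. `my.func2` and `dfs` are hypothetical names, not from the question.

```r
# Hypothetical variant of my.func1 that receives the name as an argument
# instead of recovering it with deparse(substitute(...)).
my.func2 <- function(dframe, name) {
  # ... computation on dframe would go here ...
  paste0(name, ".csv")
}

dfs <- list(mtcars = mtcars, iris = iris)

# Iterate over the names so each call knows which data frame it is handling.
csv_names <- lapply(names(dfs), function(nm) my.func2(dfs[[nm]], nm))
```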

Pad each element in a list to specific length in R

Question: Here is a simple R question which, I think, basically pertains to correctly understanding list syntax. I have a series of matrices loaded into a list (following some preliminary calculations) on which I then want to conduct some basic block averaging. My basic workflow will be as follows:
1) Rounding each vector contained within a list to an integer corresponding to the number of blocks I am interested in averaging to.
2) Padding each vector in a list to this new length.
3) Conversion of each
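
The padding step alone can be sketched as follows, assuming the target length is already known and that NA is an acceptable fill value. The data and the names `vecs` and `target_len` are made up for illustration.

```r
vecs <- list(c(1, 2, 3), c(4, 5), 6)
target_len <- 4

# Assigning to length() extends a vector and fills the new slots with NA.
# Note that it would also truncate any vector longer than target_len.
padded <- lapply(vecs, function(v) {
  length(v) <- target_len
  v
})

padded[[2]]  # 4 5 NA NA
```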

Working with dataframes in a list: Drop variables, add new ones

Question: Define a list dats with two dataframes, df1 and df2:

    dats <- list( df1 = data.frame(a=sample(1:3), b = sample(11:13)),
                  df2 = data.frame(a=sample(1:3), b = sample(11:13)))
    > dats
    $df1
      a  b
    1 2 12
    2 3 11
    3 1 13

    $df2
      a  b
    1 3 13
    2 2 11
    3 1 12

I would like to drop variable a in each data frame. Next, I would like to add a variable with the id of each dataframe, taken from an external dataframe like:

    ids <- data.frame(id=c("id1","id2"),df=c("df1","df2"))
    > ids
       id  df
    1 id1 df1
    2 id2 df2

To drop unnecessary
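
Both steps can be done in a single pass over the list names. This is only a sketch, assuming every list name appears exactly once in `ids$df`.

```r
dats <- list(
  df1 = data.frame(a = sample(1:3), b = sample(11:13)),
  df2 = data.frame(a = sample(1:3), b = sample(11:13))
)
ids <- data.frame(id = c("id1", "id2"), df = c("df1", "df2"))

# Loop over the names so each data frame can be matched to its row in ids;
# setNames() restores the list names that lapply(names(...)) would drop.
dats <- setNames(lapply(names(dats), function(nm) {
  d <- dats[[nm]]
  d$a <- NULL                         # drop variable a
  d$id <- ids$id[match(nm, ids$df)]   # look up this data frame's id
  d
}), names(dats))
```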

Counting the number of rows of a series of csv files

Question: I'm working through an R tutorial and suspect that I have to use one of these functions, but I'm not sure which (yes, I researched them, but until I become more fluent in R terminology they are quite confusing). In my working directory there is a folder "specdata". Specdata contains hundreds of CSV files named 001.csv - 300.csv. The function I am working on must count the total number of rows for an input number of csv files. So if the argument in the function is 1:10 and each of those files
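
A minimal sketch of such a function, assuming the files are named with zero-padded three-digit numbers as described and that each has a header row. The helper `count_rows` and the throwaway demo folder are hypothetical; the demo folder exists only so the example runs without the real specdata files.

```r
# Build a small throwaway folder of CSV files in the same naming style.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(x = seq_len(i * 2)),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

# Hypothetical helper: total number of data rows across the files whose
# numbers are given in ids (for example, 1:10 means 001.csv to 010.csv).
count_rows <- function(directory, ids) {
  files <- file.path(directory, sprintf("%03d.csv", ids))
  sum(vapply(files, function(f) nrow(read.csv(f)), integer(1)))
}

count_rows(demo_dir, 1:3)  # 2 + 4 + 6 = 12
```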