lapply

Is there an easy way to tell if many data frames stored in one list contain the same columns?

妖精的绣舞 submitted on 2019-12-10 18:49:10
Question: I have a list containing many data frames:

    df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
    df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
    df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
    my_list <- list(df1, df2, df3)

I want to know if every data frame in this list contains the same columns (i.e., the same number of columns, all having the same names and in the same order). I know that you can easily find the column names of data frames in a list using lapply:

    lapply(my_list, colnames)
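One way to turn the lapply(my_list, colnames) result into a single yes/no answer is to compare every set of column names against the first one with identical(), which is sensitive to count, names, and order. This is a minimal sketch; same_columns is a hypothetical helper name, not something from the original question.

```r
# Toy data copied from the question.
df1 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df2 <- data.frame(A = 1:5, B = 2:6, C = LETTERS[1:5])
df3 <- data.frame(A = 1:5, C = LETTERS[1:5])
my_list <- list(df1, df2, df3)

# Hypothetical helper: TRUE only if all data frames share the same
# column names in the same order.
same_columns <- function(dfs) {
  if (length(dfs) == 0) return(TRUE)  # nothing to compare
  cols <- lapply(dfs, colnames)
  # identical() compares each name vector to the first one exactly
  all(vapply(cols, identical, logical(1), cols[[1]]))
}

same_columns(my_list)         # df3 lacks column B
same_columns(list(df1, df2))  # same names, same order
```

If order should not matter, sorting each name vector before the comparison (for example lapply(dfs, function(d) sort(colnames(d)))) would be one possible adjustment.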

Fastest way to convert a list of character vectors to numeric in R

末鹿安然 submitted on 2019-12-10 17:55:41
Question: In R, what is the fastest way to convert a list containing sequences of numbers stored as text (character vectors) into numeric? With the following dummy data:

    set.seed(2)
    N = 1e7
    ncol = 10
    myT = formatC(matrix(runif(N), ncol = ncol)) # A matrix converted to characters
    # Each row is collapsed into a single suite of characters:
    myT = apply(myT, 1, function(x) paste(x, collapse=' ') )
    head(myT)

Producing:

    [1] "0.1849 0.855 0.8272 0.5403 0.3891 0.5184 0.7776 0.5533 0.1566 0.01591"
    [2] "0.7024 0.1008
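As a baseline for any speed comparison, two base R approaches can be sketched on a tiny stand-in vector. The values in myT_small are made up for illustration and are not taken from the question's output; no claim is made here about which approach is fastest on the full data.

```r
# Small made-up stand-in for myT: each string holds space-separated numbers.
myT_small <- c("0.25 0.5 0.75", "1 2 3")

# Approach 1: split each string on spaces, then convert each piece.
# fixed = TRUE avoids regular-expression matching.
as_list <- lapply(strsplit(myT_small, " ", fixed = TRUE), as.numeric)

# Approach 2: let scan() parse all strings in one pass, then reshape.
# This assumes every string holds the same count of numbers (3 here).
as_matrix <- matrix(scan(text = myT_small, quiet = TRUE),
                    ncol = 3, byrow = TRUE)
```

The first form keeps one numeric vector per input string; the second gives a numeric matrix with one row per string. Timing both on the real data (for example with system.time()) would be needed before calling either one the fastest.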

Creating a dataframe from an lapply function with different numbers of rows

南楼画角 submitted on 2019-12-10 17:43:21
Question: I have a list of dates (df2) and a separate data frame with weekly dates and a measurement on each of those days (df1). What I need is to output a data frame of the rows within a year prior to each sample date (df2), along with their measurements.

    eg1 <- data.frame(Date=seq(as.Date("2008-12-30"), as.Date("2012-01-04"), by="weeks"))
    eg2 <- as.data.frame(matrix(sample(0:1000, 79*2, replace=TRUE), ncol=1))
    df1 <- cbind(eg1,eg2)
    df2 <- as.Date(c("2011-07-04","2010-07-28"))

A similar question I have previously asked
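A common pattern for this kind of problem is to have lapply return one data frame per sample date and then stack the pieces with do.call(rbind, ...), which does not care that the pieces have different numbers of rows. The sketch below assumes "within a year prior" means the 365 days up to and including the sample date; the SampleDate column name is hypothetical.

```r
set.seed(1)
eg1 <- data.frame(Date = seq(as.Date("2008-12-30"), as.Date("2012-01-04"),
                             by = "weeks"))
eg2 <- data.frame(V1 = sample(0:1000, nrow(eg1), replace = TRUE))
df1 <- cbind(eg1, eg2)
df2 <- as.Date(c("2011-07-04", "2010-07-28"))

# One data frame per sample date; iterate over indices so each element
# is pulled out of df2 as a Date.
pieces <- lapply(seq_along(df2), function(i) {
  d <- df2[i]
  # Assumption: the window is the 365 days ending on the sample date.
  window <- df1[df1$Date > d - 365 & df1$Date <= d, , drop = FALSE]
  # Record which sample date each row belongs to (hypothetical column).
  window$SampleDate <- rep(d, nrow(window))
  window
})

# Stack the pieces; they may differ in row count.
result <- do.call(rbind, pieces)
```

Keeping the sample date as a column makes it possible to split or group the stacked result again later.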

Combining lapply, svyby, svyratio to calculate many ratios with confidence intervals

怎甘沉沦 submitted on 2019-12-10 16:47:26
Question: I am using the survey package in R to work with the U.S. Census' PUMS dataset for population. I've created a Boolean for each broad industry and a character variable MigrationStatus with three values (Stayed, Left, Entered). I'd like to examine the ratios of workers in each industry by migration status. This works fine:

    AGR_ratio=svyby(~JobAGR, by=~MigrationStatus, denominator=~EmployedAtWork, design=subset(pums_design,EmployedAtWork==1), svyratio, vartype='ci')

But this produces an error

Data table - apply the same function on several columns to create new data table columns

前提是你 submitted on 2019-12-10 16:06:08
Question: I am working with the data.table package. I have a data table which represents user actions on a website. Let's say that every user can visit a website and perform multiple actions on it. My original data table is of actions (every row is an action), and I want to aggregate this information into a new data table, grouped by user visits (every visit has a unique ID). There are some fields which are shared by the actions of the same visit - for example - the user name, the user status, the visit
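Applying one function to several columns per group is usually written in data.table with lapply over .SD, restricted by .SDcols. The sketch below uses a made-up actions table; the column names (visit_id, user, clicks, seconds) and the choice of sum as the aggregate are assumptions for illustration, since the question's actual columns are not shown in the excerpt.

```r
library(data.table)

# Hypothetical action-level data: one row per action.
actions <- data.table(
  visit_id = c(1, 1, 2, 2, 2),
  user     = c("ann", "ann", "bob", "bob", "bob"),
  clicks   = c(1, 2, 3, 4, 5),
  seconds  = c(10, 20, 30, 40, 50)
)

# Columns to aggregate with the same function.
num_cols <- c("clicks", "seconds")

# Group by the visit and by fields shared within a visit, then apply
# sum() to every column listed in .SDcols.
visits <- actions[, lapply(.SD, sum), by = .(visit_id, user),
                  .SDcols = num_cols]

# Give the aggregated columns new names.
setnames(visits, num_cols, paste0("total_", num_cols))
```

Fields that are constant within a visit can simply be added to by, as user is here, so they carry over to the visit-level table.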

Collapsing factor level for all the factor variable in dataframe based on the count

有些话、适合烂在心里 submitted on 2019-12-10 15:45:36
Question: I would like to keep only the top 2 factor levels based on frequency and group all other levels into Other. I tried this but it doesn't help.

    df=data.frame(a=as.factor(c(rep('D',3),rep('B',5),rep('C',2))),
                  b=as.factor(c(rep('A',5),rep('B',5))),
                  c=as.factor(c(rep('A',3),rep('B',5),rep('C',2))))
    myfun=function(x){
      if(is.factor(x)){
        levels(x)[!levels(x) %in% names(sort(table(x),decreasing = T)[1:2])]='Others'
      }
    }
    df=as.data.frame(lapply(df, myfun))

Expected Output

    a b c
    D A A
    D A A
    D A A
    B A
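One likely reason the attempt does not help is that myfun never returns the modified x: the last expression evaluated is the assignment, so lapply receives the string 'Others' (or NULL for non-factors) instead of the column. A minimal corrected sketch follows; collapse_rare and its keep and other arguments are hypothetical names.

```r
df <- data.frame(a = factor(c(rep("D", 3), rep("B", 5), rep("C", 2))),
                 b = factor(c(rep("A", 5), rep("B", 5))),
                 c = factor(c(rep("A", 3), rep("B", 5), rep("C", 2))))

# Hypothetical helper: keep the `keep` most frequent levels and relabel
# every other level as `other`. Non-factor columns pass through unchanged.
collapse_rare <- function(x, keep = 2, other = "Others") {
  if (!is.factor(x)) return(x)
  n_keep <- min(keep, nlevels(x))
  top <- names(sort(table(x), decreasing = TRUE))[seq_len(n_keep)]
  levels(x)[!levels(x) %in% top] <- other
  x  # return the modified factor, which the original function omitted
}

# df[] <- keeps the data frame structure while replacing every column.
df[] <- lapply(df, collapse_rare)
```

With this data, level C in columns a and c is relabelled as Others, while column b is unchanged because it has only two levels.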

how to get name of data.frame from list passed to function using lapply

不想你离开。 submitted on 2019-12-10 14:24:10
Question: I have a function that I want to extend with the ability to save results to a CSV file. The name of the CSV file should be generated from the name of the data.frame passed to this function:

    my.func1 <- function(dframe, ...){
      # PART OF CODE RESPONSIBLE FOR COMPUTATION
      # ...
      # PART OF CODE WHERE I WANT TO STORE RESULTS AS CSV
      csv <- deparse(substitute(dframe))
      csv
    }

When I call this function in the following way, the name of the dataset passed to it is interpreted correctly:

    > my.func1(mtcars)
    [1] "mtcars"
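Inside lapply, deparse(substitute(dframe)) no longer sees the original variable name, because lapply calls the function on an internal expression rather than on mtcars itself. One workaround is to store the data frames in a named list and iterate over the names, passing the name in explicitly. This is a sketch only; my_func2 and its name argument are hypothetical, and it builds the file name without writing any file.

```r
# Hypothetical variant of the function that takes the name explicitly.
my_func2 <- function(dframe, name) {
  # ... computation on dframe would go here ...
  paste0(name, ".csv")  # file name derived from the list element's name
}

# A named list carries the labels that substitute() cannot recover.
dfs <- list(mtcars = mtcars, iris = iris)

# Iterate over the names and look each data frame up by name.
files <- vapply(names(dfs),
                function(nm) my_func2(dfs[[nm]], nm),
                character(1))
```

The result is a character vector of file names, itself named after the list elements, which could then be passed to write.csv().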

Pad each element in a list to specific length in R

怎甘沉沦 submitted on 2019-12-10 10:12:04
Question: Here is a simple R question which, I think, basically comes down to correctly understanding list syntax. I have a series of matrices loaded into a list (following some preliminary calculations) on which I then want to conduct some basic block averaging. My basic workflow will be as follows:
1) Rounding each vector contained within a list to an integer corresponding to the number of blocks I am interested in averaging to.
2) Padding each vector in a list to this new length.
3) Conversion of each
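The padding step can be sketched with base R's length<- replacement, which extends a vector with NA. The sketch assumes that "rounding" means rounding each length up to the next multiple of a block size, and that padding uses NA; block, pad_to_multiple, and the toy vectors are all hypothetical, since the rest of the question's workflow is cut off.

```r
# Toy list of vectors with different lengths.
vecs <- list(1:3, 1:5, 1:7)
block <- 4  # hypothetical block size

# Pad a vector with NA up to the next multiple of `block`.
pad_to_multiple <- function(x, block) {
  target <- ceiling(length(x) / block) * block
  length(x) <- target  # extending a vector fills the new slots with NA
  x
}

padded <- lapply(vecs, pad_to_multiple, block = block)

# Once padded, each vector reshapes cleanly into one column per block,
# so colMeans() gives the block averages (ignoring the NA padding).
block_means <- lapply(padded, function(x) {
  colMeans(matrix(x, nrow = block), na.rm = TRUE)
})
```

Using na.rm = TRUE means a partly filled final block is averaged over its real values only; whether that is the intended behaviour depends on the analysis.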

Working with dataframes in a list: Drop variables, add new ones

≯℡__Kan透↙ submitted on 2019-12-10 06:08:22
Question: Define a list dats with two dataframes, df1 and df2:

    dats <- list( df1 = data.frame(a=sample(1:3), b = sample(11:13)),
                  df2 = data.frame(a=sample(1:3), b = sample(11:13)))
    > dats
    $df1
      a  b
    1 2 12
    2 3 11
    3 1 13

    $df2
      a  b
    1 3 13
    2 2 11
    3 1 12

I would like to drop variable a in each data frame. Next I would like to add a variable with the id of each dataframe from an external dataframe, like:

    ids <- data.frame(id=c("id1","id2"),df=c("df1","df2"))
    > ids
       id  df
    1 id1 df1
    2 id2 df2

To drop unnecessary
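Both steps can be done in one pass by iterating over the names of dats, so that each data frame's name is available for the lookup in ids. This is a minimal sketch; dats2 is a hypothetical name, and stringsAsFactors = FALSE is added so the id column is plain character regardless of R version.

```r
set.seed(42)
dats <- list(df1 = data.frame(a = sample(1:3), b = sample(11:13)),
             df2 = data.frame(a = sample(1:3), b = sample(11:13)))
ids <- data.frame(id = c("id1", "id2"), df = c("df1", "df2"),
                  stringsAsFactors = FALSE)

# Iterate over names so the data frame's own name can be matched in ids.
dats2 <- lapply(names(dats), function(nm) {
  d <- dats[[nm]]
  d$a <- NULL                          # drop variable a
  d$id <- ids$id[match(nm, ids$df)]    # look up the id for this data frame
  d
})
names(dats2) <- names(dats)  # lapply over a character vector drops the names
```

Map(function(d, nm) ..., dats, names(dats)) would be an equivalent formulation that keeps the list names automatically.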

Counting the number of rows of a series of csv files

本小妞迷上赌 submitted on 2019-12-09 18:16:40
Question: I'm working through an R tutorial and suspect that I have to use one of these functions, but I'm not sure which (yes, I researched them, but until I become more fluent in R terminology they are quite confusing). In my working directory there is a folder "specdata". Specdata contains hundreds of CSV files named 001.csv - 300.csv. The function I am working on must count the total number of rows for a specified set of CSV files. So if the argument in the function is 1:10 and each of those files
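One way to approach this is to build the zero-padded file names with sprintf, read each file, and sum the row counts. The sketch below writes three tiny CSV files into a temporary folder so it can run on its own; count_rows, spec_dir, and the demo files are hypothetical stand-ins for the tutorial's specdata folder.

```r
# Create a small demo folder with files 001.csv, 002.csv, 003.csv
# holding 2, 3, and 4 rows respectively.
spec_dir <- file.path(tempdir(), "specdata_demo")
dir.create(spec_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(x = seq_len(i + 1)),
            file.path(spec_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

# Hypothetical helper: total number of rows across the files whose
# numbers are given in `id`.
count_rows <- function(directory, id) {
  files <- file.path(directory, sprintf("%03d.csv", id))  # e.g. 001.csv
  # vapply guarantees one integer row count per file
  sum(vapply(files, function(f) nrow(read.csv(f)), integer(1)))
}

total <- count_rows(spec_dir, 1:3)
```

vapply is used instead of sapply so that the result is always an integer vector, even when id selects a single file.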