Print the Nth Row in a List of Data Frames

Question

I am cleaning several Excel files in R. Unfortunately, they have unequal dimensions (both rows and columns). Currently I am storing each Excel sheet as a data frame in a list. I know how to print the 4th row of the first data frame in a list by issuing this command:

df.list1[[1]][4,]

Or a range of rows like this:

df.list1[[1]][1:10,]

My question is: How do I print a particular row for every data frame in the list? In other words:

df.list1[[i]][4,]

df.list1 has 30 data frames in it, but my other df.lists have over 140 data frames whose rows I am looking to extract. I'd like to be able to store particular locations across several data frames in a new list. I'm thinking the solution might involve lapply.

Furthermore, is there a way to extract rows in every data frame in a list based on a condition? For example, for all 30 data frames in the list df.list1, extract the row if the value is equal to "Apartment" or some other string of characters.

I appreciate your help. Please let me know if I can clarify my problem.

Answer 1:

You could also just directly lapply the extraction function @Justin suggests, e.g.:

# example data of a list containing 10 data frames:
test <- replicate(10,data.frame(a=1:10),simplify=FALSE)

# extract the fourth row of each one - setting drop=FALSE means you get a
# data frame returned even if only one vector/column needs to be returned.
lapply(test,"[",4,,drop=FALSE)

The format is:

lapply(listname,"[",rows.to.return,cols.to.return,drop=FALSE)

# the example returns the fourth row only from each data frame
#[[1]]
#  a
#4 4
# 
#[[2]]
#  a
#4 4
# etc...
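Since the question mentions data frames of unequal dimensions, it may be worth guarding against data frames that are too short to have the requested row. The following is a minimal sketch under that assumption; the example list df.list and the names n, has.row and nth.rows are made up for illustration.

```r
# Hypothetical example list: data frames with different numbers of rows
df.list <- list(
  data.frame(a = 1:10),
  data.frame(a = 1:2),                    # too short to have a 4th row
  data.frame(a = 1:6, b = letters[1:6])
)

n <- 4

# An out-of-range row index does not raise an error; it yields a row of NAs
df.list[[2]][n, , drop = FALSE]

# Keep only the data frames that actually have an nth row, then extract it
has.row  <- vapply(df.list, function(x) nrow(x) >= n, logical(1))
nth.rows <- lapply(df.list[has.row], "[", n, , drop = FALSE)
```

Here nth.rows holds one single-row data frame for each input that was long enough; has.row can be kept alongside it if you need to know which inputs were skipped.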

To generalise this approach to extraction based on a condition, change it slightly, as in the example below, which extracts all rows where a is greater than 4 in each data.frame. In this case, an anonymous function is probably the clearest method, e.g.:

lapply(test, function(x) with(x,x[a>4,,drop=FALSE]) )

#[[1]]
#    a
#5   5
#6   6
#7   7
#8   8
#9   9
#10 10
# etc...
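The same pattern can be adapted to the string condition from the question. This is only a sketch: it assumes the value sits in a known column, and the column name type, the example data and the result name apartments are all hypothetical.

```r
# Hypothetical data: the column name `type` and its values are made up
df.list <- list(
  data.frame(type = c("Apartment", "House", NA), price = c(100, 200, 150)),
  data.frame(type = c("House", "Apartment", "Apartment"), price = c(300, 120, 130)),
  data.frame(type = c("House", "Condo"), price = c(250, 180))
)

# %in% returns FALSE (not NA) for missing values, so rows with NA in `type`
# are not picked up, which can happen with a plain == comparison
apartments <- lapply(df.list, function(x) x[x$type %in% "Apartment", , drop = FALSE])

# Optionally drop the data frames in which nothing matched
apartments <- Filter(function(x) nrow(x) > 0, apartments)
```

Using %in% also makes it easy to match several strings at once, e.g. x$type %in% c("Apartment", "Condo").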

Answer 2:

There is no need for a wrapper function; just use lapply and pass it a blank argument at the end (to represent the columns):

lapply(df.list, `[`, 4, )

This also works with any type of row argument that you would normally use in myDF[ . , ], e.g. lapply(df.list, `[`, c(2, 4:6), )

I would suggest that if you are going to use a wrapper function, have it work more like `[` does, e.g.:

Grab(df.list, 2:3, 1:5) would select the second and third rows and the first through fifth columns of every data.frame, and Grab(df.list, 2:3) would select the second and third rows of all columns:

Grab <- function(ll, rows, cols) {
    if (missing(cols))
        lapply(ll, `[`, rows, )
    else 
        lapply(ll, `[`, rows, cols)
}

Grab (df.list, 2:3)
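To try Grab() out, here is a small self-contained sketch. The function body is copied from the answer; the example list df.list, with three columns per data frame, is an assumption made for illustration (so the column selection is 1:2 rather than 1:5).

```r
# Grab() as defined in the answer above
Grab <- function(ll, rows, cols) {
  if (missing(cols))
    lapply(ll, `[`, rows, )
  else
    lapply(ll, `[`, rows, cols)
}

# Hypothetical example list with three columns per data frame
df.list <- replicate(3, data.frame(a = 1:5, b = 6:10, c = 11:15), simplify = FALSE)

all.cols <- Grab(df.list, 2:3)        # rows 2-3, all columns
two.cols <- Grab(df.list, 2:3, 1:2)   # rows 2-3, columns a and b
one.col  <- Grab(df.list, 2:3, "a")   # single column: drops to a plain vector
```

Because Grab() does not pass drop=FALSE, selecting a single column returns a vector rather than a data frame, just as `[` does by default.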

Answer 3:

My suggestion is to write a function that does what you want on a single data frame:

myfun <- function(dat) {
  return(dat[4, , drop=FALSE])
}

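If you want the row returned as a vector instead of a data.frame, note that return(dat[4, ]) only drops to a vector when the data frame has a single column; with several columns it still returns a one-row data frame, which unlist(dat[4, ]) will flatten. Then use lapply to apply that function to each element of your list: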

lapply(df.list1, myfun)

With that technique, you can easily come up with ways to extend myfun to more complex functions...
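As one possible extension, myfun could take the row index and a column selection as arguments, which lapply forwards after the function name. This is a sketch only; the parameters n and cols and the example data are hypothetical and not part of the original answer.

```r
# Hypothetical extension of myfun(): the row index `n` and the optional
# column selection `cols` are illustrative parameters
myfun <- function(dat, n = 4, cols = names(dat)) {
  dat[n, cols, drop = FALSE]
}

# Made-up example list
df.list1 <- replicate(3, data.frame(a = 1:10, b = 11:20), simplify = FALSE)

fourth   <- lapply(df.list1, myfun)                      # 4th row, all columns
second.b <- lapply(df.list1, myfun, n = 2, cols = "b")   # 2nd row, column b only

# Stack the per-data-frame results into one data frame
# (this requires the extracted rows to have matching columns)
combined <- do.call(rbind, fourth)
```

Keeping drop=FALSE means every result is a data frame, which is what makes the final rbind step straightforward.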

Answer 4:

For example, you have a .csv file called hw1_data.csv and you want to retrieve the 47th row. Here is how to do that:

x<-read.csv("hw1_data.csv")

x[47,]

If it is a text file you can use read.table.
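A minimal sketch of the read.table route, assuming a tab-delimited file with a header line. The file is written to a temporary path here only so the example runs on its own; the columns id and value are made up.

```r
# Write a small tab-delimited file to a temporary location so the example is
# self-contained; in practice you would point at your own file
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:50, value = (1:50)^2), path,
            sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE treats the first line as column names; sep must match the file
x <- read.table(path, header = TRUE, sep = "\t")

# Retrieve the 47th row, exactly as with read.csv
x[47, ]
```

Unlike read.csv, read.table defaults to header = FALSE and whitespace separators, so both arguments are set explicitly.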

Source: https://stackoverflow.com/questions/18038863/print-the-nth-row-in-a-list-of-data-frames

Tags

dataframe

lapply