How to subset a list based on the length of its elements in R

蓝咒 submitted on 2019-12-19 04:41:17

Question


In R I have a function (ip2coordinates from the package RDSTK) which looks up 11 fields of data for each IP address you supply.

I have a vector of IP addresses called ip.addresses:

> head(ip.addresses)
[1] "128.177.90.11"  "71.179.12.143"  "66.31.55.111"   "98.204.243.187" "67.231.207.9"   "67.61.248.12"  

Note: These or any other IP addresses can be used to reproduce the problem.

So I apply the function to that object with sapply:

ips.info     <- sapply(ip.addresses, ip2coordinates)

and get a list called ips.info as my result. That works, but I can't do much more with a list, so I need to convert it to a data frame. The problem is that not all IP addresses are in the database, so some list elements have only 1 field, and I get this error:

> ips.df       <- as.data.frame(ips.info)
Error in data.frame(`128.177.90.10` = list(ip.address = "128.177.90.10",  : 
  arguments imply differing number of rows: 1, 0

My question is -- "How do I remove the elements with missing/incomplete data or otherwise convert this list into a data frame with 11 columns and 1 row per IP address?"

I have tried several things.

  • First, I tried to write a loop that removes elements with a length of less than 11:

    for (i in 1:length(ips.info)){
    if (length(ips.info[i]) < 11){
    ips.info[i] <- NULL}}
    

This leaves some records with no data and makes others show "NULL", but even the ones showing "NULL" are not detected by is.null.

  • Next, I tried the same thing with double square brackets and got:

    Error in ips.info[[i]] : subscript out of bounds
    
  • I also tried complete.cases() to see whether it could be useful:

    Error in complete.cases(ips.info) : not all arguments have the same length
    
  • Finally, I tried a variation of my for loop which was conditioned on length(ips.info[[i]] == 11 and wrote complete records to another object, but somehow it resulted in an exact copy of ips.info.
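
A likely reason the first two attempts misbehave, sketched on a hypothetical toy list (toy and toy.complete are made-up names, not part of the original code): single brackets return a sub-list holding one element, so length(ips.info[i]) is 1 and the "< 11" test is true for every element. Deleting elements inside the loop also shrinks the list while i keeps counting up to the original length, which would explain the "subscript out of bounds" error with double brackets. Testing all elements first and subsetting once avoids both problems:

```r
# Toy stand-in for ips.info (hypothetical data): two complete records
# with 11 fields each and one incomplete record with a single field.
toy <- list(
  a = as.list(1:11),
  b = list(ip.address = "0.0.0.0"),
  c = as.list(1:11)
)

# Single brackets give a list of length 1; double brackets give the element.
length(toy[1])    # 1
length(toy[[1]])  # 11

# Compute the test for every element up front, then subset in one step,
# so no index shifts while the list is being modified.
keep <- vapply(toy, function(x) length(x) == 11, logical(1))
toy.complete <- toy[keep]
names(toy.complete)  # "a" "c"
```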


Answer 1:


Here's one way to accomplish this using the built-in Filter function:

#input data
library(RDSTK)
ip.addresses<-c("128.177.90.10","71.179.13.143","66.31.55.111","98.204.243.188",
    "67.231.207.8","67.61.248.15")
ips.info  <- sapply(ip.addresses, ip2coordinates)

#data.frame creation
lengthIs <- function(n) function(x) length(x)==n
do.call(rbind, Filter(lengthIs(11), ips.info))

Or, if you prefer not to use a helper function:

do.call(rbind, Filter(function(x) length(x)==11, ips.info))
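
Because ip2coordinates needs a live lookup service, the same Filter approach can be tried offline on made-up records. The sketch below assumes each complete record is a one-row data frame with 11 columns and each failed lookup is a one-column data frame; fake.lookup, toy.info and toy.df are hypothetical names introduced only for this illustration.

```r
# Hypothetical stand-in for ip2coordinates: a one-row, 11-column data frame
# when the address is "known", and a single-column data frame otherwise.
fake.lookup <- function(ip, known = TRUE) {
  if (known) {
    data.frame(ip.address = ip, matrix(0, nrow = 1, ncol = 10))
  } else {
    data.frame(ip.address = ip)
  }
}

toy.info <- list(
  "128.177.90.10" = fake.lookup("128.177.90.10"),
  "71.179.13.143" = fake.lookup("71.179.13.143", known = FALSE),
  "66.31.55.111"  = fake.lookup("66.31.55.111")
)

# Keep only the elements with 11 fields, then stack them row by row.
toy.df <- do.call(rbind, Filter(function(x) length(x) == 11, toy.info))
dim(toy.df)  # 2 11
```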



Answer 2:


An alternative solution that also uses only base R:

  # find non-complete elements
  ids.to.remove <- sapply(ips.info, function(i) length(i) < 11)
  # remove found elements
  ips.info <- ips.info[!ids.to.remove]
  # create data.frame
  df <- do.call(rbind, ips.info)
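
The question also asks for one row per IP address. Instead of dropping incomplete records, one possible variation is to pad them with NA. This is only a sketch under assumptions: the records are treated as named lists, three made-up field names stand in for the real 11, and fields, pad, toy.info and toy.df are hypothetical names. It also uses lengths(), available in base R since version 3.2.0, which returns the length of every element in one call.

```r
# Hypothetical field names; the real lookup returns 11 fields.
fields <- c("ip.address", "latitude", "longitude")

# Toy records (made-up data): the second lookup returned only one field.
toy.info <- list(
  "10.0.0.1" = list(ip.address = "10.0.0.1", latitude = 1.5, longitude = 2.5),
  "10.0.0.2" = list(ip.address = "10.0.0.2"),
  "10.0.0.3" = list(ip.address = "10.0.0.3", latitude = 3.5, longitude = 4.5)
)

# Length of every element at once, without an explicit sapply call.
lengths(toy.info)

# Start from an all-NA record and overwrite the fields that are present.
pad <- function(rec) {
  out <- setNames(as.list(rep(NA, length(fields))), fields)
  out[names(rec)] <- rec
  as.data.frame(out, stringsAsFactors = FALSE)
}

# One row per IP address; missing fields show up as NA.
toy.df <- do.call(rbind, lapply(toy.info, pad))
```

Whether padding or dropping is the better choice depends on whether the failed lookups still matter for the later analysis.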


Source: https://stackoverflow.com/questions/25022511/how-to-subset-a-list-based-on-the-length-of-its-elements-in-r
