Deleting specific rows from a data frame

北荒 2020-12-03 12:49

I am working with some US govt data which has a lengthy list of cities and zip codes. After some work, the data is in the following format.

dat1 = data.frame         


        
3 Answers
  •  没有蜡笔的小新
    2020-12-03 13:16

    It helps to store the data as characters, not factors:

    dat2 <- data.frame(keyword=c("Bremen", "Brent", "50143", "Chelsea, AL", 
                                 "Bailytown, Alabama", "52348", "54023", "54024"),   
                       tag=c(rep("AlabamCity",2), rep("AlabamaCityST",2), 
                             rep("AlabamaCityState",2), rep("AlabamaZipCode",2)),
                       stringsAsFactors = FALSE) ## note this bit
    

    Now we can convert keyword to numeric, and if it isn't a number in character format, we get an NA:

    want <- with(dat2, as.numeric(keyword))
    

    which gives us this:

    > (want <- with(dat2, as.numeric(keyword)))
    [1]    NA    NA 50143    NA    NA 52348 54023 54024
    Warning message:
    In eval(expr, envir, enclos) : NAs introduced by coercion
    

    You can ignore the warning or suppress it, but don't use this casually as it can mask problems:

    suppressWarnings(want <- with(dat2, as.numeric(keyword)))
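
    If the goal is only to tell numeric-looking keywords apart from the rest, one way to avoid the warning altogether is to test the strings against a pattern instead of coercing them. This is a minimal sketch, assuming a keyword counts as numeric only when it consists entirely of digits. The name is_num is introduced here for illustration, and the rule is stricter than as.numeric, which also accepts strings such as "1e5" or "3.14":

    ```r
    # Same sample data as dat2 above
    dat2 <- data.frame(keyword = c("Bremen", "Brent", "50143", "Chelsea, AL",
                                   "Bailytown, Alabama", "52348", "54023", "54024"),
                       tag = c(rep("AlabamCity", 2), rep("AlabamaCityST", 2),
                               rep("AlabamaCityState", 2), rep("AlabamaZipCode", 2)),
                       stringsAsFactors = FALSE)

    # TRUE where keyword is made up of digits only; nothing is coerced,
    # so no "NAs introduced by coercion" warning is raised
    is_num <- grepl("^[0-9]+$", dat2$keyword)
    is_num
    # [1] FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
    ```

    In the steps that follow, is_num could stand in for !is.na(want) if this stricter definition suits the data.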
    

    The final step is to identify the rows where want is not NA (the keyword is a number) and tag is not equal to "AlabamaZipCode", which we do using &:

    (!is.na(want) & (dat2$tag != "AlabamaZipCode"))
    

    That selects the rows we don't want, so we need to negate the above, turning TRUE to FALSE and vice versa:

    !(!is.na(want) & (dat2$tag != "AlabamaZipCode"))
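
    By De Morgan's law, the negated condition can also be written as a positive "keep" condition, which some readers may find easier to follow. The sketch below checks that the two forms agree on the sample data. The names drop and keep are illustrative and not part of the original answer:

    ```r
    dat2 <- data.frame(keyword = c("Bremen", "Brent", "50143", "Chelsea, AL",
                                   "Bailytown, Alabama", "52348", "54023", "54024"),
                       tag = c(rep("AlabamCity", 2), rep("AlabamaCityST", 2),
                               rep("AlabamaCityState", 2), rep("AlabamaZipCode", 2)),
                       stringsAsFactors = FALSE)
    want <- suppressWarnings(as.numeric(dat2$keyword))

    # Rows to drop: numeric keyword whose tag is not the zip-code tag
    drop <- !is.na(want) & dat2$tag != "AlabamaZipCode"

    # Equivalent keep condition: non-numeric keyword, or a zip-code tag
    keep <- is.na(want) | dat2$tag == "AlabamaZipCode"

    identical(!drop, keep)
    # [1] TRUE
    ```

    Indexing with dat2[keep, ] would then return the same rows as indexing with the negated form.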
    

    Putting this together we have:

    dat2[!(!is.na(want) & (dat2$tag != "AlabamaZipCode")), ]
    

    which gives:

    > dat2[!(!is.na(want) & (dat2$tag != "AlabamaZipCode")), ]
                 keyword              tag
    1             Bremen       AlabamCity
    2              Brent       AlabamCity
    4        Chelsea, AL    AlabamaCityST
    5 Bailytown, Alabama AlabamaCityState
    7              54023   AlabamaZipCode
    8              54024   AlabamaZipCode
    

    Full solution is:

    want <- with(dat2, as.numeric(keyword))
    dat2[!(!is.na(want) & (dat2$tag != "AlabamaZipCode")), ]
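
    Before deleting anything from the real data, it may be worth looking at the rows that would be removed. This is a small sketch on the same sample data, where dropped is an illustrative name:

    ```r
    dat2 <- data.frame(keyword = c("Bremen", "Brent", "50143", "Chelsea, AL",
                                   "Bailytown, Alabama", "52348", "54023", "54024"),
                       tag = c(rep("AlabamCity", 2), rep("AlabamaCityST", 2),
                               rep("AlabamaCityState", 2), rep("AlabamaZipCode", 2)),
                       stringsAsFactors = FALSE)
    want <- suppressWarnings(as.numeric(dat2$keyword))

    # The un-negated condition picks out exactly the rows slated for deletion
    dropped <- dat2[!is.na(want) & dat2$tag != "AlabamaZipCode", ]
    dropped
    ```

    On the sample data this lists rows 3 and 6, the numeric keywords 50143 and 52348, whose tags are not "AlabamaZipCode".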
    
