I am working with some US government data that contains a lengthy list of cities and zip codes. After some work, the data is in the following format.
dat1 = data.frame
It helps to store the data as characters, not factors:
dat2 <- data.frame(keyword=c("Bremen", "Brent", "50143", "Chelsea, AL",
"Bailytown, Alabama", "52348", "54023", "54024"),
tag=c(rep("AlabamCity",2), rep("AlabamaCityST",2),
rep("AlabamaCityState",2), rep("AlabamaZipCode",2)),
stringsAsFactors = FALSE) ## note this bit
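To see why characters are safer than factors here, the following is a minimal sketch using a made-up three-element vector (not the original data). Calling as.numeric() on a factor returns the internal level codes rather than the values shown on screen. Note that from R 4.0.0 onwards data.frame() defaults to stringsAsFactors = FALSE, so the explicit argument mainly matters on older versions.

```r
# Hypothetical mini-example: the same kind of values stored as a factor
f <- factor(c("50143", "Bremen", "54023"))

# as.numeric() on a factor gives the integer level codes, not the labels
codes <- as.numeric(f)

# Going through as.character() first recovers the real values;
# anything that is not a number becomes NA (warning suppressed here)
vals <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(vals)
```

The codes are small integers indexing the factor levels, which is almost never what you want when the labels themselves look like numbers.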
Now we can convert keyword to numeric; any element that is not a number stored as a character string becomes NA:
want <- with(dat2, as.numeric(keyword))
which gives us this:
> (want <- with(dat2, as.numeric(keyword)))
[1]    NA    NA 50143    NA    NA 52348 54023 54024
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
You can ignore the warning or suppress it with suppressWarnings(), but don't use suppressWarnings() casually, as it can mask real problems:
suppressWarnings(want <- with(dat2, as.numeric(keyword)))
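If you would rather not trigger the warning at all, one possible alternative (a sketch, not part of the approach above) is to test which strings consist only of digits before converting. The names is_num and want2 are made up for illustration, and the pattern assumes the numeric keywords are plain digit strings such as ZIP codes; unlike as.numeric(), it would not treat values like "12.5" or "1e5" as numbers.

```r
keyword <- c("Bremen", "Brent", "50143", "Chelsea, AL",
             "Bailytown, Alabama", "52348", "54023", "54024")

# TRUE where the whole string is made up of digits only
is_num <- grepl("^[0-9]+$", keyword)

# Convert only those elements; the rest stay NA and no warning is raised
want2 <- rep(NA_real_, length(keyword))
want2[is_num] <- as.numeric(keyword[is_num])

print(want2)
```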
The final step is to pick out the elements of want that are not NA but whose tag is not equal to "AlabamaZipCode", which we do using &:
(!is.na(want) & (dat2$tag != "AlabamaZipCode"))
That selects the rows we don't want, so we need to negate the above, turning TRUE to FALSE and vice versa:
!(!is.na(want) & (dat2$tag != "AlabamaZipCode"))
Putting this together we have:
dat2[!(!is.na(want) & (dat2$tag != "AlabamaZipCode")), ]
which gives:
> dat2[!(!is.na(want) & (dat2$tag != "AlabamaZipCode")), ]
             keyword              tag
1             Bremen       AlabamCity
2              Brent       AlabamCity
4        Chelsea, AL    AlabamaCityST
5 Bailytown, Alabama AlabamaCityState
7              54023   AlabamaZipCode
8              54024   AlabamaZipCode
The full solution is:
want <- with(dat2, as.numeric(keyword))
dat2[!(!is.na(want) & (dat2$tag != "AlabamaZipCode")), ]
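The doubly negated condition can be hard to read. By De Morgan's law, !(A & B) is the same as !A | !B, so an equivalent and arguably clearer way to write it is "keep a row if keyword is not a number, or if the tag is AlabamaZipCode". The sketch below rebuilds dat2 so it runs on its own; keep and res are names chosen for illustration.

```r
dat2 <- data.frame(keyword = c("Bremen", "Brent", "50143", "Chelsea, AL",
                               "Bailytown, Alabama", "52348", "54023", "54024"),
                   tag = c(rep("AlabamCity", 2), rep("AlabamaCityST", 2),
                           rep("AlabamaCityState", 2), rep("AlabamaZipCode", 2)),
                   stringsAsFactors = FALSE)

# Non-numeric keywords become NA; the coercion warning is expected here
want <- suppressWarnings(as.numeric(dat2$keyword))

# Keep rows whose keyword is not numeric OR whose tag marks a ZIP code
keep <- is.na(want) | dat2$tag == "AlabamaZipCode"
res <- dat2[keep, ]

print(res)
```

This keeps the same rows (1, 2, 4, 5, 7 and 8) as the negated version.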