Question
I used to fiddle with R and now it all seems to have escaped me . . .
I have a table with a few hundred columns and about 100k rows. One of those columns contains strings that sometimes have commas in them (e.g. chicken,goat,cow or just chicken). I need a script, with (I believe) a for loop, that creates a new column (I know the code that creates the new column should not be inside the loop), counts the number of commas in each entry and adds one, so I can find out how many items are in each row of that column. An example:
col
chicken
chicken,goat
cow,chicken,goat
cow
I want a script to create an additional column in the table that would look like this:
col2
1
2
3
1
Answer 1:
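A loop is not needed here. Using the stringr package (with the data frame named dat and the column named ani):
library(stringr)
# str_count() is vectorized over the column, so no sapply() is needed
dat$aninum <- str_count(dat$ani, ",") + 1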
which gives
ani aninum
1 chicken 1
2 chicken,goat 2
3 cow,chicken,goat 3
4 cow 1
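If stringr is not available, the same comma count can be sketched in base R. This is an alternative technique, not the one used in the answer above, and the example data frame below is made up to mirror the question. Note that an empty string contains no commas, so this approach would report 1 for it.

```r
# Hypothetical example data, mirroring the question's column
dat <- data.frame(ani = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                  stringsAsFactors = FALSE)

# gregexpr() finds every literal comma in each string,
# regmatches() extracts those matches as a list of character vectors,
# and lengths() counts the matches per string (0 when there is no comma).
comma_count <- lengths(regmatches(dat$ani, gregexpr(",", dat$ani, fixed = TRUE)))

# Number of entries = number of commas + 1
dat$aninum <- comma_count + 1
dat
```

lengths() requires R 3.2.0 or later; on older versions sapply(..., length) does the same job.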
Answer 2:
I would use count.fields (from base R):
mydf$col2 <- count.fields(file = textConnection(as.character(mydf$col)),
                          sep = ",")
mydf
# col col2
# 1 chicken 1
# 2 chicken,goat 2
# 3 cow,chicken,goat 3
# 4 cow 1
Update: Accounting for blank lines
count.fields has a logical argument blank.lines.skip, which defaults to TRUE. To get a count for empty lines as well, set it to FALSE.
Example:
mydf <- data.frame(col = c("chicken", "", "chicken,goat", "cow,chicken,goat", "cow"))
count.fields(file = textConnection(as.character(mydf$col)),
             sep = ",", blank.lines.skip = FALSE)
# [1] 1 0 2 3 1
Answer 3:
You could use ?strsplit:
df <- data.frame(col=c("chicken", "chicken,goat", "cow,chicken,goat", "cow"), stringsAsFactors=FALSE)
df$col2 <- sapply(strsplit(df$col, ","), length)
df
# col col2
# 1 chicken 1
# 2 chicken,goat 2
# 3 cow,chicken,goat 3
# 4 cow 1
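As a small variation on the strsplit approach, here is a sketch that uses lengths() instead of sapply(..., length) and includes an empty string to show how it is counted. The data frame is a made-up example; fixed = TRUE is an added choice so the comma is treated as a literal string rather than a regular expression.

```r
# Hypothetical example data, including one empty entry
df <- data.frame(col = c("chicken", "", "chicken,goat", "cow,chicken,goat", "cow"),
                 stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row;
# lengths() returns the length of each list element.
# An empty string splits to character(0), so it is counted as 0 entries.
df$col2 <- lengths(strsplit(df$col, ",", fixed = TRUE))
df
```

If the column is a factor rather than character, wrap it in as.character() first, since strsplit() only accepts character input.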
Source: https://stackoverflow.com/questions/18881136/r-for-loop-create-a-new-column-with-the-count-of-a-sub-str-from-a-different-col