R for loop: create a new column with the count of a sub str from a different column

假装没事ソ posted on 2020-01-05 10:25:11

Question


I used to fiddle with R and now it all seems to have escaped me . . .

I have a table with a few hundred columns and about 100k rows. One of those columns contains strings that sometimes have commas in them (e.g. chicken,goat,cow or just chicken). I need a script, I believe with a for loop, that creates a new column (I know the code that creates the column should sit outside the loop), counts the commas in each string, and adds one, so I can find out how many entries each cell of that column holds. An example:

col
chicken
chicken,goat
cow,chicken,goat
cow

I want a script to create an additional column in the table that would look like . . .

col2
1
2
3
1

Answer 1:


A loop is not needed here, I think. Using the stringr package...

library(stringr)
# str_count() is vectorized over its input, so sapply() is not needed
dat$aninum <- str_count(dat$ani, pattern = ",") + 1

which gives

               ani aninum
1          chicken      1
2     chicken,goat      2
3 cow,chicken,goat      3
4              cow      1
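
If installing stringr is not an option, the same "commas plus one" idea can be sketched in base R by comparing each string's length before and after the commas are removed. This is only an illustration under the assumption that `dat$ani` is a character column with no missing or empty values; the data frame below is rebuilt from the example in the question.

```r
# Example data taken from the question; stringsAsFactors = FALSE keeps 'ani' as character
dat <- data.frame(ani = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                  stringsAsFactors = FALSE)

# Number of commas = characters lost when every literal "," is removed
n_commas <- nchar(dat$ani) - nchar(gsub(",", "", dat$ani, fixed = TRUE))

# Entries per cell = commas + 1 (assumes no empty strings or NAs)
dat$aninum <- n_commas + 1L
dat
```

Both `nchar()` and `gsub()` work on the whole column at once, so no loop is involved even with 100k rows.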



Answer 2:


I would use count.fields (from the utils package, which ships with R):

mydf$col2 <- count.fields(file = textConnection(as.character(mydf$col)), 
                          sep = ",")
mydf
#                col col2
# 1          chicken    1
# 2     chicken,goat    2
# 3 cow,chicken,goat    3
# 4              cow    1

Update: Accounting for blank lines

count.fields has a logical argument blank.lines.skip. So, to capture information for empty lines, just set that to FALSE.

Example:

mydf <- data.frame(col = c("chicken", "", "chicken,goat", "cow,chicken,goat", "cow"))

count.fields(file = textConnection(as.character(mydf$col)), 
             sep = ",", blank.lines.skip=FALSE)
# [1] 1 0 2 3 1
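
One thing to watch with this approach: by default count.fields treats `'` and `"` as quote characters and `#` as a comment character, so real-world strings containing them may be miscounted. A cautious variant, sketched here with made-up data (the vector `x` is hypothetical, not from the question), switches both off and closes the text connection explicitly:

```r
# Hypothetical data: one entry contains an apostrophe, another a '#'
x <- c("chicken", "farmer's cow,goat", "cow,chicken #2,goat")

con <- textConnection(x)
n <- count.fields(con, sep = ",",
                  quote = "",          # do not treat ' or " as quote characters
                  comment.char = "",   # do not treat # as the start of a comment
                  blank.lines.skip = FALSE)
close(con)  # we opened the text connection ourselves, so we close it ourselves
n
```

Whether these settings are needed depends on what the column actually contains; for plain words like those in the question the defaults behave the same.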



Answer 3:


You could use ?strsplit:

df <- data.frame(col=c("chicken", "chicken,goat", "cow,chicken,goat", "cow"), stringsAsFactors=FALSE)
df$col2 <- sapply(strsplit(df$col, ","), length)
df
#                col col2
# 1          chicken    1
# 2     chicken,goat    2
# 3 cow,chicken,goat    3
# 4              cow    1
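
As a possible refinement, assuming R 3.2.0 or later, `lengths()` can stand in for `sapply(..., length)`, and `fixed = TRUE` tells strsplit to treat the comma as a literal string rather than a regular expression. A minimal sketch on the same data:

```r
df <- data.frame(col = c("chicken", "chicken,goat", "cow,chicken,goat", "cow"),
                 stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row;
# lengths() returns the length of each list element as an integer vector
df$col2 <- lengths(strsplit(df$col, ",", fixed = TRUE))
df
```

Note that strsplit drops an empty piece after a trailing separator, so a value such as "cow,goat," would be counted as two entries here, whereas the commas-plus-one rule would give three.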


Source: https://stackoverflow.com/questions/18881136/r-for-loop-create-a-new-column-with-the-count-of-a-sub-str-from-a-different-col
