Search for multiple values in a column in R

Submitted by 。_饼干妹妹 on 2019-12-02 19:11:49

Question


I have a data frame with two columns:

df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"), sentences = c("the cat is brown; the dog is barking; the bird is green and blue","the dog is black; the bird is yellow and blue", "the bird is blue"), stringsAsFactors = FALSE)

For each row, I need the total number of times the animals listed in that row occur across the entire "sentences" column.

For example, for the first row of "animals" ("cat; dog; bird"): cat occurs 1 time, dog 2 times, and bird 3 times in the "sentences" column, so the sum is 1 + 2 + 3 = 6.

The result should be a third column like this:

df <- cbind(sum_occurrences_sentences_column = c("6", "5", "3"), df)

I have tried the following code, but it does not work:

df[str_split(df$animals, ";") %in% df$sentences, ]

str_count(df$sentences, str_split(df$animals, ";"))

Any help would be appreciated :)


Answer 1:


Here's a base R solution:

First remove every ; with gsub, then split the sentences column on spaces and unlist the result into a single vector of words:

split_sentence_column = unlist(strsplit(gsub(';','',df$sentences),' '))

Then loop over the rows. For each row, get a vector of its animals, use %in% to check which words from the sentences column appear in that vector, and sum the TRUE cases. The result can be assigned directly to a new df column:

for (i in seq_len(nrow(df))) {
  # Animals listed in row i, split on the "; " separator
  animals = unlist(strsplit(df$animals[i], '; '))
  # Count the words in the sentences column that match any of these animals
  df$sum_occurrences_sentences_column[i] = sum(split_sentence_column %in% animals)
}

> df
         animals                                                        sentences sum_occurrences_sentences_column
1 cat; dog; bird the cat is brown; the dog is barking; the bird is green and blue                                6
2      dog; bird                    the dog is black; the bird is yellow and blue                                5
3           bird                                                 the bird is blue                                3
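
The same loop can also be written without explicit indexing. The following is a minimal sketch, not part of the original answer, that uses vapply. It rebuilds df from the question so it runs on its own; the names words and count_animals are hypothetical and chosen only for illustration.

```r
# Rebuild the example data from the question so the sketch is self-contained.
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c(
    "the cat is brown; the dog is barking; the bird is green and blue",
    "the dog is black; the bird is yellow and blue",
    "the bird is blue"
  ),
  stringsAsFactors = FALSE
)

# All words from the sentences column, with the ';' separators removed.
words <- unlist(strsplit(gsub(";", "", df$sentences, fixed = TRUE), " ", fixed = TRUE))

# Hypothetical helper: count how many words match any animal in one row.
count_animals <- function(animal_string) {
  animals <- unlist(strsplit(animal_string, "; ", fixed = TRUE))
  sum(words %in% animals)
}

# One integer per row; vapply enforces the return type.
df$sum_occurrences_sentences_column <- vapply(
  df$animals, count_animals, integer(1), USE.NAMES = FALSE
)
```

Using vapply rather than sapply makes the expected return type explicit, so an unexpected result raises an error instead of silently changing the column type.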




Answer 2:


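A map() approach that counts each animal from the first column separately and then sums the counts:

library(tidyverse)
# Split the sentences column into individual sentences
string <- unlist(str_split(df$sentences, ";"))

df %>% rowwise() %>%
  # For each row: split the animals, count each one in every sentence, and sum
  mutate(SUM = str_split(animals, "; ", simplify = TRUE) %>%
    map( ~ str_count(string, .)) %>%
    unlist() %>% sum())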

#   animals        sentences                                           SUM
#   <chr>          <chr>                                               <int>
# 1 cat; dog; bird the cat is brown; the dog is barking; the bird...   6
# 2 dog; bird      the dog is black; the bird is yellow and blue       5
# 3 bird           the bird is blue                                    3
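
One difference from the base R answer is worth keeping in mind: str_count() counts pattern matches anywhere in a string, while %in% compares whole words. The two agree on the example data, but they may differ if an animal name appears inside a longer word. The sketch below illustrates this with a made-up sentences vector (an assumption for illustration, not data from the question) and shows one possible way to restrict matches with the regex word boundary \b.

```r
library(stringr)

# Hypothetical illustration data, not taken from the question.
sentences <- c("the cat is brown", "the category is empty")

# Plain pattern: also matches "cat" inside "category".
plain <- sum(str_count(sentences, "cat"))

# Word-boundary pattern: matches only the standalone word "cat".
bounded <- sum(str_count(sentences, "\\bcat\\b"))

print(c(plain = plain, bounded = bounded))
```

If whole-word matching is what you need, the pattern passed to str_count() in the answer above could be wrapped the same way, for example with paste0("\\b", ., "\\b").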


Source: https://stackoverflow.com/questions/54277205/search-for-multiple-values-in-a-column-in-r
