Question
I have a data frame with two columns, ID and Product, as shown below:
ID Product
A Clothing, Clothing Food, Furniture, Furniture
B Food,Food,Food, Clothing
C Food, Clothing, Clothing
I need to keep only the unique products for each ID, for example:
ID Product
A Clothing, Food, Furniture
B Food, Clothing
C Food, Clothing
How do I do this in R?
Answer 1:
If the data contain more than one delimiter (here, commas with optional spaces and, in row A, a bare space), one approach is to split the 'Product' column on all of them, keep the unique values, and paste them back together with toString, grouped by 'ID'. This example uses data.table.
library(data.table)
setDT(df1)[, list(Product = toString(unique(
  strsplit(Product, ',\\s*|\\s+')[[1]]))), by = ID]
# ID Product
#1: A Clothing, Food, Furniture
#2: B Food, Clothing
#3: C Food, Clothing
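For anyone who wants to try this without data.table, the same split, unique, paste idea can be sketched in base R. The definition of df1 below is a reconstruction of the question's data (the original post does not show how df1 was built), and dedupe_products is a hypothetical helper name used only for illustration.

```r
# Assumed reconstruction of the question's data; not part of the original post.
df1 <- data.frame(
  ID = c("A", "B", "C"),
  Product = c("Clothing, Clothing Food, Furniture, Furniture",
              "Food,Food,Food, Clothing",
              "Food, Clothing, Clothing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each cell on a comma (with optional spaces)
# or on whitespace alone, drop empty pieces and duplicates while keeping
# first-seen order, then rejoin with ", ".
dedupe_products <- function(x) {
  parts <- strsplit(x, ",\\s*|\\s+")
  vapply(parts, function(p) toString(unique(p[nzchar(p)])), character(1))
}

df1$Product <- dedupe_products(df1$Product)
df1
```

Because the pattern also splits on bare whitespace, a product name containing a space (say, "Pet Food") would be broken into two items. If the real data only use commas, splitting on ',\\s*' alone is probably the safer choice.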
Source: https://stackoverflow.com/questions/35286596/how-to-remove-duplicate-comma-separated-character-values-from-each-cell-of-a-col