Using RecordLinkage to add a column with a number for each person

眼角桃花 2020-12-18 14:05

I'd like to do what I think is a very simple operation: adding a column with a number for each person to a dataset with a list of (potentially) duplicate names. I think
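If the duplicates are always exact repeats of the same string, record linkage isn't strictly needed for this. The following is a minimal base R sketch that numbers each distinct name by its first appearance using `match()`. It assumes exact matching only and will not join spelling variants such as "Peter" and "Petre"; handling those is what RecordLinkage is for.

```r
# Example data with exact repeats only (same names as in the answer below)
df_names <- data.frame(Name = c("Peter", "Peter", "Peter", "Connor", "Matt"))

# match() returns each name's position within unique(Name), so every
# distinct name gets its own consecutive number in order of first appearance
df_names$ID <- match(df_names$Name, unique(df_names$Name))
df_names
```

Here the three "Peter" rows get ID 1, "Connor" gets 2 and "Matt" gets 3.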

2 Answers
  •  盖世英雄少女心
    2020-12-18 14:17

    A small rewrite that avoids including the IDs in the data used to tune the weights and the classifier:

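    library(RecordLinkage)
    library(dplyr)

    df_names <- data.frame(Name = c("Peter", "Peter", "Peter", "Connor", "Matt"))

    # Link records on Name only; the IDs are added afterwards, so they
    # play no part in the weights or the classification
    matches <- df_names %>%
      compare.dedup() %>%
      epiWeights() %>%
      epiClassify(0.3) %>%
      getPairs(show = "links", single.rows = TRUE)

    # Give every record its row number, then replace it with the smallest
    # row number it was linked to (unlinked records keep their own)
    left_join(mutate(df_names, ID = seq_len(nrow(df_names))),
              select(matches, id1, id2) %>% arrange(id1) %>% filter(!duplicated(id2)),
              by = c("ID" = "id2")) %>%
      mutate(ID = ifelse(is.na(id1), ID, id1)) %>%
      select(-id1)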
    
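The ID-assignment step of the join can be checked on its own without running RecordLinkage. Below is a base R sketch that uses a hand-written table standing in for the `id1`/`id2` columns that `getPairs(..., single.rows = TRUE)` returns. The `pairs` table and `n` are hypothetical, chosen to match the five-name example with all three "Peter" rows linked.

```r
# Hypothetical stand-in for select(matches, id1, id2):
# records 1, 2 and 3 ("Peter") were linked pairwise
pairs <- data.frame(id1 = c(1L, 1L, 2L), id2 = c(2L, 3L, 3L))
n <- 5L  # number of records in df_names

# For each id2 keep only the link with the smallest id1
# (the same effect as arrange(id1) %>% filter(!duplicated(id2)))
pairs <- pairs[order(pairs$id1), ]
pairs <- pairs[!duplicated(pairs$id2), ]

# Start from each record's own row number, then overwrite linked records
ID <- seq_len(n)
ID[pairs$id2] <- pairs$id1
ID
```

This gives 1, 1, 1, 4, 5, so the IDs are shared within a group but not consecutive. Like the join above, it assumes the classifier links every pair within a group: if only 1-2 and 2-3 were linked, record 3 would get ID 2 while record 2 gets ID 1.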
