Using RecordLinkage to add a column with a number for each person

眼角桃花 2020-12-18 14:05

I'd like to do what I think is a very simple operation: adding a column with a number for each person to a dataset with a list of (potentially) duplicate names. I think

2 answers
  •  轻奢々 (original poster)
    2020-12-18 14:14

    I figured out the answer to my own question.

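    library(dplyr)
    library(RecordLinkage)

    # Give every row its own ID, then compare all record pairs within the data.
    # Note: ID is compared like any other field unless it is excluded.
    df_names <- df_names %>% mutate(ID = row_number())
    rpairs <- compare.dedup(df_names)
    # Weight each pair and classify pairs with weight >= 0.83 as links.
    p <- epiWeights(rpairs)
    classify <- epiClassify(p, 0.83)
    summary(classify)
    matches <- getPairs(classify, show = "links", single.rows = TRUE)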
    

    The code below rewrites the "ID" column so that similar names share the same value.

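    # Keep one link per record: for each ID.2, the match with the lowest ID.1.
    matches <- matches %>% arrange(ID.1) %>% filter(!duplicated(ID.2))
    # Keep a copy of the original row IDs.
    df_names$ID_prior <- df_names$ID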
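
To see what the `arrange()` / `filter(!duplicated())` step keeps, here is a small sketch on made-up pairs. The `matches` table is a hypothetical stand-in for the `getPairs()` output, reduced to the two ID columns; it is an illustration, not output from RecordLinkage.

```r
library(dplyr)

# Hypothetical linked pairs: record 4 is linked to both record 2 and record 1.
matches <- data.frame(ID.1 = c(2L, 1L, 1L), ID.2 = c(4L, 3L, 4L))

# Sort by ID.1, then keep only the first row for each ID.2,
# so every matched record points at its lowest-numbered partner.
kept <- matches %>% arrange(ID.1) %>% filter(!duplicated(ID.2))
print(kept)
```

In this toy case the pair (2, 4) is dropped and record 4 stays linked to record 1. Record 2 never appears in `ID.2`, so this step alone would not relabel it.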
    

    Merge the matching information with the original data:

    df_names <- left_join(df_names, matches %>% select(ID.1, ID.2), by = c("ID" = "ID.2"))
    

    Replace each matched ID with the ID.1 value of the record it matches:

    df_names$ID <- ifelse(is.na(df_names$ID.1), df_names$ID, df_names$ID.1)
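
The join-and-replace steps can be tried without running RecordLinkage at all. The sketch below uses invented names and a hand-written `matches` table (both are assumptions for illustration) in place of the real `getPairs()` result.

```r
library(dplyr)

# Toy data: rows 1 and 3 are meant to be one person, as are rows 2 and 4.
df_names <- data.frame(
  name = c("Ann Lee", "Bob Ray", "Anne Lee", "Bob Rae", "Cy Doe"),
  stringsAsFactors = FALSE
) %>% mutate(ID = row_number())

# Hypothetical links: ID.2 is the later record, ID.1 the record it matches.
matches <- data.frame(ID.1 = c(1L, 2L), ID.2 = c(3L, 4L))

df_names <- df_names %>%
  # Attach ID.1 to every row whose ID appears as ID.2 in a link.
  left_join(matches, by = c("ID" = "ID.2")) %>%
  # Matched rows take over ID.1; unmatched rows keep their own ID.
  mutate(ID = ifelse(is.na(ID.1), ID, ID.1)) %>%
  select(-ID.1)

print(df_names)
```

Rows 3 and 4 take over the IDs of rows 1 and 2, so `ID` becomes 1, 2, 1, 2, 5. `ID.1` is dropped at the end only to keep the toy output tidy; the answer's own code leaves that column in place.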
    
