How to detect duplicate data?

半阙折子戏 2021-02-01 08:24

I have a simple contacts database, but I'm having problems with users entering duplicate data. I have implemented a simple data comparison but unfortunately the duplicate

11 answers
  •  暗喜 (original poster)
    2021-02-01 08:46

    I imagine that this problem is well understood, but what occurs to me on first reading is:

    • compare fields individually
    • count those that match (for a possibly loose definition of match, and possibly weighting the fields differently)
    • present for human intervention any cases which pass some threshold
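
The three steps above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the field names (`name`, `email`, `phone`), the weights, and the 0.8 threshold are all assumptions made for the example, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever "loose" match suits your data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical field weights; adjust to your own schema and experience.
WEIGHTS = {"name": 0.4, "email": 0.4, "phone": 0.2}


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value or "").lower().split())


def field_similarity(a, b):
    """Loose match for one field: a ratio in [0, 1], 0.0 if either side is empty."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(rec_a.get(field), rec_b.get(field))
        for field, w in weights.items()
    )
    return score / total


def candidates_for_review(records, threshold=0.8):
    """Return (i, j, score) for every pair scoring at or above the threshold.

    The pairs are meant to be shown to a person, not merged automatically.
    """
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = match_score(a, b)
        if score >= threshold:
            flagged.append((i, j, score))
    return flagged
```

Comparing every pair is quadratic in the number of contacts, which is fine for a small database; a larger one would probably need to compare only records that share some cheap key first.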

    Use your existing database to get a good first guess for the threshold, and correct as you accumulate experience.

    You may prefer a fairly strong bias toward false positives, at least at first.
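
One possible way to get that first guess, assuming you have hand-checked a sample of record pairs from the existing database and recorded a similarity score for each: pick the highest threshold that still flags nearly all of the known duplicates. Favouring false positives here means accepting extra pairs in the review queue rather than missing real duplicates. The function names and the `min_recall` parameter are hypothetical, introduced only for this sketch.

```python
import math


def pick_threshold(labelled_scores, min_recall=0.95):
    """Highest threshold that still flags at least min_recall of known duplicates.

    labelled_scores: iterable of (score, is_duplicate) for hand-checked pairs.
    """
    dup_scores = sorted(
        (score for score, is_dup in labelled_scores if is_dup), reverse=True
    )
    if not dup_scores:
        raise ValueError("need at least one known duplicate pair")
    # How many known duplicates must score at or above the threshold.
    needed = max(1, math.ceil(min_recall * len(dup_scores)))
    return dup_scores[needed - 1]


def false_positive_count(labelled_scores, threshold):
    """Non-duplicate pairs that would still be sent for human review."""
    return sum(
        1 for score, is_dup in labelled_scores
        if not is_dup and score >= threshold
    )
```

Re-running this as reviewers confirm or reject more flagged pairs is one way to correct the threshold over time.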
