How to determine a string's "DNA" for likeness to another

Backend · Unresolved

爱一瞬间的悲伤 2020-12-28 20:44

I am hoping I am wording this correctly to get across what I am looking for.

I need to compare two pieces of text. If the two strings are alike I would like to get s

6 answers

不知归路 (OP)

2020-12-28 21:03

Many people have suggested distance- or metric-like approaches, and I think the wording of the question leads that way. (By the way, a hash like md5 is trying to do pretty much the opposite of what a metric does, so it's hardly surprising that it didn't work for you. There are similar ideas that don't change much under small deltas, but I suspect they don't encode enough information for what you want to do.)
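To make the hash-versus-metric point concrete, here is a small sketch. The helper names (`md5_hex`, `similarity`) and the two sample strings are made up for illustration, and `difflib.SequenceMatcher` is just one convenient stand-in for a similarity measure, not necessarily the one you would want in the end.

```python
import hashlib
from difflib import SequenceMatcher


def md5_hex(text: str) -> str:
    """Return the md5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, a, b).ratio()


# Two hypothetical inputs that differ in a single word.
a = "Your invoice for December is attached."
b = "Your invoice for November is attached."

if __name__ == "__main__":
    # The digests share no useful structure even though the inputs are close.
    print(md5_hex(a))
    print(md5_hex(b))
    # The similarity score, by contrast, stays high for a small edit.
    print(round(similarity(a, b), 2))
```

A cryptographic hash is designed so that a small change in the input scrambles the whole digest, which is why comparing digests tells you only "identical" or "not identical".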

Particularly given your update in the comments though, I think this type of approach is not very helpful.

What you are looking for is more of a clustering problem: you want to generate a signature (i.e. a feature vector) from each email and later compare it to new inputs. So essentially what you have is a machine learning problem. Deciding what "close" means may be a bit of a challenge. To get started, though, assuming it actually is emails you're looking at, you may do well to look at the sorts of feature generation done by many spam filters. This will give you a space (probably Euclidean, at least to start) in which to measure distances between signatures.
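As a rough sketch of the signature idea, assuming a tiny hand-picked vocabulary (the `VOCABULARY` list, the function names and the sample messages below are all hypothetical, and real spam filters use far richer features), each text is mapped to a vector of relative word frequencies and compared by Euclidean distance:

```python
import math
import re
from collections import Counter

# Hypothetical vocabulary; in practice this would be learned from your data.
VOCABULARY = ["free", "winner", "click", "invoice", "meeting", "attached"]


def signature(text, vocabulary=VOCABULARY):
    """Map text to a feature vector of relative word frequencies."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Made-up sample messages.
spam_a = "Click now, winner! Free prize, click here."
spam_b = "You are a winner: click for your free gift."
work = "Invoice attached ahead of our meeting."

if __name__ == "__main__":
    print(euclidean(signature(spam_a), signature(spam_b)))  # smaller distance
    print(euclidean(signature(spam_a), signature(work)))    # larger distance
```

Signatures computed this way can be stored and compared against new inputs later, or fed to a clustering algorithm once you have settled on what "close" should mean.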

Without knowing more about your problem it's hard to be more specific.
