How to measure complexity of a string?

Submitted by 两盒软妹~` on 2019-12-10 02:33:17

Question


I have a few long strings (~1,000,000 characters each). Each string contains only symbols from a defined alphabet, for example

A = {1,2,3}

Sample strings

string S1 = "1111111111 ..."; //[meta complexity] = 0
string S2 = "1111222333 ..."; //[meta complexity] = 10
string S3 = "1213323133 ..."; //[meta complexity] = 100

Q: What kind of measures can I use to quantify the complexity of these strings? I can see that S1 is less complex than S3, but how can I measure that programmatically in .NET? Any algorithm, or pointers to tools or literature, would be greatly appreciated.

Edit

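I tried Shannon entropy, but it turned out not to be useful for my purpose: the sequences AAABBBCCC, ABCABCABC, ACCCBABAB and BBACCABAC all get the same H value. Each contains exactly three A's, three B's and three C's, and Shannon entropy depends only on symbol frequencies, not on the order of the symbols.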
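A short sketch that reproduces this observation, written in Python for brevity (the `shannon_entropy` helper is hypothetical, not from the original thread). It computes first-order Shannon entropy, H = -Σ p_i log2 p_i over symbol frequencies p_i. Because only the counts enter the formula, all four sequences come out at log2(3) bits per symbol.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """First-order Shannon entropy H in bits per symbol.

    Only symbol frequencies are used; the order of symbols is ignored,
    which is why it cannot tell AAABBBCCC apart from ABCABCABC.
    """
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


samples = ["AAABBBCCC", "ABCABCABC", "ACCCBABAB", "BBACCABAC"]
for s in samples:
    # Each sample has three A's, three B's and three C's,
    # so every line prints 1.585 (= log2(3) rounded).
    print(s, round(shannon_entropy(s), 3))
```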


This is what I ended up doing

Answer 1:


Compressing the strings with a standard algorithm such as zip gives a good indication of their complexity.

Good compression ratio (output much smaller than the input) ≈ lower complexity
Bad compression ratio (output close to the input size) ≈ higher complexity
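A minimal Python sketch of the compression approach. It assumes the standard-library `zlib` module (DEFLATE, the method most zip archives use) as the compressor. The `compression_ratio` helper and the strings `constant`, `periodic` and `shuffled` are hypothetical stand-ins for S1, S2 and S3, because the original strings are elided. In .NET the same idea could be built on `System.IO.Compression.DeflateStream` or `GZipStream`.

```python
import random
import zlib


def compression_ratio(s: str, level: int = 9) -> float:
    """Compressed size divided by original size, using zlib (DEFLATE).

    Lower ratio = more redundancy = lower complexity.
    Assumes the string is ASCII, e.g. drawn from the alphabet {1,2,3}.
    """
    data = s.encode("ascii")
    if not data:
        return 0.0
    return len(zlib.compress(data, level)) / len(data)


n = 100_000
rng = random.Random(42)  # fixed seed so the example is reproducible
constant = "1" * n                                        # like S1
periodic = ("1111222333" * (n // 10 + 1))[:n]             # like S2
shuffled = "".join(rng.choice("123") for _ in range(n))   # like S3

for name, s in [("constant", constant), ("periodic", periodic), ("random", shuffled)]:
    print(f"{name:>8}: {compression_ratio(s):.4f}")
```

Compare ratios only between strings of similar length compressed with the same algorithm and level. The fixed zlib header and checksum dominate for short inputs, so the ratio is meaningful mainly for long strings like the ~1,000,000-character ones in the question.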



Source: https://stackoverflow.com/questions/6084402/how-to-measure-complexity-of-a-string
