Lossless hierarchical run length encoding

花落未央 2020-12-30 08:39

I want to summarize (rather than compress) a string in a manner similar to run-length encoding, but with nesting.

For instance, I want ABCBCABCBCDEEF to become: (2A(2BC

2 answers
  •  谎友^ (OP)
    2020-12-30 09:08

    Looking at the problem theoretically, it seems similar to the problem of finding the smallest context-free grammar that generates (only) the given string, except that in this case the non-terminals can only be used in direct sequence after each other. For example:

    
    ABCBCABCBCDEEF
    s->ttDuuF
    t->Avv
    v->BC
    u->E
    
    ABABCDABCD
    s->ABtt
    t->ABCD
    
    

    Of course, this depends on how you define "smallest", but if you count terminals on the right side of rules, it should be the same as the "length in original symbols" after doing the nested run-length encoding.
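To make the counting concrete, here is a small Python sketch that expands the first grammar above and counts the terminals on its right-hand sides. The dictionary layout and the helper names `expand` and `terminal_count` are illustrative assumptions, not part of the answer; any character that is a key of the dictionary is treated as a non-terminal.

```python
from typing import Dict


def expand(grammar: Dict[str, str], symbol: str = "s") -> str:
    """Expand a non-terminal into the terminal string it generates."""
    out = []
    for ch in grammar[symbol]:
        # Keys of the grammar are non-terminals; everything else is a terminal.
        out.append(expand(grammar, ch) if ch in grammar else ch)
    return "".join(out)


def terminal_count(grammar: Dict[str, str]) -> int:
    """Count terminal symbols across all right-hand sides."""
    return sum(1 for rhs in grammar.values() for ch in rhs if ch not in grammar)


# First example grammar from the answer (assumed start symbol: "s").
grammar = {"s": "ttDuuF", "t": "Avv", "v": "BC", "u": "E"}
print(expand(grammar))          # ABCBCABCBCDEEF
print(terminal_count(grammar))  # 6
```

The count of 6 corresponds to the six original symbols (A, B, C, D, E, F) that a nested run-length encoding of the same string would keep.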

    The smallest grammar problem is known to be NP-hard and is well studied. I don't know how much the "direct sequence" restriction adds to or subtracts from the complexity.
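If "smallest" is taken to mean the fewest original symbols, with repeat counts costing nothing, the nested run-length variant can at least be explored with an interval dynamic program over substrings. The sketch below is one possible approach under those assumptions, not a result about the general grammar problem. The function name `nested_rle` and the `(count body)` output notation are hypothetical, modelled loosely on the fragment in the question, and symbols are assumed not to be digits or parentheses.

```python
from functools import lru_cache
from typing import Tuple


def nested_rle(s: str) -> Tuple[int, str]:
    """Return (cost, encoding), where cost is the number of original symbols kept."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[int, str]:
        n = j - i
        if n == 1:
            return 1, s[i]

        # Option 1: split the span in two and encode each part independently.
        result = None
        for k in range(i + 1, j):
            left_cost, left_enc = best(i, k)
            right_cost, right_enc = best(k, j)
            if result is None or left_cost + right_cost < result[0]:
                result = (left_cost + right_cost, left_enc + right_enc)

        # Option 2: the whole span is a unit of length p repeated n // p times.
        for p in range(1, n // 2 + 1):
            if n % p == 0 and s[i:i + p] * (n // p) == s[i:j]:
                unit_cost, unit_enc = best(i, i + p)
                if unit_cost < result[0]:
                    result = (unit_cost, f"({n // p}{unit_enc})")
        return result

    return best(0, len(s)) if s else (0, "")


cost, encoding = nested_rle("ABCBCABCBCDEEF")
print(cost, encoding)
```

Ties between equally short encodings are broken arbitrarily here (the first one found wins), and the recursion is only meant for short inputs.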
