optimizing byte-pair encoding

后端 未结 9 1141
广开言路
广开言路 2020-12-30 10:45

Noticing that byte-pair encoding (BPE) is sorely lacking from the large text compression benchmark, I very quickly made a trivial literal implementation of

9条回答
  •  抹茶落季
    2020-12-30 11:26

    the easiest efficient structure is a 2 dimensional array like byte_pair(255,255). Drop the counts in there and modify as the file compresses.

提交回复
热议问题