Optimizing byte-pair encoding
Question:

Noticing that byte-pair encoding (BPE) is sorely lacking from the Large Text Compression Benchmark, I quickly wrote a trivial, literal implementation of it.

The compression ratio is surprisingly good, considering that there is no further processing (e.g. no Huffman or arithmetic coding). The runtime of my trivial implementation, however, was less than stellar.

How can this be optimized? Is it possible to do it in a single pass?

Answer 1:

This is a summary of my progress so far: Googling