optimizing byte-pair encoding
Noticing that byte-pair encoding (BPE) is missing from the Large Text Compression Benchmark, I quickly wrote a trivial, literal implementation of it.

The compression ratio is surprisingly good, considering that there is no further processing (e.g. no Huffman or arithmetic coding). The runtime of my trivial implementation was less than stellar, however.

How can this be optimized? Is it possible to do it in a single pass?

This is a summary of my progress so far:

Googling found this little report that links to the original code and cites the source: Philip Gage, titled 'A New