Decompress zlib stream in Clojure

Submitted by 孤街醉人 on 2021-01-27 05:33:11

Question


I have a binary file whose contents were created by zlib.compress in Python. Is there an easy way to open and decompress it in Clojure?

import zlib
import json

with open('data.json.zlib', 'wb') as f:
    f.write(zlib.compress(json.dumps(data).encode('utf-8')))

Basically, it isn't a gzip file; it is just bytes representing deflated data.

I could only find these references, none of which is quite what I'm looking for (I think the first two are the most relevant):

  • deflateclj_hatemogi_clojure/deflate.clj
  • funcool/buddy-core/deflate.clj
  • Compressing / Decompressing strings in clojure
  • Reading and Writing Compressed Files
  • clj-http

Must I really implement this multi-line wrapper around java.util.zip, or is there a nice library out there? Actually, I'm not even sure whether these byte streams are compatible across libraries, or whether I'm just trying to mix and match the wrong libs.

Steps in Python:

>>> '{"hello": "world"}'.encode('utf-8')
b'{"hello": "world"}'
>>> zlib.compress(b'{"hello": "world"}')
b'x\x9c\xabV\xcaH\xcd\xc9\xc9W\xb2RP*\xcf/\xcaIQ\xaa\x05\x009\x99\x06\x17'
>>> [int(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23]
>>> import numpy
>>> [numpy.int8(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]
>>> zlib.decompress(bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])).decode('utf-8')
'{"hello": "world"}'

Decode attempt in Clojure:

; https://github.com/funcool/buddy-core/blob/master/src/buddy/util/deflate.clj#L40 without try-catch
(ns so.core
  (:import java.io.ByteArrayInputStream
           java.io.ByteArrayOutputStream
           java.util.zip.Deflater
           java.util.zip.DeflaterOutputStream
           java.util.zip.InflaterInputStream
           java.util.zip.Inflater
           java.util.zip.ZipException)
  (:gen-class))

(defn uncompress
  "Given a compressed data as byte-array, uncompress it and return as an other byte array."
  ([^bytes input] (uncompress input nil))
  ([^bytes input {:keys [nowrap buffer-size]
                  :or {nowrap true buffer-size 2048}
                  :as opts}]
   (let [buf  (byte-array (int buffer-size))
         os   (ByteArrayOutputStream.)
         inf  (Inflater. ^Boolean nowrap)]
     (with-open [is  (ByteArrayInputStream. input)
                 iis (InflaterInputStream. is inf)]
       (loop []
         (let [readed (.read iis buf)]
           (when (pos? readed)
             (.write os buf 0 readed)
             (recur)))))
     (.toByteArray os))))

(uncompress (byte-array [120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]))
ZipException invalid stored block lengths  java.util.zip.InflaterInputStream.read (InflaterInputStream.java:164)

Any help would be appreciated. I'd rather not use zip or gzip files, as I only care about the raw content, not file names or modification dates in this context. But it is possible to use another compression algorithm on the Python side if that is the only option.


Answer 1:


Here is an easy way to do it with gzip:

Python code:

import gzip
content = "the quick brown fox"
with gzip.open('fox.txt.gz', 'wb') as f:
    # In binary mode gzip.open expects bytes, so encode the string first
    f.write(content.encode('utf-8'))

Clojure code:

(with-open [in (java.util.zip.GZIPInputStream.
                (clojure.java.io/input-stream
                 "fox.txt.gz"))]
  (println "result:" (slurp in)))

;=>  result: the quick brown fox

Keep in mind that "gzip" names a data format (built on the DEFLATE algorithm) as well as a program, so using it does not mean you need the "gzip" command-line tool.

Please note that the input to Clojure doesn't have to be a file. You could send the gzip compressed data as raw bytes over a socket and still decompress it on the Clojure side. Full details at: https://clojuredocs.org/clojure.java.io/input-stream
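As a rough sketch of the point that no file is required, the Python side can build the gzip bytes entirely in memory with gzip.compress. The payload and variable names here are illustrative assumptions, not part of the original answer.

```python
import gzip

# Illustrative payload; any bytes would do.
payload = "the quick brown fox".encode("utf-8")

# Compress in memory: the result is a bytes object, no file involved.
compressed = gzip.compress(payload)

# A gzip stream starts with the magic bytes 0x1f 0x8b.
has_gzip_magic = compressed[:2] == b"\x1f\x8b"

# Round trip to confirm the bytes decompress back to the original text.
restored = gzip.decompress(compressed).decode("utf-8")

print(has_gzip_magic, restored)
```

The compressed bytes could then be sent over a socket and handed to clojure.java.io/input-stream (which also accepts a byte array) wrapped in a GZIPInputStream, as in the Clojure snippet above.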

Update

If you need to use the pure zlib format instead of gzip, the result is very similar:

Python code:

import zlib
# zlib.compress expects bytes in Python 3, hence the b'' literal
with open('balloon.txt.z', 'wb') as fp:
    fp.write(zlib.compress(b'the big red baloon'))

Clojure code:

(with-open [in (java.util.zip.InflaterInputStream.
                (clojure.java.io/input-stream
                 "balloon.txt.z"))]
  (println "result:" (slurp in)))

;=> result: the big red baloon
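
On the error in the question: a java.util.zip.Inflater constructed with nowrap set to true expects raw DEFLATE data, while Python's zlib.compress wraps the data in a zlib header and checksum (the leading bytes 120, 156). The sketch below reproduces that mismatch on the Python side, on the assumption that a negative wbits value is a fair stand-in for nowrap. The variable names are illustrative.

```python
import zlib

# Bytes from the question: zlib.compress(b'{"hello": "world"}')
data = bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80,
              42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])

# The default wbits expects the zlib header and Adler-32 trailer, so this works.
wrapped = zlib.decompress(data)

# Negative wbits means "raw deflate, no header"; on zlib-wrapped data it fails
# because the header bytes get parsed as if they were deflate block data.
try:
    zlib.decompress(data, wbits=-15)
    raw_error = None
except zlib.error as exc:
    raw_error = str(exc)

# Dropping the 2-byte header and 4-byte checksum leaves a raw deflate stream.
stripped = zlib.decompress(data[2:-4], wbits=-15)

print(wrapped, raw_error, stripped)
```

This suggests that the attempt in the question would likely work with nowrap set to false, which is what the one-argument InflaterInputStream constructor used above does by default.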


Source: https://stackoverflow.com/questions/41959409/decompress-zlib-stream-in-clojure
