compression | 易学教程

How gzip file gets stored in HDFS

Question: HDFS supports storing files in compressed formats. I know that gzip compression doesn't support splitting. Suppose the file is a gzip-compressed file whose compressed size is 1 GB. My question is: how will this file be stored in HDFS (block size is 64 MB)? From this link I learned that the gzip format uses DEFLATE to store the compressed data, and DEFLATE stores data as a series of compressed blocks. But I couldn't understand it completely and looking for broad
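As a rough illustration of the arithmetic in the question, the sketch below counts how many HDFS blocks a 1 GB file occupies at a 64 MB block size. HDFS divides a file by byte offset regardless of its content, so the gzip file is stored as ordinary blocks; the practical consequence of gzip not being splittable is that a reader cannot start decompressing at an arbitrary block boundary. The variable names are illustrative only.

```python
import math

file_size_mb = 1024   # 1 GB compressed gzip file, from the question
block_size_mb = 64    # HDFS block size, from the question

# Number of HDFS blocks needed to hold the file
num_blocks = math.ceil(file_size_mb / block_size_mb)
print(num_blocks)
```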

How do I refresh the video codecs shown in the list presented by AviSaveOptions() to show a newly installed codec?

Question: I have a Delphi 6 application with a Wizard that helps people select an appropriate video codec. It uses AviSaveOptions() to show the user a list of video codecs so they can select one. The choice is saved to disk for later re-use. At one point in the Wizard, the user is directed to download and install one of several popular video codecs using their browser in the event they don't have a suitable one. However, after they install the new video codec and return to my app, when I call

How can one copy the internal state of zlib compressor object in Python

Question: I have to compress a long list of strings, each one individually. Each string is less than 1000 chars long. However, many of these strings have a common prefix. Therefore I was wondering if I could amortize the compression cost by compressing the common prefix first, storing the state of the compressor, and then feeding it the suffix of each string. If you have any suggestions about how to accomplish this in Python, that would be great. Although I mention zlib in the title any
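One way to approach the idea described in the question is the `copy()` method of Python's zlib compression objects, which duplicates the compressor's internal state. The sketch below is a minimal illustration under assumptions, not an answer from the original thread: the prefix and suffix values are hypothetical, and the prefix output (`head`) has to be kept once and prepended to each tail before decompression.

```python
import zlib

prefix = b"http://example.com/some/long/common/prefix/"  # hypothetical shared prefix
suffixes = [b"alpha", b"beta", b"gamma"]                  # hypothetical per-string tails

base = zlib.compressobj()
# Compress the prefix once; Z_SYNC_FLUSH emits everything buffered so far
head = base.compress(prefix) + base.flush(zlib.Z_SYNC_FLUSH)

tails = []
for suffix in suffixes:
    c = base.copy()                      # clone the state reached after the prefix
    tails.append(c.compress(suffix) + c.flush())

# Each string is recovered from the shared head plus its own tail
restored = [zlib.decompress(head + tail) for tail in tails]
```

Only the tails need to be stored per string; the head is stored a single time.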

The mimetype file has an extra field of length n. The use of the extra field feature of the ZIP format is not permitted for the mimetype file

Question: I am using the C# library DotNetZip (Ionic.Zip and Ionic.Zlib) to generate an ebook from a directory. The directory looks like this:

    BookName
    |
    |___content/
    |     images/
    |     css/
    |     (html pages, .ops, .ncx)
    |
    |___META-INF/
    |     container.xml
    |
    |___mimetype

The code to generate the archive looks like this:

    using (ZipFile zip = new ZipFile(pathTemp + ".epub"))
    {
        zip.RemoveSelectedEntries("*.*");
        zip.AddFile(mimetype, "").CompressionLevel = CompressionLevel.None;
        zip.AddDirectory(pathTemp + "\\content",
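The error in the title says the mimetype entry must not carry a ZIP extra field. As a hedged illustration of that constraint (in Python's standard `zipfile` module rather than DotNetZip, and not taken from the original thread), the sketch below writes `mimetype` as the first entry, stored uncompressed, and then reads the archive back to inspect the entry. The `container.xml` content is a placeholder.

```python
import io
import zipfile

buf = io.BytesIO()  # in-memory archive for the sketch; a real EPUB would be a file
with zipfile.ZipFile(buf, "w") as zf:
    # mimetype goes first, stored (uncompressed), with an empty extra field
    zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                compress_type=zipfile.ZIP_STORED)
    # placeholder for the remaining content, compressed normally
    zf.writestr("META-INF/container.xml", "<container/>",
                compress_type=zipfile.ZIP_DEFLATED)

# Read the archive back and inspect the first entry
with zipfile.ZipFile(buf) as zf:
    first = zf.infolist()[0]
    print(first.filename, first.compress_type, first.extra)
```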

Is there a “Fast Infoset” XML compression library for Delphi?

Question: I would like to support Fast Infoset in some enterprise applications to reduce network traffic for XML and SOAP exchanges. As documented on Wikipedia, there are Fast Infoset implementations for C# and Java. According to OSS Fast Infoset Tools, implementations are already available on several platforms including Microsoft .NET and .NET CF, Sun GlassFish, BEA WebLogic.

Answer 1: It doesn't look like it. So you have two choices if you want to use this in a Delphi program. You could use the .NET

Lempel-Ziv compression algorithm implementation

Question: I wrote the following code to implement the Lempel-Ziv compression algorithm for the following sample string: AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE

Code:

    import time

    keys = []
    text = open('test').read()  # contents of the file: AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE
    index = 0
    t = time.time()

    def sub_strin_check(text, index_num):
        n = 1
        while True:
            substring = text[index_num:index_num+n]
            if substring not in keys:
                print(substring)
                keys.append(substring)
                # print(keys[-1])
                return (index_num+n)
            else:
                n = n+1
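For comparison, here is a minimal sketch of the same phrase-splitting step (the dictionary-building part of an LZ78-style parse), assuming the goal is to cut the text into the shortest substrings not seen before. It is not the asker's full program: the function name `lz78_phrases` is hypothetical, a set replaces the list for faster membership tests, and the loop stops at the end of the text so the final phrase may repeat an earlier one.

```python
def lz78_phrases(text):
    """Split text into phrases, each the shortest prefix not seen before."""
    seen = set()
    phrases = []
    start = 0
    while start < len(text):
        end = start + 1
        # Grow the candidate until it is new or the text runs out
        while end < len(text) and text[start:end] in seen:
            end += 1
        phrase = text[start:end]
        seen.add(phrase)
        phrases.append(phrase)
        start = end
    return phrases

sample = "AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE"
phrases = lz78_phrases(sample)
print(phrases)
```

A full encoder would additionally emit (dictionary index, next character) pairs for each phrase; that part is left out here.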

String, byte[] and compression

Question: We can easily convert a String to and from byte[]:

    String s = "my string";
    byte[] b = s.getBytes();
    System.out.println(new String(b)); // my string

When compression is involved, however, there seem to be some issues. Suppose you have 2 methods, compress and uncompress (the code below works fine):

    public static byte[] compress(String data) throws UnsupportedEncodingException, IOException {
        byte[] input = data.getBytes("UTF-8");
        Deflater df = new Deflater();
        df.setLevel(Deflater.BEST_COMPRESSION);
        df

Google Closure compiler not compressing string values?

Question: Having something like this:

    (function ($, window, document, undefined) {
        'use strict';
        $.fn.demo = function (options) {
            var active = "active";
            var section = ".bb-demo";
            $(section).addClass(active);
            $(section).addClass(active);
            $(section).addClass(active);
            $(section).addClass(active);
        };
    })(jQuery, window, document);

Closure Simple mode results in 200 bytes:

    (function(a,b,c,d){a.fn.demo=function(b){a(".bb-demo").addClass("active");a(".bb-demo").addClass("active");a(".bb-demo").addClass(

Unzip file while reading it

Question: I have hundreds of zipped CSV files. This is great because they take very little space, but when it is time to use them, I have to make some space on my HD and unzip them before I can process them. I was wondering if it is possible with .NET to unzip a file while reading it. In other words, I would like to open a zip file, start decompressing it and process the contents as we go, so there would be no need for extra space on my drive. Any ideas or suggestions?

Answer 1: Yes. Zip is a streamed format
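The question is about .NET, but the streaming idea can be illustrated in Python's standard library as a rough sketch: the archive member is opened as a stream and CSV rows are parsed as bytes are decompressed, without extracting the file to disk. The archive here is built in memory and the file name `data.csv` is a placeholder so that the example is self-contained.

```python
import csv
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.csv", "id,name\n1,alpha\n2,beta\n")  # placeholder CSV

rows = []
with zipfile.ZipFile(archive) as zf:
    # open() returns a file-like object that decompresses on demand
    with zf.open("data.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for row in csv.reader(text):
            rows.append(row)  # process each row as it is read
```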

Algorithm for simple string compression

Question: I would like to find the shortest possible encoding for a string in the following form: abbcccc = a2b4c

Answer 1: [NOTE: this greedy algorithm does not guarantee the shortest solution] By remembering all previous occurrences of a character, it is straightforward to find the first occurrence of a repeating string (minimal end index including all repetitions = maximal remaining string after all repetitions) and replace it with an RLE (Python 3 code): def singleRLE_v1(s): occ = dict() # for each character
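As a baseline for the notation in the question, the sketch below encodes only runs of a single repeated character, with the count written before the character and omitted when it is 1. It is not the greedy algorithm from the answer and does not handle repeated substrings; the function name `run_length_encode` is hypothetical, and the input is assumed to contain no digits, since digits would make the output ambiguous.

```python
from itertools import groupby

def run_length_encode(s):
    """Encode runs of identical characters as <count><char>, omitting a count of 1."""
    parts = []
    for char, run in groupby(s):
        count = len(list(run))
        parts.append(char if count == 1 else f"{count}{char}")
    return "".join(parts)

encoded = run_length_encode("abbcccc")
print(encoded)  # a2b4c
```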