huffman-code

Unique Identifiers for Nodes in a Huffman Tree

删除回忆录丶 · submitted on 2019-12-11 05:57:02
Question: I'm building a Python program to compress/decompress a text file using a Huffman tree. Previously, I would store the frequency table in a .json file alongside the compressed file. When I read in the compressed data and the .json, I would rebuild the decompression tree from the frequency table. I thought this was a pretty elegant solution. However, I was running into an odd issue with files of medium length where they would decompress into strings of seemingly random characters. I found that the
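(For reference, the general technique described here, shipping the symbol statistics alongside the compressed data so the decoder can rebuild the same tree, can be sketched as a fixed-size count header. The question uses Python and a separate .json file; the C layout below, a 256-entry uint32 table written in front of the payload, is only an illustrative assumption made here.)

    /* Sketch: a 256-entry frequency table written as a raw header before the
     * compressed payload; the decoder reads it back and rebuilds the tree.
     * Names and layout are assumptions, not from the original post. */
    #include <stdint.h>
    #include <stdio.h>

    int write_freq_header(FILE *out, const uint32_t freq[256])
    {
        return fwrite(freq, sizeof(uint32_t), 256, out) == 256 ? 0 : -1;
    }

    int read_freq_header(FILE *in, uint32_t freq[256])
    {
        return fread(freq, sizeof(uint32_t), 256, in) == 256 ? 0 : -1;
    }

Whatever the container format, the encoder and decoder must also break frequency ties identically, or the rebuilt tree (and hence the decoded text) will differ.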

Steps to compress a file using Huffman Code

给你一囗甜甜゛ · submitted on 2019-12-11 04:08:23
Question: I know there are many questions involving Huffman code, including another one from myself, but I am wondering what would be the best way to actually encode a text file. Decompression seems trivial: traverse the tree, going left on 0 and right on 1, and print the character at each leaf. But how does one go about compression? Somehow store the bit representation of the character in its node in the tree? Search the tree for the character each time it is encountered and trace the steps? Does it matter
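(The usual answer to the compression side is to walk the tree once and record each leaf's path in a lookup table, so encoding becomes a table lookup per character rather than a tree search. A minimal sketch in C; the node layout and the 64-character codeword limit are assumptions made here, not taken from the thread.)

    /* Build a per-symbol codeword table with one recursive traversal:
     * going left appends '0', going right appends '1', and a leaf stores
     * the accumulated path as that symbol's codeword. */
    #include <string.h>

    struct node {
        int symbol;                     /* meaningful only for leaves */
        struct node *left, *right;
    };

    static char table[256][64];         /* codeword, as a '0'/'1' string, per byte */

    static void build_table(const struct node *n, char *path, int depth)
    {
        if (!n->left && !n->right) {    /* leaf: record the path so far */
            path[depth] = '\0';
            strcpy(table[n->symbol], path);
            return;
        }
        path[depth] = '0';
        build_table(n->left, path, depth + 1);
        path[depth] = '1';
        build_table(n->right, path, depth + 1);
    }

Called as char path[64]; build_table(root, path, 0); after that, encoding a character is just emitting table[(unsigned char)c].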

CPU Huffman compression faster after first execution?

牧云@^-^@ · submitted on 2019-12-11 03:33:26
Question: I've recently constructed a CPU implementation of Huffman encoding in C++. I've also constructed a GPU version in CUDA in order to compare times, but I've come across a problem when testing the CPU's times: when stress testing by compressing large files, for instance a 97 MB text file with almost every letter in the alphabet and various other ASCII characters, my CPU implementation will take approximately 8.3 seconds the first time it executes. After that, the time drops significantly to 1.7

Lossless data compression in C without dynamic memory allocation

浪子不回头ぞ · submitted on 2019-12-11 02:27:36
Question: I'm currently trying to implement a lossless data compression algorithm for a project I'm working on. The goal is to compress a fixed-size list of floating point values. The code has to be written in C and can NOT use dynamic memory allocation. This hurts me greatly, since most, if not all, lossless algorithms require some dynamic allocation. Two of the main algorithms I've been looking into are Huffman coding and arithmetic coding. Would this task be possible without dynamic memory allocation? Are there
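(One common workaround, sketched here rather than taken from the thread: a Huffman tree over n symbols never has more than 2n - 1 nodes, so the whole tree can live in a fixed array and "allocation" is just bumping an index. The 256-symbol bound below is an assumption.)

    #define MAX_SYMBOLS 256
    #define MAX_NODES   (2 * MAX_SYMBOLS - 1)

    struct hnode {
        unsigned long weight;
        int symbol;                 /* -1 for internal nodes */
        int left, right;            /* indices into the pool, -1 if none */
    };

    static struct hnode pool[MAX_NODES];
    static int pool_used = 0;

    /* "Allocate" a node from the static pool; the caller guarantees that no
     * more than MAX_NODES nodes are ever requested. */
    static int new_node(unsigned long weight, int symbol, int left, int right)
    {
        int i = pool_used++;
        pool[i] = (struct hnode){ weight, symbol, left, right };
        return i;
    }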

Outputting bit data to binary file C++

老子叫甜甜 · submitted on 2019-12-10 14:14:17
Question: I am writing a compression program and need to write bit data to a binary file using C++. If anyone could advise on the write statement, or point me to a website with advice, I would be very grateful. Apologies if this is a simple or confusing question; I am struggling to find answers on the web. Answer 1: Collect the bits into whole bytes, such as an unsigned char or std::bitset (where the bitset size is a multiple of CHAR_BIT), then write whole bytes at a time. Computers "deal with bits", but the available
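(Along the lines of that answer, a minimal bit-writer: accumulate bits in a one-byte buffer and emit the byte whenever it fills. This sketch uses plain C with fputc rather than the std::bitset variant the answer mentions, and all names are invented here.)

    #include <stdio.h>

    struct bitwriter {
        FILE *out;
        unsigned char buf;   /* bits collected so far */
        int nbits;           /* how many bits of buf are valid (0..7) */
    };

    static void put_bit(struct bitwriter *bw, int bit)
    {
        bw->buf = (unsigned char)((bw->buf << 1) | (bit & 1));
        if (++bw->nbits == 8) {          /* a whole byte is ready: write it */
            fputc(bw->buf, bw->out);
            bw->buf = 0;
            bw->nbits = 0;
        }
    }

    static void flush_bits(struct bitwriter *bw)
    {
        if (bw->nbits > 0) {             /* pad the final byte with zero bits */
            fputc(bw->buf << (8 - bw->nbits), bw->out);
            bw->buf = 0;
            bw->nbits = 0;
        }
    }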

How to write Huffman coding to a file using Python?

家住魔仙堡 · submitted on 2019-12-10 11:06:41
Question: I created a Python script to compress text by using the Huffman algorithm. Say I have the following string:

    string = 'The quick brown fox jumps over the lazy dog'

Running my algorithm returns the following 'bits':

    result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'

By comparing the number of bits in the result with the input string, the
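(The remaining step for a result like the one above is to pack the '0'/'1' string into real bytes and remember how many bits are meaningful, so the decoder can ignore the padding in the last byte. A sketch in C, even though the question is Python; the bit-count header format is an assumption made here.)

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Write a '0'/'1' codeword string as packed bytes, preceded by the
     * number of meaningful bits so the reader knows where to stop. */
    void write_bitstring(FILE *out, const char *bits)
    {
        uint64_t nbits = strlen(bits);
        fwrite(&nbits, sizeof nbits, 1, out);      /* bit-count header */
        unsigned char byte = 0;
        for (uint64_t i = 0; i < nbits; i++) {
            byte = (unsigned char)((byte << 1) | (bits[i] == '1'));
            if ((i + 1) % 8 == 0) { fputc(byte, out); byte = 0; }
        }
        if (nbits % 8)                             /* left-align the tail bits */
            fputc(byte << (8 - nbits % 8), out);
    }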

Variations in Huffman encoding codewords

只愿长相守 · submitted on 2019-12-10 00:28:29
Question: I'm trying to solve some Huffman coding problems, but I always get different values for the codewords (values, not lengths). For example, if the codeword of character 'c' was 100, in my solution it is 101. Here is an example:

    Character  Frequency  Codeword  My solution
    A          22         00        10
    B          12         100       010
    C          24         01        11
    D          6          1010      0110
    E          27         11        00
    F          9          1011      0111

Both solutions have the same length for codewords, and there is no codeword that is a prefix of another codeword. Does this make my solution valid? Or it has
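(A quick check, with the arithmetic added here rather than quoted from the post: both assignments use exactly the same codeword lengths, so their total cost is identical,

    22*2 + 12*3 + 24*2 + 6*4 + 27*2 + 9*4 = 242 bits,

i.e. 2.42 bits per symbol over the 100 symbols, whichever of the two codes is used.)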

How to create Huffman tree from FFC4 (DHT) header in jpeg file?

寵の児 · submitted on 2019-12-09 18:25:54
Question: I thought I could work this one out myself, but I don't seem to be moving forward at all. OK, the background: I need to create a Huffman tree of codes from the information provided by the FFC4, DHT (Define Huffman Table) header in a jpg file. The DHT header defines the Huffman table in this way: 1) A series of 16 bytes. Each byte defines how many symbols have a Huffman code of n bits, where n is the position of the byte in the series. (Did that make any sense?!!) For example the raw
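(Those 16 counts describe a canonical Huffman code: codes of the same length are consecutive integers, and the starting code shifts left by one bit each time the length grows. Below is a sketch of that standard expansion, assuming counts[1..16] and the symbol list have already been parsed out of the DHT segment; it is added here, not quoted from the thread.)

    #include <stdint.h>
    #include <stdio.h>

    /* counts[len] = number of symbols whose code is len bits long (len 1..16);
     * symbols[] lists those symbols in the order they follow the counts. */
    void build_codes(const uint8_t counts[17], const uint8_t *symbols)
    {
        uint16_t code = 0;
        int k = 0;                            /* index into symbols[] */
        for (int len = 1; len <= 16; len++) {
            for (int i = 0; i < counts[len]; i++) {
                printf("symbol 0x%02X -> ", symbols[k++]);
                for (int b = len - 1; b >= 0; b--)
                    putchar(((code >> b) & 1) ? '1' : '0');
                putchar('\n');
                code++;                       /* next code of the same length */
            }
            code <<= 1;                       /* length grows: append a 0 bit */
        }
    }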

Writing data in bit form to a file in C

心不动则不痛 · submitted on 2019-12-09 10:07:04
Question: I am implementing the Huffman algorithm in C. I have got the basic functionality down, up to the point where the binary codewords are obtained. So, for example, abcd will be 100011000 or something similar. Now the question is: how do you write this code in binary form to the compressed file? I mean, if I write it normally, each 1 and 0 will be one character, so there is no compression. I need to write those 1s and 0s in their bit form. Is that possible in C? If so, how? Answer 1: Collect bits until you
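(For the decompression side the same idea runs in reverse: pull bits back out of the packed bytes one at a time and use them to walk the tree. A sketch, again invented here rather than taken from the answer.)

    #include <stdio.h>

    struct bitreader {
        FILE *in;
        int buf;      /* current byte, or EOF */
        int nbits;    /* bits of buf not yet handed out */
    };

    /* Returns the next bit (0 or 1), or -1 at end of input;
     * bits come back most-significant first. */
    static int get_bit(struct bitreader *br)
    {
        if (br->nbits == 0) {
            br->buf = fgetc(br->in);
            if (br->buf == EOF)
                return -1;
            br->nbits = 8;
        }
        br->nbits--;
        return (br->buf >> br->nbits) & 1;
    }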

Optimal Huffman Code for Fibonacci numbers

﹥>﹥吖頭↗ · submitted on 2019-12-08 06:38:32
Question: What is an optimal Huffman code for the following characters, whose frequencies are the first 8 Fibonacci numbers: a : 1, b : 1, c : 2, d : 3, e : 5, f : 8, g : 13, h : 21? Generalize the case to find an optimal code when the frequencies are the first n Fibonacci numbers. This is one of the assignment problems I have. I'm not asking for a straight answer, just for some resources. Where should I look to put the pieces together to answer the questions? Answer 1: Read - http://en.wikipedia.org/wiki
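(A nudge rather than a full answer, since the asker only wants resources; the observation below is added here and is the standard one. The useful identity is

    F(1) + F(2) + ... + F(k) = F(k+2) - 1 < F(k+2),

so at every step of Huffman's algorithm the partially built subtree and the next-smallest Fibonacci frequency are the two lightest weights. The tree therefore degenerates into a chain, and the codeword lengths for the 8 symbols above come out as h=1, g=2, f=3, e=4, d=5, c=6, b=7, a=7 bits; the pattern for the first n Fibonacci numbers generalizes the same way.)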