bzip2

Utilizing multiple cores for tar+gzip/bzip2 compression/decompression

廉价感情. submitted on 2019-11-26 23:45:34
Question: I normally compress using tar zcvf and decompress using tar zxvf (using gzip out of habit). I've recently gotten a quad-core CPU with hyperthreading, so I have 8 logical cores, and I notice that many of the cores sit unused during compression/decompression. Is there any way I can use the idle cores to make it faster?

Answer 1: You can use pigz instead of gzip; it does gzip compression on multiple cores. Instead of using the -z option, you would pipe the output through pigz: tar cf - paths-to
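
pigz parallelizes gzip. For bzip2, a similar speed-up can be had by splitting the input into chunks, compressing each chunk on its own core, and concatenating the resulting streams, which is roughly what tools such as pbzip2 do. The Python sketch below illustrates that idea; the function name parallel_bz2_compress and the 900 kB chunk size are illustrative assumptions, and it relies on CPython's bz2 module releasing the GIL while compressing so that the worker threads can actually run concurrently.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Illustrative chunk size: matches bzip2's largest block size (900 kB).
CHUNK_SIZE = 900_000


def parallel_bz2_compress(data, workers=8, chunk_size=CHUNK_SIZE):
    """Compress data as a sequence of independent bz2 streams, one per chunk.

    Each chunk is compressed in a worker thread, and the resulting streams are
    concatenated in their original order, so a multi-stream-aware decoder
    (bz2.decompress on Python 3.3+, or bunzip2) reproduces the original bytes.
    """
    # Always produce at least one stream so empty input still round-trips.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the streams join correctly.
        return b"".join(pool.map(bz2.compress, chunks))


if __name__ == "__main__":
    payload = b"some log line\n" * 500_000
    packed = parallel_bz2_compress(payload)
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

The trade-off is a slightly larger output than a single stream, because each chunk restarts its compression state; in exchange, the chunks have no dependencies on one another and can be compressed in any order.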

Best splittable compression for Hadoop input = bz2?

家住魔仙堡 submitted on 2019-11-26 20:12:10
Question: We've realized a bit too late that archiving our files in GZip format for Hadoop processing isn't such a great idea. GZip isn't splittable; for reference, these questions describe the problems, which I won't repeat: "Very basic question about Hadoop and compressed input files", "Hadoop gzip compressed files", "Hadoop gzip input file using only one mapper", and "Why can't hadoop split up a large text file and then compress the splits using gzip?" My question is: is BZip2 the best archival compression that will allow a
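
Part of why bzip2 is splittable while gzip is not: every bzip2 compressed block starts with a fixed 48-bit marker (0x314159265359), so a reader that starts at an arbitrary offset can scan forward to the next block boundary. The Python sketch below locates those markers in a compressed stream. The function name find_block_offsets is hypothetical, and the bit-string scan is chosen for clarity rather than speed. Because the marker is not byte-aligned and can also occur by chance inside compressed data, a real splitter (roughly the approach Hadoop's bzip2 codec takes) has to cope with occasional false positives.

```python
import bz2
import os
from typing import List

# 48-bit magic number that opens every bzip2 compressed block (the digits of pi).
BLOCK_MAGIC = format(0x314159265359, "048b")


def find_block_offsets(stream: bytes) -> List[int]:
    """Return the bit offsets at which the bzip2 block marker occurs in stream."""
    # Expand the stream into a '0'/'1' string, most significant bit first,
    # because block markers are not aligned to byte boundaries.
    bits = format(int.from_bytes(stream, "big"), "0{}b".format(len(stream) * 8))
    offsets = []
    pos = bits.find(BLOCK_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = bits.find(BLOCK_MAGIC, pos + 1)
    return offsets


if __name__ == "__main__":
    # Random bytes barely compress, so level 1 (100 kB blocks) yields several blocks.
    sample = bz2.compress(os.urandom(300_000), compresslevel=1)
    print(find_block_offsets(sample))
```

The first marker sits at bit 32, right after the 4-byte "BZh1" stream header; later markers generally fall at arbitrary bit positions.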

Extract bz2 file in R

…衆ロ難τιáo~ submitted on 2019-11-26 19:35:13
Question: I have a bunch of .csv.bz2 files, which I have to download, extract, and read in R. I downloaded a file and want to extract it to the current working directory, then read it. I tried unz(filename,filename.csv), but it does not seem to work. How can I do that? I heard somewhere that bz2 files can be read directly without decompressing them first. How can I do that?

Answer 1: You can use either of these two commands: read.csv(): with this command you can directly supply the name of the compressed file containing the CSV data. read
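
The answer's point, that a bz2-compressed CSV can be read without first extracting it to disk, carries over to other languages. As an illustration in Python rather than R, bz2.open in text mode decompresses on the fly and can be handed straight to the csv module; the helper name read_bz2_csv and the UTF-8 encoding are assumptions for this sketch.

```python
import bz2
import csv
from typing import List


def read_bz2_csv(source) -> List[List[str]]:
    """Read all rows of a bzip2-compressed CSV without extracting it to disk.

    source may be a file name or an already-open binary file object.
    """
    # "rt" wraps the compressed data in a text stream; decompression happens
    # incrementally as csv.reader pulls lines from it.
    with bz2.open(source, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

For example, read_bz2_csv("data.csv.bz2")[:5] would return the first five rows, header included, of a hypothetical downloaded file named data.csv.bz2.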

Missing Python bz2 module

大兔子大兔子 submitted on 2019-11-26 15:49:58
Question: I have installed Python 2.7.3 in my home directory:

[spatel@~ dev1]$ /home/spatel/python-2.7.3/bin/python -V
Python 2.7.3

I am trying to run a script that requires Python 2.7.x, and I am getting a missing bz2 error:

[spatel@~ dev1]$ ./import_logs.py
Traceback (most recent call last):
  File "./import_logs.py", line 13, in <module>
    import bz2
ImportError: No module named bz2

I have tried to install the bz2 module, but I got lots of errors:

[spatel@dev1 python-bz2-1.1]$ /home/spatel/python-2.7.3/bin/python
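
An ImportError for bz2 in a self-compiled interpreter usually means Python was built without the bzip2 development headers (for example bzip2-devel on Red Hat-style systems or libbz2-dev on Debian-style systems), so the extension module was silently skipped; that cause is an assumption here, since the question is cut off before any answer. A quick check like the sketch below reports which interpreter is running and which compression modules it cannot import. The helper name missing_modules is hypothetical, and the code avoids f-strings so that it also runs under Python 2.7.

```python
import importlib
import sys

# Standard-library compression modules backed by optional C libraries
# (lzma only exists on Python 3.3+, so Python 2.7 will always report it missing).
CANDIDATES = ("zlib", "bz2", "lzma")


def missing_modules(names=CANDIDATES):
    """Return the subset of names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Interpreter: " + sys.executable)
    print("Missing modules: " + (", ".join(missing_modules()) or "none"))
```

Running it with the same binary that fails (here /home/spatel/python-2.7.3/bin/python) confirms whether bz2 is absent from that particular build; if it is, installing the headers and rebuilding that interpreter is the usual remedy.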