For my work, I scrape websites and write them to gzipped web archives (with the extension "warc.gz"). I use Python 2.7.11 and the warc 0.2.1 library. I noticed that for the majority of files I cannot read all of the records with the warc library. For example, if a warc.gz file has 517 records, I can read only about 200 of them. After some research I found that this problem happens only with the gzipped files; files with the plain "warc" extension do not have it. Other people have run into the same problem ( https://github.com/internetarchive/warc/issues/21 ), but no solution was given there.
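One way to tell whether the truncation comes from the library or from the file itself is to count the gzip members directly with the standard library. The sketch below assumes each record is compressed as its own gzip member (record-at-a-time compression, which the WARC format recommends but which I have not verified for these particular files). `iter_gzip_members`, `make_member` and `make_record` are hypothetical helpers, and the synthetic 517-record archive only mirrors the example above. The code runs on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import gzip
import io
import zlib


def iter_gzip_members(data):
    """Yield the decompressed bytes of each gzip member in `data`, in order."""
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        yield d.decompress(data) + d.flush()
        # Bytes after the end of this member belong to the next member.
        data = d.unused_data


def make_member(payload):
    """Compress `payload` as a single, standalone gzip member."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(payload)
    return buf.getvalue()


def make_record(i):
    """Build a minimal, illustrative WARC/1.0 record (not a complete header set)."""
    body = ("record %d\r\n" % i).encode("ascii")
    header = ("WARC/1.0\r\n"
              "WARC-Type: resource\r\n"
              "Content-Length: %d\r\n\r\n" % len(body)).encode("ascii")
    return header + body + b"\r\n\r\n"


# Synthetic archive: one gzip member per record (hypothetical test data).
data = b"".join(make_member(make_record(i)) for i in range(517))

members = list(iter_gzip_members(data))
print("gzip members:", len(members))

# gzip.GzipFile keeps reading across member boundaries, so it sees every record.
full = gzip.GzipFile(fileobj=io.BytesIO(data)).read()
print("records in decompressed stream:", full.count(b"WARC/1.0\r\n"))
```

To check a real archive, `data` could be replaced with the bytes of the file (e.g. `open(path, "rb").read()`, where `path` is a placeholder). If the member count matches the expected number of records while the warc library stops early, the file is probably intact. In that case, decompressing it first (for example with `gzip.open` and `shutil.copyfileobj`) and reading the resulting plain ".warc" file may work around the problem, since uncompressed files read completely.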