why can't python execute a zip archive passed via stdin?

好久不见. 提交于 2019-12-07 18:45:55

问题


I have a zip archive containing a __main__.py file : archive.zip

I can execute it with

python archive.zip
=> OK !

but not with

cat archive.zip | python
=> File "<stdin>", line 1
SyntaxError: Non-ASCII character '\x9e' in file <stdin> on line 2,
but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

why is there a difference between the 2 modes and is there a way to make the pipe work without unzipping outside of python ?

I receive this archive over the network and want to execute it as soon as i receive it and as fast as possible so I thought that piping the zip into python would work !


回答1:


The reason that you can 'python file.zip', but not 'cat file.zip | python' is that Python has the 'zipimport' built in so that when you run python against files (or try to import them), zipimport takes a crack at them as part of the import process. (See the import module for details).

But with stdin, python does not make any attempt to search the streaming data - because the streaming data could be anything - could be user input that is handled by code, could be code. There's no way to know and Python makes no real effort to know for that reason.

edit

Occasionally, when you're answering questions - you think 'I really shouldn't tell someone the answer', not because you wish to be secretive or hold some amount of power over them. Simply because the path they're going down isn't the right path and you want to help them out of the hole they're digging. This is one of those situations. However, against my better judgement, here's an extremely hacky way of accomplishing something similar to what you want. It's not the best way, it's probably in fact the worst way to do it.

I just played around with the zipimporter for a while and tried all the tricks I could think of. I looked at 'imp', 'compile' as well.. Nothing can import a zipped module (or egg) from memory so far that I can see. So, an interim step is needed.

I'll say this up front, I'm embarrassed to even be posting this. Don't show this to people you work with or people that you respect because they laugh at this terrible solution.

Here's what I did:

mkdir foo
echo "print 'this is foo!'" >>foo/__init__.py
zip foo.zip -r foo
rm -rf foo                   # to ensure it doesn't get loaded from the filesystem
mv foo.zip somethingelse.zip # To ensure it doesn't get zipimported from the filesystem

And then, I ran this program using

cat somethingelse.zip | python script.py

#!/usr/bin/python 

import sys
import os
import zipfile
import StringIO
import zipimport
import time

sys.path.append('/tmp')

class SinEater(object):
    def __init__(self):
        tmp = str(int(time.time()*100)) + '.zip'
        f = open(tmp, 'w')
        f.write(sys.stdin.read(1024*64)) # 64kb limit
        f.close()
        try:
            z = zipimport.zipimporter(tmp)
            z.load_module('foo')

        except:
            pass

if __name__ == '__main__':
    print 'herp derp'
    s = SinEater()

Produces:

herp derp
this is new

A solution that would be about a million times better than this would be to have a filesystem notification (inotify, kevent, whatever windows uses) that watches a directory for new zip files. When a new zip file is dropped in that directory, you could automatically zipimport it. But, I cannot stress enough even that solution is terrible. I don't know much about Ansible (anything really), but I cannot imagine any engineer thinking that it would be a good solution for how to handle code updates or remote control.




回答2:


A .zip file consists of a series of files where each is a local header and the compressed data, followed by a central directory which has the local header information repeated, offsets to the local headers, and some other data to allow random access to the files.

The usual way to access a .zip file is to find the central directory at the end of the file and read that in, and then use that information to access the local entries. That requires seeking.

It is possible to write an unzip that reads a zip file from a pipe. (In fact I did that once.) However that is not the kind of code that Python is using to read zip files.




回答3:


Interesting. I had no idea this was possible. But I'll take your word for it.

If I were to guess why it doesn't work when streaming in from the STDIN, I would say it's because processing a ZIP archive often requires backwards seeking. A ZIP archive consists of a bunch of compressed files concatenated together (with enough header data to decompress independently), and then an index at the end. In my experience, decompressors tend to seek straight to the end to grab the index and then seek earlier in the file to fetch and decompress payload data (even though it's possible to iterate through the compressed files individually).

Since in this case, the data comes from STDIN, the decompressor can't seek backwards. The same would apply for a naïve network stream as well.




回答4:


It is possible. But requires some coding) Main idea is use memory-mapped temporary file and redirect it into STDIN.

run_zipped_project.py

#!/usr/bin/env python
# encoding: utf-8
import os
import subprocess
from tempfile import SpooledTemporaryFile as tempfile

if __name__ == '__main__':
    filename = "test.zip" # here your zipped project
    size = os.path.getsize(filename)
    with open(filename, "rb") as test:
        code = test.read()
    test.close()

    # NOW WE LOAD IT FROM DISK BUT YOU CAN USE ANY ANOTHER SOURCE

    print "loaded {file} with size {size}".format(file=filename, size=size)
    size += 1  # prevent buffer overrun and dumping to disk


    f = tempfile(max_size=size, bufsize=size)
    f.write(code)
    f.seek(0)

    process = subprocess.Popen(["python2", "loader.py"],
        stdin=f,
        stdout=subprocess.PIPE,
        bufsize=size
        )
    print process.communicate()[0]
    f.close()
    print "closed"

loader.py

#!/usr/bin/env python
# encoding: utf-8
from zipimport import zipimporter

if __name__ == '__main__':
    zip = zipimporter('/dev/stdin')
    zip.load_module('__main__')


来源:https://stackoverflow.com/questions/20276105/why-cant-python-execute-a-zip-archive-passed-via-stdin

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!