Fastest bitwise xor between two multibyte binary data variables

耶瑟儿~ 2021-01-02 01:05

What is the fastest way to implement the following logic:

def xor(data, key):
    l = len(key)

    buff = ""
    for i in range(0, len(data)):
        buff += chr(ord(data[i]) ^ ord(key[i % l]))
    return buff

7 answers
  • 2021-01-02 01:18

    If len(data) is large, you might see a significant improvement from xrange. Actually, you can replace the range function entirely with enumerate. You might also benefit from using a list instead of appending to a string.

    def xor(data, key):
        l = len(key)
        buff = []
        for idx, val in enumerate(data):
            buff.append(chr(ord(val) ^ ord(key[idx % l])))
        return ''.join(buff)
    

    I haven't timed it, but off the top of my head I'd expect that to be a bit faster for large amounts of data. Make sure you measure every change.

    If profiling suggests that the call to ord() actually takes time, you can run it on all the values in key ahead of time to save a call in the loop.

    You could also turn that for loop into a plain old list comprehension, but it will negatively impact readability. Regardless, try it and see if it's way faster.
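
A minimal sketch of the comprehension idea, assuming Python 3 and bytes inputs (the name xor_comprehension is made up for illustration). Iterating over bytes already yields integers there, so the ord() and chr() calls disappear entirely, and itertools.cycle stands in for the idx % l indexing:

```python
from itertools import cycle


def xor_comprehension(data: bytes, key: bytes) -> bytes:
    # In Python 3, iterating over bytes yields ints, so no ord()/chr()
    # calls are needed; cycle() repeats the key for as long as data lasts.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
```

Applying the function twice with the same key returns the original data, which makes for a quick sanity check before timing anything.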

  • 2021-01-02 01:23

    This code should work in Python 2.6+ including Py3k.

    from binascii import hexlify as _hexlify
    from binascii import unhexlify as _unhexlify
    
    
    def packl(lnum, padmultiple=0):
        """Packs the lnum (which must be convertible to a long) into a
        byte string 0 padded to a multiple of padmultiple bytes in size. 0
        means no padding whatsoever, so that packing 0 results in an empty
        string.  The resulting byte string is the big-endian two's
        complement representation of the passed in long."""
    
        if lnum == 0:
            return b'\0' * padmultiple
        elif lnum < 0:
            raise ValueError("Can only convert non-negative numbers.")
        s = hex(lnum)[2:]
        s = s.rstrip('L')
        if len(s) & 1:
            s = '0' + s
        s = _unhexlify(s)
        if (padmultiple != 1) and (padmultiple != 0):
            filled_so_far = len(s) % padmultiple
            if filled_so_far != 0:
                s = b'\0' * (padmultiple - filled_so_far) + s
        return s
    
    def unpackl(bytestr):
        """Treats a byte string as a sequence of base 256 digits
        representing an unsigned integer in big-endian format and converts
        that representation into a Python integer."""
    
        return int(_hexlify(bytestr), 16) if len(bytestr) > 0 else 0
    
    def xor(data, key):
        dlen = len(data)
        klen = len(key)
        if dlen > klen:
            key = key * ((dlen + klen - 1) // klen)
        key = key[:dlen]
        result = packl(unpackl(data) ^ unpackl(key))
        if len(result) < dlen:
            result = b'\0' * (dlen - len(result)) + result
        return result
    

    This will also work in Python 2.7 and 3.x. It has the advantage of being a lot simpler than the previous one while doing basically the same thing in approximately the same amount of time:

    from binascii import hexlify as _hexlify
    from binascii import unhexlify as _unhexlify
    
    def xor(data, key):
        dlen = len(data)
        klen = len(key)
        if dlen > klen:
            key = key * ((dlen + klen - 1) // klen)
        key = key[:dlen]
        data = int(_hexlify(data), 16)
        key = int(_hexlify(key), 16)
        result = (data ^ key) | (1 << (dlen * 8 + 7))
        # Python 2.7 only lines (memoryview needs 2.7+; comment out in Python 3.x)
        result = memoryview(hex(result))
        result = (result[4:-1] if result[-1] == 'L' else result[4:])
        # Python 3.x line
        #result = memoryview(hex(result).encode('ascii'))[4:]
        result = _unhexlify(result)
        return result
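
On Python 3 only, the same big-integer idea can probably be written without the hex round trip. A sketch, assuming int.from_bytes and int.to_bytes are available (Python 3.2+) and that the key is non-empty; xor_bigint is an illustrative name, not part of the answer above:

```python
def xor_bigint(data: bytes, key: bytes) -> bytes:
    dlen = len(data)
    klen = len(key)
    # Repeat the key until it covers the data, then trim it to the same length.
    key = (key * ((dlen + klen - 1) // klen))[:dlen]
    # Interpret both byte strings as big-endian unsigned integers and XOR them.
    result = int.from_bytes(data, "big") ^ int.from_bytes(key, "big")
    # to_bytes() restores any leading zero bytes that the integer form dropped.
    return result.to_bytes(dlen, "big")
```

Passing dlen to to_bytes plays the role of the manual zero padding in the first version and of the sentinel bit in the second.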
    
  • 2021-01-02 01:25

    Following on my comment in the initial post, you can process large files rather quickly if you stick to numpy for key padding and bitwise XOR'ing, like so:

    import numpy as np
    
    # ...
    
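    def xor(key, data):

        # np.frombuffer replaces the deprecated binary mode of np.fromstring
        data = np.frombuffer(data, dtype=np.byte)
        key = np.frombuffer(key, dtype=np.byte)

        # Pad the key to match the data length (assumes len(data) >= len(key))
        key = np.pad(key, (0, len(data) - len(key)), 'wrap')

        # Returns a NumPy array; call .tobytes() on it to get a byte string
        return np.bitwise_xor(key, data)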
    
    
  • 2021-01-02 01:27

    What you have is already as fast as you can get in Python.

    If you really need it faster, implement it in C.
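
Whether the original loop really is the ceiling is easy to check by measuring. A rough timeit harness, assuming Python 3; the two candidate functions, the 64 KiB payload and the 20-byte key are all stand-ins chosen for illustration, so swap in your own:

```python
import timeit
from itertools import cycle


def xor_loop(data, key):
    # Byte-at-a-time XOR, the Python 3 analogue of the loop in the question.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def xor_bigint(data, key):
    # Whole-buffer XOR on arbitrary-precision integers.
    dlen, klen = len(data), len(key)
    key = (key * ((dlen + klen - 1) // klen))[:dlen]
    value = int.from_bytes(data, "big") ^ int.from_bytes(key, "big")
    return value.to_bytes(dlen, "big")


def benchmark(size=1 << 16, repeats=5):
    data = bytes(range(256)) * (size // 256)  # hypothetical payload
    key = bytes(range(20))                    # hypothetical 20-byte key
    results = {}
    for fn in (xor_loop, xor_bigint):
        # Best of several single runs, in seconds.
        timings = timeit.repeat(lambda: fn(data, key), number=1, repeat=repeats)
        results[fn.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(name, seconds)
```

The harness reports the minimum over several repeats rather than the mean, since the minimum is the figure least disturbed by other activity on the machine.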

  • 2021-01-02 01:32

    Not tested, and I don't know if it's faster.

    This assumes that len(mystring) is a multiple of 4.

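    import struct

    def xor(hash, mystring):
        s = struct.Struct("<L")

        # Unpack the hash into five 32-bit words (i.e. a 20-byte hash)
        v1 = memoryview(hash)
        tab1 = []
        for i in range(5):
            tab1.append(s.unpack_from(v1, i * 4)[0])

        # Unpack the data into 32-bit words
        v2 = memoryview(mystring)
        tab2 = []
        for i in range(len(mystring) // 4):
            tab2.append(s.unpack_from(v2, i * 4)[0])

        tab3 = []
        try:
            # Round up so a partial final 20-byte block is processed too;
            # running off the end of tab2 raises IndexError and stops the loop
            for i in range((len(mystring) + 19) // 20):
                for j in range(5):
                    tab3.append(s.pack(tab1[j] ^ tab2[5 * i + j]))
        except IndexError:
            pass
        return b"".join(tab3)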
    
  • 2021-01-02 01:33

    Disclaimer: As other posters have said, this is a really bad way to encrypt files. This article demonstrates how to reverse this kind of obfuscation trivially.

    First, a simple XOR algorithm:

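    import operator
    import struct

    def xor(a,b,_xor8k=lambda a,b:struct.pack("!1000Q",*map(operator.xor,
                        struct.unpack("!1000Q",a),
                        struct.unpack("!1000Q",b)))
            ):
        # Short inputs: XOR as many 8-byte words as fit, then the leftover bytes
        if len(a)<=8000:
            s="!%iQ%iB"%divmod(len(a),8)
            return struct.pack(s,*map(operator.xor,
                struct.unpack(s,a),
                struct.unpack(s,b)))
        # Long inputs: XOR in place, one 8000-byte block at a time
        a=bytearray(a)
        for i in range(8000,len(a),8000):
            a[i-8000:i]=_xor8k(
                a[i-8000:i],
                b[i-8000:i])
        a[i:]=xor(a[i:],b[i:])
        return bytes(a)#str(a) would give the repr on Python 3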
    

    Second, the wrapping XOR algorithm:

    def xor_wrap(data,key,_struct8k=struct.Struct("!1000Q")):
        l=len(key)
        data=bytearray(data)#pack_into needs a writable buffer, for short inputs too
        modulo=0#offset used to access the repeated key
        if len(data)>=8000:
            keyrpt=key*((7999+2*l)//l)#this buffer is accessed with whatever offset is required for a given 8k block
            #this expression should create at most 1 more copy of the key than is needed
            for offset in range(0,len(data)-7999,8000):
                _struct8k.pack_into(data,offset,*map(operator.xor,
                    _struct8k.unpack_from(data,offset),
                    _struct8k.unpack_from(keyrpt,modulo)))
                modulo+=8000;modulo%=l
            offset+=8000
        else:
            offset=0
            keyrpt=key*(len(data)//l+1)#simple calculation guaranteed to be enough
        #XOR whatever is left after the last full 8k block
        srest=struct.Struct("!%iQ%iB"%divmod(len(data)-offset,8))
        srest.pack_into(data,offset,*map(operator.xor,
            srest.unpack_from(data,offset),
            srest.unpack_from(keyrpt,modulo)))
        return data
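
The disclaimer above can be illustrated with a short sketch. With a repeating-key XOR, anyone who knows a stretch of the plaintext can recover the key by XORing it against the ciphertext. This assumes Python 3, a known key length, and known plaintext at the very start of the file that is at least as long as the key; xor_repeat, recover_key and the sample values are all hypothetical:

```python
from itertools import cycle


def xor_repeat(data: bytes, key: bytes) -> bytes:
    # Reference repeating-key XOR, used here only to produce test input.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def recover_key(ciphertext: bytes, known_plaintext: bytes, key_len: int) -> bytes:
    # XORing ciphertext with the matching plaintext yields the key stream;
    # its first key_len bytes are the key itself.
    stream = bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))
    return stream[:key_len]


if __name__ == "__main__":
    secret = b"secret"
    ciphertext = xor_repeat(b"%PDF-1.7 some document body", secret)
    # A file-format header is a typical source of known plaintext.
    print(recover_key(ciphertext, b"%PDF-1.7", len(secret)))
```

File formats with fixed headers hand an attacker exactly this kind of known plaintext, which is one reason the scheme only obfuscates rather than encrypts.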
    