Question
I'm making an encryption program and I need to open a file in binary mode to access non-ASCII and non-printable characters. I need to check whether each character from the file is a letter, a number, a symbol or an unprintable character. That means I have to check, one by one, whether the bytes (when decoded to ASCII) match any of these characters:
{^9,dzEV=Q4ciT+/s};fnq3BFh% #2!k7>YSU<GyD\I]|OC_e.W0M~ua-jR5lv1wA`@8t*xr'K"[P)&b:g$p(mX6Ho?JNZL
I think I could encode the characters above to binary and then compare them with the bytes, but I don't know how to do this.
P.S. Sorry for my bad English and my misunderstanding of binary. (I hope you know what I mean by bytes: I mean characters in binary mode, like this):
\x01\x00\x9a\x9c\x18\x00
Answer 1:
There are two major string types in Python: bytestrings (a sequence of bytes) that represent binary data and Unicode strings (a sequence of Unicode codepoints) that represent human-readable text. It is simple to convert one into another (☯):
unicode_text = bytestring.decode(character_encoding)
bytestring = unicode_text.encode(character_encoding)
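The choice of character encoding matters for bytes like the ones in the question. ASCII only covers 0x00-0x7F, so decoding 0x9a fails. Latin-1 maps each of the 256 byte values to the code point with the same number, so it decodes any bytestring and encodes back to exactly the same bytes. Here is a minimal sketch using the question's sample bytes. Picking Latin-1 is an assumption made for illustration, not something the answer prescribes:

```python
raw = b'\x01\x00\x9a\x9c\x18\x00'  # sample bytes from the question

# ASCII only covers 0x00-0x7F, so the byte 0x9a at offset 2 cannot be decoded.
try:
    raw.decode('ascii')
except UnicodeDecodeError as e:
    print('ascii failed at offset', e.start)  # ascii failed at offset 2

# Latin-1 maps every byte 0x00-0xFF to the code point with the same value,
# so decoding never fails and encoding round-trips to the original bytes.
text = raw.decode('latin-1')
assert text.encode('latin-1') == raw
print([hex(ord(c)) for c in text])  # ['0x1', '0x0', '0x9a', '0x9c', '0x18', '0x0']
```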
If you open a file in binary mode, e.g., 'rb', then file.read() returns a bytestring (the bytes type):
>>> b'A' == b'\x41' == chr(0b1000001).encode()
True
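This sketch applies the asker's idea of encoding the characters once and comparing them with the bytes read in 'rb' mode. The character set in the question appears to be the 95 printable ASCII characters (letters, digits, punctuation and space), so it is built here from the string module instead of being retyped. That equivalence is an assumption, and non_printable_offsets is a hypothetical helper name:

```python
import os
import string
import tempfile

# Assumption: the question's set == ASCII letters + digits + punctuation + space.
# Encoding it once gives the allowed byte values as ints (0-255).
ALLOWED = frozenset((string.ascii_letters + string.digits
                     + string.punctuation + ' ').encode('ascii'))

def non_printable_offsets(path):
    """Return the offsets of bytes in the file at `path` that are not in ALLOWED."""
    with open(path, 'rb') as f:  # binary mode: read() returns bytes
        data = f.read()
    # Iterating over bytes yields ints, which compare directly with ALLOWED.
    return [i for i, b in enumerate(data) if b not in ALLOWED]

# Usage: write the question's sample bytes plus some text to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x01\x00\x9a\x9c\x18\x00Hi!')
offsets = non_printable_offsets(tmp.name)
os.remove(tmp.name)
print(offsets)  # [0, 1, 2, 3, 4, 5]
```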
There are several methods that can be used to classify bytes:
- string methods such as bytes.isdigit():
  >>> b'1'.isdigit()
  True
- string constants such as string.printable:
  >>> import string
  >>> b'!' in string.printable.encode()
  True
- regular expressions such as \d:
  >>> import re
  >>> bool(re.match(br'\d+$', b'123'))
  True
- classification functions in the curses.ascii module, e.g., curses.ascii.isprint():
  >>> from curses import ascii
  >>> bytearray(filter(ascii.isprint, b'123'))
  bytearray(b'123')
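These methods can be combined to sort each byte into the four categories the question asks about. In the sketch below, classify_byte is a hypothetical name. Treating tab, newline and every byte from 0x80 to 0xFF as "unprintable" is an assumption. It relies on bytes.isalpha() and bytes.isdigit(), which only recognise ASCII, plus string.punctuation:

```python
import string

# Hypothetical helper: classify one byte value (an int 0-255, which is what
# iterating over a bytestring yields) using ASCII rules only.
PUNCTUATION = frozenset(string.punctuation.encode('ascii'))

def classify_byte(b):
    one = bytes([b])  # wrap the int in a 1-byte bytestring to use bytes methods
    if one.isalpha():  # ASCII letters only; 0x80-0xFF never count as letters
        return 'letter'
    if one.isdigit():  # ASCII digits 0-9
        return 'digit'
    if b in PUNCTUATION:
        return 'symbol'
    if b == 0x20:  # the space character
        return 'space'
    return 'unprintable'  # control bytes (incl. tab/newline) and 0x80-0xFF

data = b'a1!\x00 \x9a'
print([classify_byte(b) for b in data])
# ['letter', 'digit', 'symbol', 'unprintable', 'space', 'unprintable']
```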
bytearray is a mutable sequence of bytes. Unlike a bytestring, you can change it in place, e.g., to lowercase every 3rd byte that is uppercase:
>>> import string
>>> a = bytearray(b'ABCDEF_')
>>> uppercase = string.ascii_uppercase.encode()
>>> a[::3] = [b | 0b0100000 if b in uppercase else b
... for b in a[::3]]
>>> a
bytearray(b'aBCdEF_')
Notice: b'a' and b'd' are now lowercase, but b'_' remained the same.
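The difference in mutability can be seen directly. Item assignment on bytes raises TypeError. A bytearray accepts ints from 0 to 255 for single items and bytes-like objects for slices, and a slice assignment may even change its length:

```python
data = b'ABC'
try:
    data[0] = ord('a')  # bytes objects are immutable
except TypeError as e:
    print('bytes:', e)

buf = bytearray(data)  # mutable copy of the same bytes
buf[0] = ord('a')      # single items are ints in range 0-255
buf[1:2] = b'bb'       # slice assignment may change the length
print(buf)             # bytearray(b'abbC')
```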
To modify a binary file in place, you could use the mmap module, e.g., to lowercase the 4th column on every other line of 'file':
#!/usr/bin/env python3
import mmap
import string

uppercase = string.ascii_uppercase.encode()
ncolumn = 3  # select 4th column (0-based index)
with open('file', 'r+b') as file, \
     mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
    while True:
        mm.readline()  # ignore every other line
        pos = mm.tell()  # remember where the current line starts
        line = mm.readline()
        if not line:  # EOF
            break
        # skip lines too short to have a 4th column
        if len(line.rstrip(b'\r\n')) > ncolumn and line[ncolumn] in uppercase:
            mm[pos + ncolumn] |= 0b0100000  # lowercase
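Note: the mmap API differs between Python 2 and 3 here; for example, in Python 3 indexing an mmap object returns an int rather than a one-byte string. The code uses Python 3.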
Input
ABCDE1
FGHIJ
ABCDE
FGHI
Output
ABCDE1
FGHiJ
ABCDE
FGHi
Notice: the 4th column became lowercase on the 2nd and 4th lines.
Typically, to change a file you read from it, write the modifications to a temporary file and, on success, move the temporary file into the place of the original:
#!/usr/bin/env python3
import os
import string
from tempfile import NamedTemporaryFile

caesar_shift = 3
filename = 'file'

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))

dest_dir = os.path.dirname(filename)  # temp file in the same directory as the original
chunksize = 1 << 15
with open(filename, 'rb') as file, \
     NamedTemporaryFile('wb', dir=dest_dir, delete=False) as tmp_file:
    while True:  # encrypt
        chunk = file.read(chunksize)
        if not chunk:  # EOF
            break
        tmp_file.write(caesar_bytes(chunk, caesar_shift))
# both files are closed here; replace the original with the encrypted copy
os.replace(tmp_file.name, filename)
Input
abc
def
ABC
DEF
Output
def
ghi
ABC
DEF
To convert the output back, set caesar_shift = -3.
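The round trip can be checked in memory. This snippet redefines caesar_bytes from the script above so that it runs on its own. The sample data is made up for illustration and includes non-letter bytes, which translate() leaves untouched because they are not in the alphabet:

```python
import string

def caesar_bytes(plaintext, shift, alphabet=string.ascii_lowercase.encode()):
    # same function as in the script above
    shifted_alphabet = alphabet[shift:] + alphabet[:shift]
    return plaintext.translate(plaintext.maketrans(alphabet, shifted_alphabet))

sample = b'abc\ndef\nABC\nDEF\n\x00\x9a'
encrypted = caesar_bytes(sample, 3)
print(encrypted)  # b'def\nghi\nABC\nDEF\n\x00\x9a'
assert caesar_bytes(encrypted, -3) == sample  # shift -3 undoes shift 3
assert caesar_bytes(b'xyz', 3) == b'abc'      # the alphabet wraps around
```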
Answer 2:
To open a file in binary mode, you use the open("filena.me", "rb") function. I've never used it personally, but that should get you the information you need.
Source: https://stackoverflow.com/questions/28520922/how-to-change-the-bytes-in-a-file