Packet is sometimes sent completely and sometimes not

Question

@Grismar recommended that I create a new topic for the following problem:

I wrote a server and a client with the socket module. To handle multiple connections, I used the selectors module instead of threads or fork().

Scenario: I have to generate a massive string and send it to the client. The string depends on the client's request: the client sends a query, and the server generates a result and sends it back. Sending the query to the server works fine.

Because the string is massive, I decided to split it into chunks, like this:

CHUNK_SIZE = 1024  # alternative value tried: 131072
packets = []
# Slice the string into CHUNK_SIZE-character pieces. len() counts characters;
# sys.getsizeof() reports the object's memory footprint (including interpreter
# overhead), so it is the wrong measure, and the original if-block left
# `chunks` undefined for strings of 1024 characters or fewer.
for start in range(0, len(search_result_string), CHUNK_SIZE):
    packets.append(search_result_string[start:start + CHUNK_SIZE])

So I have a list of packets. Then:

conn.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1000000)
for chunk in packets:
    conn.sendall(bytes(chunk,'utf-8'))

Sometimes the client has no problem at all, and sometimes I get the following error:

Traceback (most recent call last):
  File "./multiconn-client.py", line 116, in <module>
    service_connection(key, mask)
  File "./multiconn-client.py", line 89, in service_connection
    target_string += recv_data.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 42242: unexpected end of data

On the client side, I use the following callback:

def service_connection(key, mask):
    buff = 10000
    sock = key.fileobj
    data = key.data
    target_string = str()
    if mask & selectors.EVENT_READ:
        buff = sock.getsockopt(SOL_SOCKET,SO_RCVBUF)
        recv_data = sock.recv( 128*1024 |buff)
        if recv_data:
            buff = sock.getsockopt(SOL_SOCKET,SO_RCVBUF)
            data.recv_total += len(recv_data)
        target_string += recv_data.decode('utf-8')
        print(target_string)
        if not recv_data: #or data.recv_total == data.msg_total:
            print("closing connection", data.connid)
            sel.unregister(sock)
            sock.close()
    if mask & selectors.EVENT_WRITE:
        if not data.outb and data.messages:
            data.outb = data.messages.pop(0)
        if data.outb:
            print("sending", repr(data.outb), "to connection", data.connid)
            sent = sock.send(data.outb)  # Should be ready to write
            data.outb = data.outb[sent:]

By the way, I am using TCP sockets, and I run both the server and the client on localhost.
I use the same string for every run.

My question is: why does everything sometimes work, while at other times the string is not sent completely?

Answer 1:

What's happening is that your data is being chunked by the operating system (in addition to what you're doing). And when the operating system does it, it may split your data in the middle of a UTF-8 encoding sequence. In other words, consider this block of code:

foo = '\xce\xdd\xff'       # three non-ascii characters
print(len(foo))            # => 3
bar = foo.encode('utf-8')
print(bar)                 # => b'\xc3\x8e\xc3\x9d\xc3\xbf'
bar[:3].decode()           # raises:
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 2: unexpected end of data

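What's going on: each of those characters is above 0x7f (but below 0x800), so UTF-8 encodes it as two bytes; code points above 0x7ff take three or four bytes. You cannot decode a character if its multi-byte sequence is cut off in the middle. That is what your traceback shows: 0xd9 is the first byte of a two-byte sequence, and its second byte was not part of that recv() result.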
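A small sketch of those encoding rules, using two hypothetical helpers (`utf8_len` and `sequence_length`) that are not part of any library. The second one checks a lead byte such as the `0xd9` from the question's traceback and reports how many bytes its sequence needs in total, which is why a buffer ending in that byte cannot be decoded on its own.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 uses for the single character ch."""
    return len(ch.encode("utf-8"))


def sequence_length(lead_byte: int) -> int:
    """Total length of the UTF-8 sequence that begins with lead_byte."""
    if lead_byte <= 0x7F:
        return 1  # 0xxxxxxx: ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2  # 110xxxxx
    if 0xE0 <= lead_byte <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead_byte <= 0xF4:
        return 4  # 11110xxx
    # 0x80-0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF never occur in UTF-8
    raise ValueError(f"0x{lead_byte:02x} cannot start a UTF-8 sequence")


print([utf8_len(c) for c in "A\xce\u20ac\U0001F600"])  # [1, 2, 3, 4]
print(sequence_length(0xD9))                          # 2
```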

So, to easily fix your problem, receive all the data first (as a byte string), then decode the entire byte string as a unit.
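The sketch below simulates recv() handing back the byte stream in arbitrary 5-byte pieces (the piece size and sample text are arbitrary choices for illustration). Decoding each piece separately, as the question's client does, fails; joining the bytes first and decoding once succeeds. As an alternative not mentioned in the answer, Python's `codecs.getincrementaldecoder` keeps an incomplete trailing sequence buffered between calls, which suits code that has to decode data as it arrives.

```python
import codecs

text = "Mixed text: \xce\xdd\xff \u0627\u0644\u0639\u0631\u0628\u064a\u0629"
payload = text.encode("utf-8")

# Pretend recv() returned the stream in 5-byte pieces; real piece sizes vary.
pieces = [payload[i:i + 5] for i in range(0, len(payload), 5)]


def decode_each_piece(pieces):
    """Decode every piece on its own, like the question's client does."""
    return "".join(piece.decode("utf-8") for piece in pieces)


def decode_whole(pieces):
    """Join all the bytes first, then decode once."""
    return b"".join(pieces).decode("utf-8")


def decode_incrementally(pieces):
    """Decode as pieces arrive; the decoder buffers incomplete sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = [decoder.decode(piece) for piece in pieces]
    out.append(decoder.decode(b"", final=True))  # raises if the stream ends mid-character
    return "".join(out)


try:
    decode_each_piece(pieces)
except UnicodeDecodeError as exc:
    print("piece-by-piece decoding failed:", exc.reason)  # unexpected end of data

print(decode_whole(pieces) == text)          # True
print(decode_incrementally(pieces) == text)  # True
```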

This brings up another related issue: you needn't create your own data chunks. TCP will do that for you. And as you've seen, TCP won't preserve your message boundaries anyway. So your best option is to properly "frame" your data.

That is, take some part of your string (or all of your string if it isn't hundreds of megabytes), and encode it in UTF-8. Take the length of the resulting byte buffer. Send, as binary data, a fixed-length size field (use the struct module to create that) containing that length. On the receiving side, first receive the fixed-length size field. This lets you know how many bytes of string data you actually need to receive. Receive all of those bytes, then decode the entire buffer at once.

In other words, ignoring error handling, sending side:

import struct
import socket
...
str_to_send = "blah blah\xce"
bytes_to_send = str_to_send.encode('utf-8')
len_bytes = len(bytes_to_send)
sock.sendall(struct.pack("!I", len_bytes))     # Send 4-byte size header (network byte order)
sock.sendall(bytes_to_send)                    # sendall() keeps sending until every byte is handed to TCP

Receiving side:

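import struct
import sys

header = b''
while len(header) < 4:                         # Receive 4-byte size header; recv may return fewer bytes
    buf = sock.recv(4 - len(header))
    if not buf:
        print("Unexpected EOF!")
        sys.exit(1)
    header += buf
len_bytes = struct.unpack("!I", header)[0]     # Convert to number (unpack returns a tuple)

bytes_sent = b''
while len(bytes_sent) < len_bytes:
    # Request only what is still missing so the next message's header isn't consumed
    buf = sock.recv(min(1024, len_bytes - len(bytes_sent)))   # may return fewer bytes than requested
    if not buf:
        print("Unexpected EOF!")
        sys.exit(1)
    bytes_sent += buf
str_sent = bytes_sent.decode('utf-8')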
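The fragments above can be assembled into a self-contained sketch. The helper names `send_msg`, `recv_exact`, `recv_msg` and `demo` are hypothetical, and `socket.socketpair()` stands in for a real client/server connection so the example runs on its own. A thread does the sending, because `sendall()` blocks once the kernel's socket buffers are full and nobody is reading yet.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # 4-byte unsigned length prefix, network byte order


def send_msg(sock: socket.socket, text: str) -> None:
    """Send text as a length-prefixed UTF-8 frame."""
    payload = text.encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)))  # header first...
    sock.sendall(payload)                    # ...then the body; sendall retries partial sends


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Call recv() until exactly n bytes have arrived; raise on premature EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError(f"peer closed with {remaining} of {n} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> str:
    """Receive one frame and decode it only after every byte is present."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length).decode("utf-8")


def demo():
    """Round-trip a ~1.1 MB message and a short one over a local socket pair."""
    big = "blah blah\xce" * 100_000  # 10 characters, 11 UTF-8 bytes per repeat
    left, right = socket.socketpair()
    with left, right:
        def produce() -> None:
            send_msg(left, big)
            send_msg(left, "done")

        # sendall() blocks once the kernel buffers fill, so send from a thread.
        sender = threading.Thread(target=produce)
        sender.start()
        received = [recv_msg(right), recv_msg(right)]
        sender.join()
    return big, received
```

Each `recv_exact` call asks only for the bytes still missing, so one message's body never swallows the next message's header.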

Final word: socket.send is not guaranteed to send the entire buffer (although it typically does); socket.sendall keeps calling send until everything has been sent or an error occurs. Likewise, socket.recv is not guaranteed to return as many bytes as you pass it; that argument is only an upper bound. So robust TCP sending/receiving code needs to accommodate these caveats.
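The receive loops above assume blocking sockets. The question's client is driven by `selectors`, and if its socket is non-blocking, `recv()` raises `BlockingIOError` instead of waiting, so a read-until-complete loop does not fit there. One hedged alternative is to keep a per-connection buffer and extract frames whenever enough bytes have accumulated. `FrameBuffer` below is a hypothetical helper, assuming the same 4-byte `!I` length header as the answer.

```python
import struct
from typing import List


class FrameBuffer:
    """Collect recv() results and return complete length-prefixed messages."""

    HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order

    def __init__(self) -> None:
        self._buf = bytearray()  # bytes received but not yet consumed

    def feed(self, data: bytes) -> List[str]:
        """Append newly received bytes; return every message completed so far."""
        self._buf += data
        messages = []
        size = self.HEADER.size
        while len(self._buf) >= size:
            (length,) = self.HEADER.unpack_from(self._buf)
            end = size + length
            if len(self._buf) < end:
                break  # body still incomplete: wait for the next recv()
            # Decode only a complete body, so no UTF-8 sequence is ever cut in half
            messages.append(self._buf[size:end].decode("utf-8"))
            del self._buf[:end]  # drop the consumed frame, keep any following bytes
        return messages
```

In `service_connection`, one could store a `FrameBuffer` on the per-connection `data` object when the socket is registered (a hypothetical `data.frames` attribute) and replace the decode line with `for msg in data.frames.feed(recv_data): print(msg)`.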

Source: https://stackoverflow.com/questions/57957363/packet-is-sent-completely-somtimes-and-somtimes-is-not-sent-completely

Tags

python

sockets

network-programming