Erlang server, Java client - TCP messages get split?

广开言路 2020-12-07 04:50

As the title says, I have a server written in Erlang, a client written in Java and they are communicating through TCP. The problem that I am facing is the fact that gen_tcp:

3 Answers
  • 2020-12-07 05:03

    This makes me wonder if it is something that can be fixed on the Java side?

    No, absolutely not. Regardless of why you don't happen to see the problem with an Erlang client, if you aren't putting any sort of "message boundary" indication into the protocol, you will not be able to reliably detect whole messages. I strongly suspect that if you send a very large message with the Erlang client, you'll still see split messages.

    You should either:

    • Use some sort of "end of message" sequence, e.g. a 0 byte if that wouldn't otherwise come up in your messages.
    • Prefix each message with the length of the message.

    Aside from that, you aren't clearly differentiating between bytes and text at the moment. Your Java client is currently silently ignoring the top 8 bits of each char, for example. Rather than using DataOutputStream, I would suggest just using OutputStream, and then for each message:

    • Encode it as a byte array using a specific encoding, e.g.

      byte[] encodedText = text.getBytes(StandardCharsets.UTF_8);
      
    • Write a length prefix to the stream (possibly in a 7-bit-encoded integer, or maybe just as a fixed width, e.g. 4 bytes). (Actually, sticking with DataOutputStream would make this bit simpler.)

    • Write the data

    On the server side, you should "read a message" by reading the length, then reading the specified number of bytes.
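
    A minimal sketch of the "length, then data" scheme described above, written in Java with an in-memory buffer standing in for the socket. The class and method names (LengthPrefixedFraming, writeMessage, readMessage, roundTrip) are made up for illustration, and the fixed 4-byte prefix is only one of the options mentioned:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question's code.
final class LengthPrefixedFraming {

    // Writes one frame: a 4-byte big-endian length, then the UTF-8 payload.
    static void writeMessage(DataOutputStream out, String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads one frame; readFully keeps reading until the whole payload has arrived.
    static String readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Sends the messages through an in-memory "wire" and reads them back.
    static List<String> roundTrip(String... messages) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            for (String message : messages) {
                writeMessage(out, message);
            }
            out.flush();

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            List<String> received = new ArrayList<>();
            for (int i = 0; i < messages.length; i++) {
                received.add(readMessage(in));
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello", "h\u00e9llo w\u00f6rld", ""));
    }
}
```

    With a real connection, the same two methods would wrap socket.getOutputStream() and socket.getInputStream(). The length counts encoded bytes, not chars, which is why the text is encoded before the prefix is written.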

    You can't get around the fact that TCP is a stream-based protocol. If you want a message-based protocol, you really do have to put that on top yourself. (I'm sure there are helpful libraries to do this, of course - but you shouldn't just leave it up to TCP and hope.)

  • 2020-12-07 05:05

    You need to define a protocol between your server and your client to split the TCP stream into messages. A TCP stream is divided into packets, but there is no guarantee that these match your calls to send/write or recv/read.

    A simple and robust solution is to prefix all messages with a length. Erlang can do this transparently with the {packet, 1|2|4} option, where the prefix is encoded in 1, 2 or 4 bytes. You will have to perform the encoding on the Java side. If you opt for 2 or 4 bytes, be aware that the length must be encoded in big-endian format, the same byte order used by the DataOutputStream.writeShort(int) and DataOutputStream.writeInt(int) Java methods.
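
    To make the byte order concrete, here is a small sketch that builds a frame with a 2-byte big-endian header, as a {packet, 2} receiver would expect. PacketHeader and frame2 are hypothetical names chosen for this example, and ByteBuffer is used only so the header bytes are easy to inspect:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class PacketHeader {

    // Returns the payload preceded by its length as a 2-byte big-endian unsigned integer.
    static byte[] frame2(byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload too large for a 2-byte header");
        }
        ByteBuffer buf = ByteBuffer.allocate(2 + payload.length).order(ByteOrder.BIG_ENDIAN);
        buf.putShort((short) payload.length); // keeps the low 16 bits, most significant byte first
        buf.put(payload);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] frame = frame2("hi".getBytes(StandardCharsets.UTF_8));
        // Print each byte of the frame as an unsigned value.
        for (byte b : frame) {
            System.out.print((b & 0xFF) + " ");
        }
        System.out.println();
    }
}
```

    A 258-byte payload gets the header bytes 0x01 0x02 (258 = 0x0102). A little-endian encoder would send 0x02 0x01, which the receiver would read as a length of 513.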

    However, it seems from your implementations that you do have an implicit protocol: you want the server to process each line separately.

    This is fortunately also handled transparently by Erlang. You simply need to pass the {packet, line} option. You might need to adjust the receive buffer, however, as lines longer than this buffer will be truncated. This can be done with the {recbuf, N} option.

    So just redefining your options should do what you want.

    -define(MAX_LINE_SIZE, 512).
    -define(TCP_OPTIONS, [list, {packet, line}, {active, false}, {reuseaddr, true}, {recbuf, ?MAX_LINE_SIZE}]).
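
    On the Java side, line framing only works if every message ends with a newline and contains none in the middle. A minimal sketch of a writer that enforces this follows. LineFraming, writeLine and encodeLines are hypothetical names, and MAX_LINE_BYTES simply mirrors ?MAX_LINE_SIZE above; the exact point at which the Erlang side truncates is an assumption here:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class LineFraming {

    // Mirrors ?MAX_LINE_SIZE from the Erlang options (assumed limit, newline included).
    static final int MAX_LINE_BYTES = 512;

    // Writes one newline-terminated UTF-8 line, rejecting text that would break the framing.
    static void writeLine(OutputStream out, String text) throws IOException {
        if (text.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("message must not contain a newline");
        }
        byte[] line = (text + "\n").getBytes(StandardCharsets.UTF_8);
        if (line.length > MAX_LINE_BYTES) {
            throw new IllegalArgumentException("line longer than " + MAX_LINE_BYTES + " bytes");
        }
        out.write(line);
        out.flush();
    }

    // Encodes several lines into one byte array, standing in for the socket stream.
    static byte[] encodeLines(String... lines) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            for (String line : lines) {
                writeLine(wire, line);
            }
            return wire.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encodeLines("first", "second");
        System.out.println(new String(wire, StandardCharsets.UTF_8));
    }
}
```

    With a real connection, writeLine would be called with socket.getOutputStream().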
    
  • 2020-12-07 05:24

    As Jon said, TCP is a streaming protocol and has no concept of a message in the sense that you are looking for. The data is often broken up based on your rate of reading, the kernel buffer size, the MTU of the network, etc. There is no guarantee that you won't get your data 1 byte at a time.
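
    To illustrate why chunk boundaries cannot be relied on, here is a sketch of a receiver that accepts bytes in arbitrary pieces and only hands back messages once they are complete. It assumes each message is preceded by a 4-byte big-endian length, which is an assumption of this example rather than anything TCP provides. FrameAccumulator and feed are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
final class FrameAccumulator {

    // Bytes received so far that do not yet form a complete frame.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    // Accepts whatever one read returned and returns every message completed by it.
    List<String> feed(byte[] chunk) {
        pending.write(chunk, 0, chunk.length);
        byte[] data = pending.toByteArray();
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        List<String> complete = new ArrayList<>();

        while (buf.remaining() >= 4) {
            buf.mark();
            int length = buf.getInt();
            if (length < 0) {
                throw new IllegalStateException("negative frame length");
            }
            if (buf.remaining() < length) {
                buf.reset(); // payload incomplete: rewind to the start of this header
                break;
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            complete.add(new String(payload, StandardCharsets.UTF_8));
        }

        // Keep only the unconsumed tail for the next call.
        pending.reset();
        pending.write(data, buf.position(), buf.remaining());
        return complete;
    }

    public static void main(String[] args) {
        ByteBuffer wire = ByteBuffer.allocate(4 + 2 + 4 + 5);
        wire.putInt(2).put("hi".getBytes(StandardCharsets.UTF_8));
        wire.putInt(5).put("there".getBytes(StandardCharsets.UTF_8));

        // Worst case: the stream arrives one byte at a time.
        FrameAccumulator accumulator = new FrameAccumulator();
        for (byte b : wire.array()) {
            for (String message : accumulator.feed(new byte[] {b})) {
                System.out.println("message: " + message);
            }
        }
    }
}
```

    However the bytes are split across reads, the same messages come out, because boundaries are taken from the length prefix and never from where a read happened to stop.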

    The easiest change to your app to get what you want is to change {packet,0} to {packet,4} in the Erlang server's TCP_OPTIONS

    and change the Java writer code to:

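    // Assumes sc is the question's Scanner and output its DataOutputStream.
    while (true) {
       byte[] data = sc.nextLine().getBytes(StandardCharsets.UTF_8); // or leave out the UTF_8 for default platform encoding
       output.writeInt(data.length);       // 4-byte big-endian length, as {packet,4} expects
       output.write(data, 0, data.length); // the message bytes themselves
       output.flush();                     // push the frame out in case the stream is buffered
    }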
    

    You should find that you receive exactly the right message.

    You should also add {packet,4} to the Erlang client if you make this change on the server side, as the server now expects a 4-byte header indicating the size of the message.

    Note: the {packet,N} option is transparent in Erlang code. The Erlang client doesn't need to send the int, and the server doesn't see the int. Java has no equivalent of size framing in its standard library, so you have to write the int size yourself.
