What's a portable way of converting Byte-Order of strings in C

佐手、 posted on 2019-12-09 22:35:23

Question


I am trying to write a server that will communicate with any standard client that can make socket connections (e.g. a telnet client).

It started out as an echo server, which of course did not need to worry about network byte ordering.

I am familiar with the ntohs, ntohl, htons, and htonl functions. These would be great by themselves if I were transferring either 16- or 32-bit ints, or if the characters in the string being sent came in multiples of 2 or 4 bytes.

I'd like to create a function that operates on strings, such as:

str_ntoh(char* net_str, char* host_str, int len)
{
    uint32_t* netp, hostp;
    netp = (uint32_t*)&net_str;
    for(i=0; i < len/4; i++){
         hostp[i] = ntoh(netp[i]);
    }
}

Or something similar. The above assumes that the word size is 32 bits. We can't be sure that the word size on the sending machine isn't 16 or 64 bits, right?

Client programs such as telnet must be using hton* before they send data and ntoh* after they receive it, correct?

EDIT: For the people who think that because a char is one byte, endianness doesn't matter:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 0x01020304;
    char* c = (char*)&a;   /* view the integer's bytes in memory order */
    printf("%x %x %x %x\n", c[0], c[1], c[2], c[3]);
    return 0;
}

Run this snippet of code. The output for me is as follows:

$ ./a.out
  4 3 2 1

Those on PowerPC chipsets should get '1 2 3 4', but those of us on Intel chipsets should, for the most part, see what I got above.


Answer 1:


Maybe I'm missing something here, but are you sending strings, that is, sequences of characters? Then you don't need to worry about byte order. That is only for the bit pattern in integers. The characters in a string are always in the "right" order.

EDIT:

Derrick, to address your code example, I've run the following (slightly expanded) version of your program on an Intel i7 (little-endian) and on an old Sun SPARC (big-endian):

#include <stdio.h>
#include <stdint.h> 

int main(void)
{
    uint32_t a = 0x01020304;
    char* c = (char*)&a;
    char d[] = { 1, 2, 3, 4 };
    printf("The integer: %x %x %x %x\n", c[0], c[1], c[2], c[3]);
    printf("The string:  %x %x %x %x\n", d[0], d[1], d[2], d[3]);
    return 0;
}

As you can see, I've added a real char array to your print-out of an integer.

The output from the little-endian Intel i7:

The integer: 4 3 2 1
The string:  1 2 3 4

And the output from the big-endian Sun:

The integer: 1 2 3 4
The string:  1 2 3 4

Your multi-byte integer is indeed stored in different byte order on the two machines, but the characters in the char array have the same order.




Answer 2:


With your function signature as posted, you don't have to worry about byte order. It accepts a char*, which can only handle 8-bit characters. With one byte per character, you cannot have a byte order problem.

You'd only run into a byte order problem if you sent Unicode in UTF-16 or UTF-32 encoding and the endianness of the sending machine didn't match that of the receiving machine. The simple solution is to use UTF-8 encoding, which is how most text is sent across networks. Being byte-oriented, it doesn't have a byte order issue either. Alternatively, you could send a BOM (byte order mark).
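To illustrate the UTF-16 case, here is a minimal sketch of writing 16-bit code units to the wire in a fixed (big-endian) order using shifts, so the result does not depend on the endianness of either machine. The helper names utf16_to_be_bytes and be_bytes_to_utf16 are hypothetical, not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write n UTF-16 code units to out in big-endian
   (network) byte order. out must have room for 2 * n bytes. */
void utf16_to_be_bytes(const uint16_t *units, size_t n, unsigned char *out)
{
    assert(units != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(units[i] >> 8);   /* high byte first */
        out[2 * i + 1] = (unsigned char)(units[i] & 0xFF); /* then low byte */
    }
}

/* Hypothetical helper: the inverse, rebuilding n code units from
   2 * n big-endian bytes. */
void be_bytes_to_utf16(const unsigned char *in, size_t n, uint16_t *units)
{
    assert(in != NULL && units != NULL);
    for (size_t i = 0; i < n; i++)
        units[i] = (uint16_t)((in[2 * i] << 8) | in[2 * i + 1]);
}
```

Because the bytes are produced arithmetically rather than by reinterpreting memory, the same code gives the same wire format on little-endian and big-endian hosts.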




Answer 3:


If you'd like to send them as an 8-bit encoding (the fact that you're using char implies this is what you want), there's no need to byte swap. However, for the unrelated issue of non-ASCII characters, so that the same character > 127 appears the same on both ends of the connection, I would suggest that you send the data in something like UTF-8, which can represent all Unicode characters and can be safely treated as ASCII strings. The way to get UTF-8 text based on the default encoding varies by the platform and set of libraries you're using.

If you're sending a 16-bit or 32-bit encoding, you can include a byte order mark as the first character, which the other end can use to determine the endianness of the text. Or, you can assume network byte order and use htons() or htonl() as you suggest. But if you'd like to use char, please see the previous paragraph. :-)
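As a rough sketch of the byte order mark approach for UTF-16 only: the BOM is the character U+FEFF, which arrives as the bytes FE FF in big-endian text and FF FE in little-endian text. The enum and the function name utf16_detect_bom below are made up for this example, and the check deliberately ignores UTF-32, whose little-endian BOM starts with the same two bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
enum bom_order { BOM_NONE, BOM_BIG_ENDIAN, BOM_LITTLE_ENDIAN };

/* Inspect the first two bytes of a received buffer and report which
   UTF-16 byte order mark, if any, it starts with. */
enum bom_order utf16_detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2)
        return BOM_NONE;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_BIG_ENDIAN;      /* U+FEFF sent high byte first */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_LITTLE_ENDIAN;   /* U+FEFF sent low byte first */
    return BOM_NONE;
}
```

A receiver could call this once on the start of the stream and then decode the remaining code units in the reported order, falling back to an agreed default when no BOM is present.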




Answer 4:


It seems to me that the function prototype doesn't match its behavior. You're passing in a char *, but you're then casting it to uint32_t *. And, looking more closely, you're casting the address of the pointer, rather than the contents, so I'm concerned that you'll get unexpected results. Perhaps the following would work better:

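#include <stdint.h>
#include <arpa/inet.h>  /* ntohl() */

void arr_ntoh(const uint32_t* netp, uint32_t* hostp, int len)
{
    /* len counts 32-bit elements, not bytes */
    for (int i = 0; i < len; i++)
        hostp[i] = ntohl(netp[i]);
}

I'm basing this on the assumption that what you've really got is an array of uint32_t and you want to run ntohl() on all of them.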
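If the data arrives as a raw byte buffer rather than an already aligned uint32_t array, one possible alternative is to assemble each value from its four bytes with shifts. This avoids both the pointer cast and any dependence on the host's byte order or word size. The names load_be32 and bytes_to_host32 are hypothetical, and the sketch assumes the sender wrote each 32-bit value in network (big-endian) order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one 32-bit value from 4 big-endian bytes. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: decode count 32-bit values from a received buffer
   of 4 * count bytes into host-order integers. */
void bytes_to_host32(const unsigned char *net, uint32_t *host, size_t count)
{
    assert(net != NULL && host != NULL);
    for (size_t i = 0; i < count; i++)
        host[i] = load_be32(net + 4 * i);
}
```

The trade-off is a little more arithmetic per value in exchange for not needing the buffer to be suitably aligned for uint32_t access.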

I hope this is helpful.



Source: https://stackoverflow.com/questions/1934168/whats-a-portable-way-of-converting-byte-order-of-strings-in-c
