Are big endian and little endian values portable?

Question

Hello, I have a small doubt about little endian and big endian. I know this question has been asked many times, but I could not figure out the points below.

Let's take int i = 10. It is stored in binary as 00000000 00000000 00000000 00001010 in the stack section, as below:

00000000 |00000000 |00000000 |00001010   // In case of little endian
MSB-------------------------------------------LSB

Big endian

00001010 |00000000 |00000000 |00000000   // In case of big endian
MSB-------------------------------------------LSB

In this case, will both little and big endian give the same output, 10?

Then what is the use of having both little and big endian?

In an interview I was asked to write code that is portable across all systems, whether big or little endian. I replied:

The compiler will handle it by itself: if int i = 10 on a little-endian system, then on a big-endian system the output is 10 too.

Is that answer correct?

Answer 1:

00000000 | 00000000 | 00000000 | 00001010 // big    endian

00001010 | 00000000 | 00000000 | 00000000 // little endian

Whether data is stored in big endian or little endian mode mostly matters only when you access a smaller portion of a variable in memory, usually via a pointer: for example, reading the least significant byte of a 32-bit integer through a pointer to char, or through a union with a char array. It also matters when you read data from a file directly into an array of 32-bit integers, or write such an array directly to a file, because the data in the file is itself stored in either little endian or big endian order.

As far as I'm aware, before C++20 there was no generic compile-time method to determine whether the CPU is running in big endian or little endian mode (specific compilers may have defines for this; C++20 adds std::endian in <bit>). You could write test code using a union of a 32-bit integer and a character array of size 4: set the integer in the union to 10, then check whether array[0] contains the 10, which means little endian mode, or array[3] contains the 10, which means big endian mode. Other methods of determining the CPU's byte order are possible.
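The run-time test described above could look roughly like this. It is only a sketch: in C++, reading a union member other than the one last written is undefined behavior (C permits it), so this version copies the bytes out with memcpy instead. The name is_little_endian is made up for the example.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte of a 32-bit integer
// sits at the lowest address, i.e. the host is little endian.
// (is_little_endian is a hypothetical helper name.)
bool is_little_endian()
{
    const std::uint32_t probe = 10;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // copy the object representation
    return bytes[0] == 10;                     // 10 in byte 0 means little endian
}
```

On a big-endian host the 10 ends up in bytes[3] and the function returns false.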

Once you determine if the cpu is in little endian or big endian mode, you can include conditional code to handle both cases, such as the file I/O to / from an array of 32 bit integers. If you wanted the file data to be in big endian mode, but your cpu is in little endian mode, you'd have to reverse the bytes of each integer before writing or after reading from a file.
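Reversing the bytes of each integer can be done with shifts and masks. A minimal sketch, assuming 32-bit values and a big-endian file format; swap_bytes32 and to_file_order are hypothetical names, and the host_is_little_endian flag comes from whatever detection method you use:

```cpp
#include <cstdint>

// Reverse the four bytes of a 32-bit value: 0x11223344 -> 0x44332211.
std::uint32_t swap_bytes32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

// Produce the value whose in-memory bytes match a big-endian file layout.
// The same call converts in both directions (before writing, after reading),
// because swapping twice restores the original value.
std::uint32_t to_file_order(std::uint32_t v, bool host_is_little_endian)
{
    return host_is_little_endian ? swap_bytes32(v) : v;
}
```

On a big-endian host the conversion leaves the value untouched, so the same calling code works on both kinds of machine.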

You could also write code sequences to store data in big endian mode, regardless of the cpu mode. It would waste time if already in big endian mode, but it works for both big and little endian mode:

unsigned char   buffer[256];
unsigned char * ptr2char;
uint32_t        uint32bit;
/* ... */
    ptr2char = buffer;    /* store uint32bit in big endian mode */
    *ptr2char++ = (uint32bit >> 24)&0xff;   /* most significant byte first */
    *ptr2char++ = (uint32bit >> 16)&0xff;
    *ptr2char++ = (uint32bit >>  8)&0xff;
    *ptr2char++ = (uint32bit      )&0xff;   /* least significant byte last */
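Reading the value back works the same way in reverse and is equally independent of the CPU mode. A sketch of the matching decode step (load_be32 is a made-up name, and the buffer is assumed to hold the four bytes most significant byte first):

```cpp
#include <cstdint>

// Rebuild a 32-bit value from four bytes stored most significant byte first.
// Only shifts and ORs on values are used, so host byte order never matters.
std::uint32_t load_be32(const unsigned char *p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) <<  8) |
            static_cast<std::uint32_t>(p[3]);
}
```

The casts to uint32_t before shifting keep the high byte from being shifted into the sign bit of an int.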

Answer 2:

Just to correct your diagram for the integer: int i = 10;

// Lower memory -----------------> higher memory

// Big endian
&i <- address of i
00000000 |00000000 |00000000 |00001010
MSB---------------------------LSB

// Little endian
&i <- address of i
00001010 |00000000 |00000000 |00000000
LSB---------------------------MSB

In little endian the Least Significant Byte (LSB) is stored in the lowest memory address.

In big endian the Most Significant Byte (MSB) is stored in the lowest memory address.
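If you want to see which layout your own machine uses, you can dump the bytes of the integer starting from its lowest address. A small sketch (bytes_in_memory_order is a hypothetical helper, and a 32-bit integer is assumed):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format the bytes of value as hex, starting at the lowest memory address.
std::string bytes_in_memory_order(std::uint32_t value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // bytes[0] is the byte at &value

    std::string out;
    for (std::size_t k = 0; k < sizeof value; ++k) {
        char hex[4];
        std::snprintf(hex, sizeof hex, "%02x", static_cast<unsigned>(bytes[k]));
        if (k != 0) out += ' ';
        out += hex;
    }
    return out;
}
```

For the value 10 this gives "0a 00 00 00" on a little-endian host and "00 00 00 0a" on a big-endian one, matching the two diagrams.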

Answer 3:

Endianness matters in the following situations:

You're directly examining/manipulating bytes in a multi-byte type
You're serializing binary data, or transferring binary data between different architectures

Directly examining/manipulating bytes in a multi-byte type

For example, suppose you want to split out and display the binary representation of a 32-bit IEEE float. The following shows the layout of a float and the addresses of the corresponding bytes in both big- and little-endian architectures:

A        A+1      A+2      A+3        Big endian
-------- -------- -------- --------   s = sign bit
seeeeeee efffffff ffffffff ffffffff   e = exponent bit
-------- -------- -------- --------   f = fraction bit
A+3      A+2      A+1      A          Little Endian
-------- -------- -------- --------
A+1      A        A+3      A+2        "Middle" Endian (VAX)

The sign bit is in the most significant byte (MSB) of a float. On a big-endian system, the MSB is in byte A; on a little-endian system, it's in byte A+3. On some oddballs like the old VAX F float, it's stuck in the middle at byte A+1.

So if you want to mask out the sign bit, you could do something like the following:

float val = some_value();
unsigned char *p = (unsigned char *) &val; // treat val as an array of unsigned char

// Assume big-endian to begin with
int idx = 0;

if ( little_endian() )
  idx = 3;

int sign = (p[idx] & 0x80) >> 7;
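For comparison, here is a variant that avoids picking a byte index altogether: copy the float's bits into a 32-bit integer and shift on the value. This is only a sketch under two assumptions that are mine, not part of the snippet above: float is a 32-bit IEEE type, and floats and integers use the same byte order on the host (true on mainstream platforms, not on oddballs like the VAX layout). sign_bit is a made-up name.

```cpp
#include <cstdint>
#include <cstring>

// Return 1 if the sign bit of val is set, 0 otherwise.
int sign_bit(float val)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &val, sizeof bits);  // reinterpret the bytes without aliasing issues
    return static_cast<int>(bits >> 31);    // the sign is the most significant bit of the value
}
```

Because the shift operates on the integer value rather than on a byte at a fixed address, no little_endian() check is needed here.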

Serializing or transferring binary data

For another example, suppose you want to save binary (not text) data so that it can be read by either big- or little-endian systems, or you're transferring binary data from one system to another. The convention for Internet transfers is big-endian (MSB first), so before sending a message over the 'net, you'd use calls like htonl (host-to-network long) and htons (host-to-network short) to perform any necessary byte swaps:

uint32_t host_value = some_value();
uint32_t network_value = htonl( host_value ); 
send( sock, &network_value, sizeof network_value, 0 );

On a little-endian system like x86, htonl will reorder the bytes of host_value from 0,1,2,3 to 3,2,1,0 and save the result to network_value. On a big-endian system, htonl is basically a no-op. The inverse operations are ntohl and ntohs.
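One way to convince yourself of this is to look at the bytes of htonl's result in memory: they come out most significant byte first on either kind of host. A small sketch, assuming a POSIX system where <arpa/inet.h> provides htonl and ntohl (put_network_order is a hypothetical helper name):

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX; Windows declares them in <winsock2.h>)
#include <cstdint>
#include <cstring>

// Copy the network-order form of host_value into out[0..3].
void put_network_order(std::uint32_t host_value, unsigned char out[4])
{
    const std::uint32_t network_value = htonl(host_value);
    std::memcpy(out, &network_value, sizeof network_value);
}
```

Calling it with 0x01020304 fills the buffer with 01 02 03 04 regardless of host endianness, and ntohl(htonl(x)) always gives back x.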

If you're not doing anything like the above, then you generally don't have to worry about endianness at all.

Answer 4:

First of all: you actually confused big- and little-endian byte order, as pointed out in @rcgldr's and @Galik's answers. The byte order is exactly the reverse of what you show in your sample:

00000000 | 00000000 | 00000000 | 00001010 // big endian

00001010 | 00000000 | 00000000 | 00000000 // little endian

As for your assumptions and questions:

"In this case, will both little and big endian give the same output, 10?"

It depends on the kind of output you're referring to.

The following code is portable regardless of the host machine's endianness; the output is formatted text ("10") in either case:

int i = 10;

std::cout << i << std::endl;

The following code is not portable. Since the value is written in binary form, the byte order is kept verbatim:

int i = 10;

std::ofstream binfile("binaryfile.bin", std::ios::binary); // binary mode: no newline translation
binfile.write((const char*)&i,sizeof(int));

The latter sample will not work if the file is read on a host machine with a different endianness.

To solve this kind of problem there is the htonl(), ntohl() function family. Usually one agrees to use network byte order (big-endian) to store binary data or send it over the network.

Here's a short sample showing how to use the byte order conversion functions mentioned above:

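#include <arpa/inet.h> // htonl, ntohl (POSIX)
#include <cstdint>
#include <fstream>

// Writing side
uint32_t i = 10;
uint32_t sendValue = htonl(i); // convert the value of i to network byte order

std::ofstream outfile("binaryfile.bin", std::ios::binary);
outfile.write((const char*)&sendValue, sizeof sendValue); // write the adapted value
outfile.close(); // make sure the data is on disk before reading it back

// Reading side
std::ifstream infile("binaryfile.bin", std::ios::binary);
uint32_t recvValue = 0;
infile.read((char*)&recvValue, sizeof recvValue); // read the value in network byte order
uint32_t result = ntohl(recvValue); // convert the value of recvValue to host byte order

"Then what is the use of having both little and big endian?"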

The reason for the different formats is that there are different CPU architectures, which represent integer values in memory in different ways, depending on what is most efficient for their particular hardware design.
Neither of these architectural choices is better or worse than the other, which is why it's called endianness. The term originates from Jonathan Swift's novel "Gulliver's Travels" and was first (?) applied to byte order in Danny Cohen's article "ON HOLY WARS AND A PLEA FOR PEACE".

"The compiler will handle it by itself: if int i = 10 on a little-endian system, then on a big-endian system the output is 10 too."

Well, as you can see from the examples above, this answer was wrong.

Source: https://stackoverflow.com/questions/27383085/are-big-endian-and-little-endian-values-portable

Tags

c++

memory-management

endianness