C, Little and Big Endian confusion

Submitted by 狂风中的少年 on 2019-12-08 10:43:37

Question


I am trying to understand byte order in memory in C programming, but I'm confused. To verify my output, I tested my app with some of the values from this site: www.yolinux.com/TUTORIALS/Endian-Byte-Order.html

For the 64-bit value, I use this in my C program:

volatile long long ll = (long long)1099511892096;
__mingw_printf("\tlong long, %zu Bytes, %zu bits,\t%lld to %lli, %lli, 0x%016llX\n", sizeof(long long), sizeof(long long)*8, LLONG_MIN, LLONG_MAX , ll, (unsigned long long)ll);

void printBits(size_t const size, void const * const ptr)
{
    unsigned char *b = (unsigned char*) ptr;
    unsigned char byte;
    int i, j;
    printf("\t");
    for (i=size-1;i>=0;i--)
    {
        for (j=7;j>=0;j--)
        {
            byte = b[i] & (1<<j);
            byte >>= j;
            printf("%u", byte);
        }

        printf(" ");
    }
    puts("");
}

Output

long long,                8 Bytes,   64 bits,   -9223372036854775808 to 9223372036854775807, 1099511892096, 0x0000010000040880
80 08 04 00 00 01 00 00  (Little-Endian)
10000000 00001000 00000100 00000000 00000000 00000001 00000000 00000000
00 00 01 00 00 04 08 80  (Big-Endian)
00000000 00000000 00000001 00000000 00000000 00000100 00001000 10000000

Tests

0x8008040000010000, 1000000000001000000001000000000000000000000000010000000000000000 // online website hex2bin conv. 
                    1000000000001000000001000000000000000000000000010000000000000000 // my C app
0x8008040000010000, 1000010000001000000001000000000000000100000000010000000000000000 // yolinux.com


0x0000010000040880, 0000000000000000000000010000000000000000000001000000100010000000      //online website hex2bin conv., 1099511892096  ! OK
                    0000000000000000000000010000000000000000000001000000100010000000      // my C app,  1099511892096 ! OK
[Convert]::ToInt64("0000000000000000000000010000000000000000000001000000100010000000", 2) // using powershell for other verif., 1099511892096 ! OK          
0x0000010000040880, 0000000000000000000000010000010000000000000001000000100010000100      // yolinux.com, 1116691761284 (from powershell bin conv.) ! BAD !

Problem

The yolinux.com website gives 0x0000010000040880 as the BIG ENDIAN value! But I believe my computer is LITTLE ENDIAN (Intel processor), and I get the same value, 0x0000010000040880, from my C app and from another website's hex2bin converter. As you can see, __mingw_printf(...0x%016llX...,...ll) also prints 0x0000010000040880.

Following the yolinux website, I have swapped the "(Little-Endian)" and "(Big-Endian)" labels in my output for the moment.

Also, the sign bit must be 0 for a positive number. That is the case in my result, but also in the yolinux result, so it does not help me decide.

If I understand endianness correctly, only bytes are swapped, not bits, and my groups of bits seem to be correctly reversed.

Is it simply an error on yolinux.com, or am I missing a step about 64-bit numbers and C programming?


Answer 1:


When you print some "multi-byte" integer using printf (and the correct format specifier) it doesn't matter whether the system is little or big endian. The result will be the same.

The difference between little and big endian is the order in which the bytes of a multi-byte type are stored in memory. Once the data has been read from memory into the processor core, there is no difference.

This code shows how an integer (4 bytes) is placed in memory on my machine.

#include <stdio.h>

int main()
{
    unsigned int u = 0x12345678;
    printf("size of int is %zu\n", sizeof u);
    printf("DEC: u=%u\n", u);
    printf("HEX: u=0x%x\n", u);
    printf("memory order:\n");
    unsigned char * p = (unsigned char *)&u;
    for(size_t i=0; i < sizeof u; ++i) printf("address %p holds %x\n", (void*)&p[i], p[i]);
    return 0;
}

Output:

size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 78
address 0x7ffddf2c263d holds 56
address 0x7ffddf2c263e holds 34
address 0x7ffddf2c263f holds 12

So I can see that I'm on a little endian machine, as the LSB (least significant byte, i.e. 78) is stored at the lowest address.

Executing the same program on a big endian machine would (assuming same address) show:

size of int is 4
DEC: u=305419896
HEX: u=0x12345678
memory order:
address 0x7ffddf2c263c holds 12 
address 0x7ffddf2c263d holds 34 
address 0x7ffddf2c263e holds 56 
address 0x7ffddf2c263f holds 78 

Now it is the MSB (most significant byte, i.e. 12) that is stored at the lowest address.

The important thing to understand is that this only relates to how multi-byte types are stored in memory. Once the integer is read from memory into a register inside the core, the register will hold the integer in the form 0x12345678 on both little and big endian machines.




Answer 2:


There is only a single way to represent an integer in decimal, binary or hexadecimal format. For example, number 43981 is equal to 0xABCD when written as hexadecimal, or 0b1010101111001101 in binary. Any other value (0xCDAB, 0xDCBA or similar) represents a different number.

The way your compiler and CPU choose to store this value internally is irrelevant as far as the C standard is concerned; the value could be stored as a 36-bit one's complement if you're particularly unlucky, as long as all operations mandated by the standard have equivalent effects.

You will rarely have to inspect your internal data representation when programming. Practically the only time you care about endianness is when working on a communication protocol, because then the binary format of the data must be precisely defined. Even then, your code will be the same regardless of the architecture:

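#include <stdint.h>

// input value is big endian, this is defined
// by the communication protocol

uint32_t parse_comm_value(const char * ptr)
{
     // bit shifts in C have the same meaning
     // regardless of the endianness of your
     // architecture; each byte is read as an
     // unsigned char first so that a negative
     // char is not sign-extended
     const unsigned char * p = (const unsigned char *)ptr;

     uint32_t result = 0;
     result |= (uint32_t)p[0] << 24;
     result |= (uint32_t)p[1] << 16;
     result |= (uint32_t)p[2] << 8;
     result |= (uint32_t)p[3];
     return result;
}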

TL;DR: calling a standard function like printf("0x%llx", number); always prints the correct value in the specified format. Inspecting the contents of memory by reading individual bytes gives you the representation of the data on your architecture.



Source: https://stackoverflow.com/questions/54548061/c-little-and-big-endian-confusion
