In which data segment is the C string stored?

Submitted by 妖精的绣舞 on 2021-02-17 21:35:46

Question


I'm wondering what's the difference between char s[] = "hello" and char *s = "hello".

After reading this and this, I'm still not very clear on this question.


As far as I know, there are five memory segments: Text, BSS, Data, Stack, and Heap.

From my understanding,

in case of char s[] = "hello":

  1. "hello" is in Text.
  2. s is in Data if it is a global variable or in Stack if it is a local variable.

  3. We also have a copy of "hello" where the s is stored, so we can modify the value of this string via s.

in case of char *s = "hello":

  1. "hello" is in Text.
  2. s is in Data if it is a global variable or in Stack if it is a local variable.
  3. s just points to "hello" in Text and we don't have a copy of it, therefore modifying the value of string via this pointer should cause "Segmentation Fault".

Am I right?


Answer 1:


You are right that in the first case the array holds a mutable copy of "hello", while in the second case the pointer refers to a string literal that must be treated as immutable. On some platforms the initial contents of both are kept in read-only memory before initialization.

In the first case, the mutable memory is initialized (copied) from the immutable string. In the second case, the pointer refers directly to the immutable string.

For the first case, Wikipedia says:

The values for these variables are initially stored within the read-only memory (typically within .text) and are copied into the .data segment during the start-up routine of the program.

Let us examine the file segment.c:

char*s = "hello"; // string
char sar[] = "hello"; // string array
char content[32];

int main(int argc, char*argv[]) {
        char psar[] = "parhello"; // local/private string array
        char*ps = "phello"; // private string
        content[0] = 1;
        sar[3] = 1; // OK
        // sar++; // not allowed
        // s[2] = 1; // segmentation fault
        s = sar;
        s[2] = 1; // OK
        psar[3] = 1; // OK
        // ps[2] = 1; // segmentation fault
        ps = psar;
        ps[2] = 1; // OK
        return 0;
}

Here is the assembly generated for segment.c. Note that both s and sar are globals placed in the .data segment. sar is not a pointer at all: it is an array, that is, a named block of mutable, initialized memory, which is why sar++ is not allowed. One consequence is that sizeof(sar) is 6 (five characters plus the terminating null byte), whereas sizeof(s) is 8 (the size of a pointer on this 64-bit target). The literals "hello" and "phello" are placed in the read-only .rodata section and are effectively immutable.

    .file   "segment.c"
    .globl  s
    .section    .rodata
.LC0:
    .string "hello"
    .data
    .align 8
    .type   s, @object
    .size   s, 8
s:
    .quad   .LC0
    .globl  sar
    .type   sar, @object
    .size   sar, 6
sar:
    .string "hello"
    .comm   content,32,32
    .section    .rodata
.LC1:
    .string "phello"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $64, %rsp
    movl    %edi, -52(%rbp)
    movq    %rsi, -64(%rbp)
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    movl    $1752326512, -32(%rbp)
    movl    $1869376613, -28(%rbp)
    movb    $0, -24(%rbp)
    movq    $.LC1, -40(%rbp)
    movb    $1, content(%rip)
    movb    $1, sar+3(%rip)
    movq    $sar, s(%rip)
    movq    s(%rip), %rax
    addq    $2, %rax
    movb    $1, (%rax)
    movb    $1, -29(%rbp)
    leaq    -32(%rbp), %rax
    movq    %rax, -40(%rbp)
    movq    -40(%rbp), %rax
    addq    $2, %rax
    movb    $1, (%rax)
    movl    $0, %eax
    movq    -8(%rbp), %rdx
    xorq    %fs:40, %rdx
    je  .L2
    call    __stack_chk_fail
.L2:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section    .note.GNU-stack,"",@progbits

Again, for a local variable in main the compiler does not bother to create a symbol name. It may keep the variable in a register or in stack memory.

Note that the initial value of the local array, "parhello", is encoded as the numbers 1752326512 and 1869376613 (0x68726170 and 0x6F6C6C65, the little-endian byte encodings of "parh" and "ello"). I discovered this by changing "parhello" to "parhellp". The diff of the assembly output is as follows:

39c39
<   movl    $1886153829, -28(%rbp)
---
>   movl    $1869376613, -28(%rbp)

So there is no separate immutable store for psar. Its initial contents are embedded as immediate integer operands in the instructions of the code segment.




Answer 2:


Answer to your first question:

char s[] = "hello";

s is an array of char. An array is not a pointer, but its name behaves much like a const pointer in that it cannot be reassigned or modified (for example, s++ does not compile). The data aren't const, though, so you can change them.
See this example C code:

#include <stdio.h>

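void reverse(char *p){
    char c;
    char* q = p;
    if (*p == '\0') return; // empty string: nothing to reverse
    while (*q) q++; // advance to the terminating null byte
    q--; // point to the last character
    while (p < q) { // swap characters from both ends inward
        c = *p;
        *p++ = *q;
        *q-- = c;
    }
}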

int main(){
    char s[]  = "DCBA";
    reverse( s);
    printf("%s\n", s); // ABCD
}

which reverses the text "DCBA" and produces "ABCD".

char *p = "hello";

p is a pointer to a char. You can do pointer arithmetic (p++ will compile), but the compiler typically puts the string data in a read-only part of memory (const data).
Writing through the pointer, for example p[0]='a';, is undefined behavior and typically results in a runtime error:

#include <stdio.h>
int main(){
    char* s  = "DCBA";
    s[0]='D'; // compiles, but typically crashes here at run time
    printf("%s\n", s); // usually never reached
}

This compiles, but typically crashes when run.

const char* const s = "DCBA";

With a const char* const, you can change neither s nor the data it points to (for example, to turn it into "DCBE"), so both the data and the pointer are const:

#include <stdio.h>
int main(){
    const char* const s  = "DCBA";  
    s[0]='D'; // compile error: assignment to a read-only location
    printf("%s\n", s); // would print DCBA if the line above were removed
}

The Text segment is normally the segment where your code is stored and is const; i.e. unchangeable. In embedded systems, this is the ROM, PROM, or flash memory; in a desktop computer, it can be in RAM.

The Stack is RAM memory used for local variables in functions.

The Heap is RAM used for dynamically allocated data (for example, memory obtained from malloc). Initialized global variables live in the Data segment, not on the heap.

BSS contains all global and static variables that are either initialized to zero or not explicitly initialized.

For more information, see the relevant Wikipedia and this relevant Stack Overflow question

With regard to s itself: when it is a local variable, the compiler decides where to put it (in stack space or in a CPU register).

For more information about memory protection and access violations or segmentation faults, see the relevant Wikipedia page

This is a very broad topic, and ultimately the exact answers depend on your hardware and compiler.



Source: https://stackoverflow.com/questions/37902489/in-which-data-segment-is-the-c-string-stored
