Calling setns from Go returns EINVAL for mnt namespace

Submitted by 会有一股神秘感。 on 2019-12-31 13:55:11

Question


The C code works fine and correctly enters the namespace, but the Go code always seems to return EINVAL from the setns call to enter the mnt namespace. I've tried a number of permutations (including embedded C code with cgo and external .so) on Go 1.2, 1.3 and the current tip.

Stepping through the code in gdb shows that both sequences are calling setns in libc the exact same way (or so it appears to me).

I have boiled what seems to be the issue down to the code below. What am I doing wrong?

Setup

I have a shell alias for starting quick busybox containers:

alias startbb='docker inspect --format "{{ .State.Pid }}" $(docker run -d busybox sleep 1000000)'

After defining this alias, running startbb starts a container and outputs its PID.

lxc-checkconfig outputs:

Found kernel config file /boot/config-3.8.0-44-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: missing
Network namespace: enabled
Multiple /dev/pts instances: enabled

--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: missing
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled

uname -a produces:

Linux gecko 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Working C code

The following C code works fine:

#define _GNU_SOURCE   /* needed for the setns() declaration in <sched.h> */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* geteuid(), close() */

int main(int argc, char *argv[]) {
    int i;
    char nspath[1024];
    const char *namespaces[] = { "ipc", "uts", "net", "pid", "mnt" };

    if (geteuid()) { fprintf(stderr, "%s\n", "abort: you want to run this as root"); exit(1); }

    if (argc != 2) { fprintf(stderr, "%s\n", "abort: you must provide a PID as the sole argument"); exit(2); }

    for (i = 0; i < 5; i++) {
        snprintf(nspath, sizeof(nspath), "/proc/%s/ns/%s", argv[1], namespaces[i]);
        int fd = open(nspath, O_RDONLY);
        if (fd == -1) {
            fprintf(stderr, "open %s failed: %s\n", nspath, strerror(errno));
            continue;
        }

        if (setns(fd, 0) == -1) {
            fprintf(stderr, "setns on %s namespace failed: %s\n", namespaces[i], strerror(errno));
        } else {
            fprintf(stdout, "setns on %s namespace succeeded\n", namespaces[i]);
        }

        close(fd);
    }
    return 0;
}

After compiling with gcc -o checkns checkns.c, the output of sudo ./checkns <PID> is:

setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace succeeded

Failing Go code

Conversely, the following Go code (which should be identical) doesn't work quite as well:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "syscall"
)

// setns(2) syscall number on linux/amd64; other architectures use different numbers.
const sysSetns = 308

func main() {
    if syscall.Geteuid() != 0 {
        fmt.Println("abort: you want to run this as root")
        os.Exit(1)
    }

    if len(os.Args) != 2 {
        fmt.Println("abort: you must provide a PID as the sole argument")
        os.Exit(2)
    }

    namespaces := []string{"ipc", "uts", "net", "pid", "mnt"}

    for _, ns := range namespaces {
        path := filepath.Join("/proc", os.Args[1], "ns", ns)
        fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
        if err != nil {
            fmt.Println("open", path, "failed:", err)
            continue
        }

        // RawSyscall returns (r1, r2, errno); only errno matters here.
        _, _, errno := syscall.RawSyscall(sysSetns, uintptr(fd), 0, 0)
        if errno != 0 {
            fmt.Println("setns on", ns, "namespace failed:", errno)
        } else {
            fmt.Println("setns on", ns, "namespace succeeded")
        }

        syscall.Close(fd)
    }
}

Instead, running sudo go run main.go <PID> produces:

setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace failed: invalid argument

Answer 1:


(There is an issue filed on the Go project)

So, the answer to this question is that you have to call setns from a single-threaded context. This makes sense since setns should join the current thread to the namespace. Since Go is multi-threaded, you need to make the setns call before the Go runtime threads start.
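To see why the kernel treats a Go program as multi-threaded, you can look at the Threads: field the kernel reports in /proc/self/status. The sketch below is an illustration, not part of the original answer; the helper names parseThreads and threadCount are made up for this example. Even a trivial Go program usually reports more than one thread, because the runtime starts helper threads (such as the sysmon monitor) before main.main runs.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseThreads extracts the value of the "Threads:" field from the
// contents of a /proc/<pid>/status file.
func parseThreads(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Threads:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "Threads:")))
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no Threads: field found")
}

// threadCount reports how many OS threads the current process has (Linux only).
func threadCount() (int, error) {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return 0, err
	}
	return parseThreads(string(data))
}

func main() {
	n, err := threadCount()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("OS threads in this Go process:", n)
}
```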

I think this is because the thread in which the call to syscall.RawSyscall executes is not the main thread. Even with runtime.LockOSThread the result is not what you would expect (i.e., that the goroutine is "locked" to the main C thread and is therefore equivalent to the constructor trick explained below).

The reply I got after filing the issue suggested using "the cgo constructor trick". I couldn't find any "proper" documentation on this "trick", but it is used in nsinit by Docker/Michael Crosby and even though I went over that code line by line, I didn't try running it this way (see below for frustration).

The "trick" is basically that you can get cgo to execute a C function prior to starting the Go runtime.

To do this, you add the __attribute__((constructor)) GCC attribute to the function you want to run before Go starts up:

/*
__attribute__((constructor)) void init() {
    // this code will execute before Go starts up
    // it runs in a single-threaded C context
    // before Go's threads start running
}
*/
import "C"

Using this as a template, I modified checkns.go like this:

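/*
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

__attribute__((constructor)) void enter_namespace(void) {
   int fd = open("/proc/<PID>/ns/mnt", O_RDONLY);
   if (fd == -1) {
       perror("open mnt namespace");
       return;
   }
   if (setns(fd, 0) == -1)
       perror("setns mnt namespace");
   close(fd);
}
*/
import "C"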

... rest of file is unchanged ...

This code works, but it requires the PID to be hardcoded: the constructor runs before the Go runtime has parsed the command line, so os.Args is not available to it. It still illustrates the idea, and it works if you substitute the PID of a container started as described above.

It's frustrating because I wanted to call setns multiple times, but since this C code executes before the Go runtime starts, no Go code can be used at that point.
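A common way around this is to keep the namespace switch in a constructor but let Go pick the target at run time: the Go program re-executes its own binary through /proc/self/exe with the PID in an environment variable, and a constructor in the new process calls setns (once per namespace, if needed) before that process's Go runtime starts. Below is a sketch of the Go side only. It assumes the same binary also contains a cgo constructor that reads MNTNS_TARGET_PID; that C part is not shown, and the variable name and the helper reexecCommand are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// targetPIDEnv is a hypothetical variable name; a C constructor in the
// same binary is assumed to read it and call setns before Go starts.
const targetPIDEnv = "MNTNS_TARGET_PID"

// reexecCommand builds a command that runs this same binary again with the
// target PID passed through the environment.
func reexecCommand(pid string, args ...string) *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", args...)
	cmd.Env = append(os.Environ(), targetPIDEnv+"="+pid)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	if os.Getenv(targetPIDEnv) != "" {
		// Child: the constructor has already run, so (if setns succeeded)
		// this process is inside the target namespace.
		fmt.Println("child: running after the constructor")
		return
	}
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: program <PID>")
		os.Exit(2)
	}
	// Parent: re-run ourselves; the child gets the PID via the environment.
	if err := reexecCommand(os.Args[1]).Run(); err != nil {
		fmt.Fprintln(os.Stderr, "re-exec failed:", err)
		os.Exit(1)
	}
}
```

The parent stays in its original namespaces, so it can start several children with different PIDs if needed.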

Update: Digging around in the kernel mailing lists turned up a conversation that documents this. I can't find it in any published man page, but here is the quote from a patch to setns(2), confirmed by Eric Biederman:

A process may not be reassociated with a new mount namespace if it is multi-threaded. Changing the mount namespace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities in its own user namespace and CAP_SYS_ADMIN in the target mount namespace.
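The capability part of that rule can be checked from Go before attempting the switch. This sketch (the helper names are hypothetical) reads the effective capability mask from the CapEff: line of /proc/self/status and tests bits 18 (CAP_SYS_CHROOT) and 21 (CAP_SYS_ADMIN). It only sees the caller's own user namespace, so it cannot verify CAP_SYS_ADMIN in the target mount namespace, and it does nothing about the multi-threading restriction.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers from <linux/capability.h>.
const (
	capSysChroot = 18
	capSysAdmin  = 21
)

// parseCapEff returns the effective capability mask from /proc/<pid>/status contents.
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strconv.ParseUint(strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), 16, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no CapEff: field found")
}

// canChangeMountNS reports whether mask has both CAP_SYS_CHROOT and CAP_SYS_ADMIN.
func canChangeMountNS(mask uint64) bool {
	need := uint64(1)<<capSysChroot | uint64(1)<<capSysAdmin
	return mask&need == need
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	mask, err := parseCapEff(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("has CAP_SYS_CHROOT and CAP_SYS_ADMIN:", canChangeMountNS(mask))
}
```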



Source: https://stackoverflow.com/questions/25704661/calling-setns-from-go-returns-einval-for-mnt-namespace
