I\'m doing a test: compare excecution times of cgo and pure Go functions run 100 million times each. The cgo function takes longer time compared to the Golang function, and
Update for James's answer: it seems that there's no thread switch in current implementation.
See this thread on golang-nuts:
There's always going to be some overhead. It's more expensive than a simple function call but significantly less expensive than a context switch (agl is remembering an earlier implementation; we cut out the thread switch before the public release). Right now the expense is basically just having to do a full register set switch (no kernel involvement). I'd guess it's comparable to ten function calls.
See also this answer which links "cgo is not Go" blog post.
C doesn’t know anything about Go’s calling convention or growable stacks, so a call down to C code must record all the details of the goroutine stack, switch to the C stack, and run C code which has no knowledge of how it was invoked, or the larger Go runtime in charge of the program.
Thus, cgo has an overhead because it performs a stack switch, not thread switch.
It saves and restores all registers when C function is called, while it's not required when Go function or assembly function is called.
Besides that, cgo's calling conventions forbid passing Go pointers directly to C code, and common workaround is to use C.malloc, and so introduce additional allocations. See this question for details.