In an AI application I am writing in C++,
On recent CPUs the cost of the call itself is more or less the same as for a normal function, but they generally can't be inlined. If you call the function millions of times, the impact can be significant. Try calling the same function millions of times, once inlined and once not, and you will see that it can be twice as slow when the function itself does something simple. This is not a theoretical case: it is quite common in numerical computation.
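Here is a rough sketch of the experiment described above: the same trivial function body is called in a tight loop, once where the compiler is free to inline it and once where inlining is explicitly disabled. All names (`step_inlinable`, `step_not_inlined`, `time_calls`, `compare_call_cost`) are made up for illustration, and the `NOINLINE` macro relies on compiler-specific attributes (GCC/Clang and MSVC), so treat it as an assumption about your toolchain. Build with optimizations enabled (for example `-O2`), otherwise nothing gets inlined in either case.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

// Compiler-specific way to forbid inlining (assumption: GCC, Clang or MSVC).
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

// Trivial work: the body is small enough that call overhead can dominate.
inline std::uint64_t step_inlinable(std::uint64_t acc, std::uint64_t i) {
    return acc + (i ^ (acc >> 3));
}

// Same body, but the compiler is told not to inline it.
NOINLINE std::uint64_t step_not_inlined(std::uint64_t acc, std::uint64_t i) {
    return acc + (i ^ (acc >> 3));
}

struct Timing {
    std::uint64_t result;  // final accumulator, keeps the loop from being optimized away
    double seconds;        // wall-clock time for the whole loop
};

// Calls f(acc, i) n times, feeding each result into the next call.
template <typename F>
Timing time_calls(F f, std::uint64_t n) {
    auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = f(acc, i);
    }
    auto stop = std::chrono::steady_clock::now();
    return {acc, std::chrono::duration<double>(stop - start).count()};
}

// Lambdas are used so that the only call left in the loop is the one under test.
Timing run_inlinable(std::uint64_t n) {
    return time_calls(
        [](std::uint64_t acc, std::uint64_t i) { return step_inlinable(acc, i); }, n);
}

Timing run_not_inlined(std::uint64_t n) {
    return time_calls(
        [](std::uint64_t acc, std::uint64_t i) { return step_not_inlined(acc, i); }, n);
}

// Runs both variants and prints the timings side by side.
void compare_call_cost(std::uint64_t n) {
    Timing a = run_inlinable(n);
    Timing b = run_not_inlined(n);
    assert(a.result == b.result);  // both variants must compute the same value
    std::printf("inlinable:   %.4f s\n", a.seconds);
    std::printf("not inlined: %.4f s\n", b.seconds);
}
```

Calling `compare_call_cost(100000000)` from your `main` should give you a feel for the gap on your machine. The actual ratio depends heavily on the compiler, the flags and the CPU, so don't read too much into a single run.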