Performance of shader function parameters

Submitted by 只谈情不闲聊 on 2019-12-05 08:09:11

According to the spec, values are always copied. For in parameters, they are copied at call time; for out parameters, at return time; and for inout parameters, at both call and return time.

In the language of the spec (GLSL 4.50, section 6.1.1 "Function Calling Conventions"):

All arguments are evaluated at call time, exactly once, in order, from left to right. Evaluation of an in parameter results in a value that is copied to the formal parameter. Evaluation of an out parameter results in an l-value that is used to copy out a value when the function returns. Evaluation of an inout parameter results in both a value and an l-value; the value is copied to the formal parameter at call time and the lvalue is used to copy out a value when the function returns.
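GLSL can't run on its own, so the convention is sketched below as a small Python model. It is an illustration under stated assumptions, not how any driver implements calls. `Var` is a hypothetical stand-in for a caller-side l-value, and `glsl_call` and `example` are made-up names. The order in which the model writes `out`/`inout` values back is an arbitrary choice, not something taken from the spec.

```python
class Var:
    """Hypothetical stand-in for a caller-side l-value (a GLSL variable)."""

    def __init__(self, value):
        self.value = value


def glsl_call(body, in_args, out_vars, inout_vars):
    """Model of GLSL's copy-in/copy-out calling convention."""
    # Call time: 'in' and 'inout' values are copied into fresh formals.
    in_formals = list(in_args)
    inout_formals = [var.value for var in inout_vars]
    # 'out' formals start without a value; None marks that in this model.
    out_formals = [None] * len(out_vars)

    result = body(in_formals, out_formals, inout_formals)

    # Return time: 'out' and 'inout' formals are copied back through the
    # l-values (the write-back order here is an arbitrary model choice).
    for var, value in zip(out_vars, out_formals):
        var.value = value
    for var, value in zip(inout_vars, inout_formals):
        var.value = value
    return result


def example(ins, outs, inouts):
    """Roughly: float example(in float x, out float doubled, inout float count)."""
    outs[0] = ins[0] * 2.0  # written to the local 'out' formal
    inouts[0] += 1.0        # updates the local copy of the caller's value
    ins[0] = 0.0            # legal, but only changes the local 'in' copy
    return inouts[0]


x, doubled, count = Var(3.0), Var(0.0), Var(10.0)
r = glsl_call(example, [x.value], [doubled], [count])
print(r, x.value, doubled.value, count.value)  # 11.0 3.0 6.0 11.0
```

Although the `in` formal is overwritten inside `example`, `x` keeps its value. `doubled` and `count` only change when `glsl_call` copies the formals back at return.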

An implementation is of course free to optimize anything it wants, as long as the result is the same as it would be with the documented behavior. But I don't think you can expect it to work in any specific way.

For example, it would not be safe to pass all inout parameters by reference. Say you had this code:

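vec4 Foo(inout mat4 mat1, inout mat4 mat2) {
    mat1 = mat4(0.0);
    mat2 = mat4(1.0);           // mat4(1.0) is the identity matrix
    return mat1 * vec4(1.0);
}

void main() {
    mat4 myMat;                 // initial value is irrelevant: Foo() overwrites it before reading
    vec4 res = Foo(myMat, myMat);
}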

The correct result for this is a vector with all components equal to 0.0. mat1 and mat2 are separate copies, so mat1 is still the zero matrix at the return statement. If the arguments were passed by reference, mat1 and mat2 inside Foo() would alias the same matrix. The assignment to mat2 would then also change mat1, turning it into the identity matrix, and the result would be a vector with all components equal to 1.0, which is wrong.
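The difference can be checked with a tiny Python model that stores matrices as lists of four columns (column-major, as in GLSL). The helpers `mat4`, `mat_times_vec`, `foo_body`, `call_by_reference` and `call_copy_in_out` are hypothetical. The starting value of `myMat` is an arbitrary choice, since the original leaves it uninitialized and Foo() never reads it. The model's write-back order is also arbitrary.

```python
import copy


def mat4(s):
    """Like GLSL mat4(s): s on the diagonal, 0.0 elsewhere (list of columns)."""
    return [[s if row == col else 0.0 for row in range(4)] for col in range(4)]


def mat_times_vec(m, v):
    """GLSL m * v, with m stored column-major as a list of columns."""
    return [sum(m[col][row] * v[col] for col in range(4)) for row in range(4)]


def foo_body(mat1, mat2):
    """Body of Foo(): overwrites both matrix arguments in place."""
    mat1[:] = mat4(0.0)
    mat2[:] = mat4(1.0)
    return mat_times_vec(mat1, [1.0] * 4)


def call_by_reference(env, name1, name2):
    # Hypothetical "optimized" call: the formals alias the caller's storage.
    return foo_body(env[name1], env[name2])


def call_copy_in_out(env, name1, name2):
    # GLSL semantics: private copies at call time...
    mat1 = copy.deepcopy(env[name1])
    mat2 = copy.deepcopy(env[name2])
    result = foo_body(mat1, mat2)
    # ...copied back through the l-values at return time.
    env[name1] = mat1
    env[name2] = mat2
    return result


print(call_copy_in_out({"myMat": mat4(0.5)}, "myMat", "myMat"))   # [0.0, 0.0, 0.0, 0.0]
print(call_by_reference({"myMat": mat4(0.5)}, "myMat", "myMat"))  # [1.0, 1.0, 1.0, 1.0]
```

Only the copy-in/copy-out version produces the result the spec requires.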

This is of course a very artificial example, but it shows that the optimization has to be applied selectively to be correct in all cases.

Your first bullet point does not work when you consider arguments qualified using inout.

The real issue is what you do with the parameter inside the function. If you modify a parameter qualified with in, it cannot be "passed by reference", and a copy has to be made. On modern hardware this is probably not a big deal, but Shader Model 2.0 was quite limited in the number of temporary registers, and I ran into these kinds of issues more than once when GLSL and Cg first came out.

For reference, consider the following GLSL code:

vec4 DoSomething (mat4 mat, vec3 vec)
{
  // Straightforward: no temporary registers are required to pass arguments.
  return vec4 (mat [0] + vec4 (vec, 0.0));
}

vec4 DoSomethingCopy (mat4 mat, vec3 vec)
{
  mat [0][0] = 0.0; // This requires the compiler to make a local copy of mat
  return vec4 (mat [0] + vec4 (vec, 0.0));
}

vec4 DoSomethingInOut (inout mat4 mat, in vec3 vec)
{
  mat [0][0] = 0.0; // No copy required, but the original mat is modified
  return vec4 (mat [0] + vec4 (vec, 0.0));
}
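The caller-visible difference between these three functions is what forces the extra copy in DoSomethingCopy. That difference can be sketched in Python, with matrices stored as lists of columns as in GLSL. The snake_case function names and the `mat4`/`add_column` helpers are hypothetical. The model says nothing about registers; it only shows which writes the caller is allowed to see.

```python
import copy


def mat4(s):
    """Like GLSL mat4(s): s on the diagonal, 0.0 elsewhere (list of columns)."""
    return [[s if row == col else 0.0 for row in range(4)] for col in range(4)]


def add_column(column, vec):
    """GLSL column + vec4(vec, 0.0)."""
    return [c + v for c, v in zip(column, list(vec) + [0.0])]


def do_something(mat, vec):
    # mat is only read, so no copy of the caller's matrix is needed.
    return add_column(mat[0], vec)


def do_something_copy(mat, vec):
    # An 'in' parameter that gets written: the write must stay local,
    # so the function has to work on its own copy.
    mat = copy.deepcopy(mat)
    mat[0][0] = 0.0
    return add_column(mat[0], vec)


def do_something_inout(mat, vec):
    # 'inout': the write is supposed to reach the caller. Writing in place
    # matches copy-in/copy-out here only because nothing else aliases mat.
    mat[0][0] = 0.0
    return add_column(mat[0], vec)


m = mat4(1.0)
v = [1.0, 2.0, 3.0]
print(do_something(m, v), m[0][0])        # [2.0, 2.0, 3.0, 0.0] 1.0
print(do_something_copy(m, v), m[0][0])   # [1.0, 2.0, 3.0, 0.0] 1.0
print(do_something_inout(m, v), m[0][0])  # [1.0, 2.0, 3.0, 0.0] 0.0
```

DoSomethingCopy and DoSomethingInOut return the same vector. Only the inout version leaves the caller's matrix modified.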

I cannot really comment on performance; my only bad experiences involved hitting actual hardware limits on older GPUs. Of course, you should assume that any time something has to be copied, it is going to hurt performance.

All shader functions are inlined (recursive functions are forbidden). The concept of a reference or pointer does not apply here either. The only case in which extra code is generated is when you write to an input parameter. Even then, if the original registers are not used afterwards, the compiler will probably reuse them, and the copy (a mov instruction) will not be needed.

Bottom line: function invocation is free.
