Visual Studio includes support for __forceinline. The Microsoft Visual Studio 2005 documentation states:
The __forceinline keyword overrides the cost/benefit analysis and relies on the judgment of the programmer instead.
This raises the question: When is the compiler's cost/benefit analysis wrong? And, how am I supposed to know that it's wrong?
In what scenario is it assumed that I know better than my compiler on this issue?
The compiler is making its decisions based on static code analysis, whereas if you profile as don says, you are carrying out a dynamic analysis that can be much farther reaching. The number of calls to a specific piece of code is often largely determined by the context in which it is used, e.g. the data. Profiling a typical set of use cases will do this. Personally, I gather this information by enabling profiling on my automated regression tests. In addition to forcing inlines, I have unrolled loops and carried out other manual optimizations on the basis of such data, to good effect. It is also imperative to profile again afterwards, as sometimes your best efforts can actually lead to decreased performance. Again, automation makes this a lot less painful.
More often than not though, in my experience, tweaking alogorithms gives much better results than straight code optimization.
You know better than the compiler only when your profiling data tells you so.
The one place I am using it is licence verification.
One important factor to protect against easy* cracking is to verify being licenced in multiple places rather than only one, and you don't want these places to be the same function call.
*) Please don't turn this in a discussion that everything can be cracked - I know. Also, this alone does not help much.
I've developed software for limited resource devices for 9 years or so and the only time I've ever seen the need to use __forceinline was in a tight loop where a camera driver needed to copy pixel data from a capture buffer to the device screen. There we could clearly see that the cost of a specific function call really hogged the overlay drawing performance.
The only way to be sure is to measure performance with and without. Unless you are writing highly performance critical code, this will usually be unnecessary.
The inline directive will be totally of no use when used for functions which are:
recursive, long, composed of loops,
If you want to force this decision using __forceinline
Actually, even with the __forceinline keyword. Visual C++ sometimes chooses not to inline the code. (Source: Resulting assembly source code.)
Always look at the resulting assembly code where speed is of importance (such as tight inner loops needed to be run on each frame).
Sometimes using #define instead of inline will do the trick. (of course you loose a lot of checking by using #define, so use it only when and where it really matters).
For SIMD code.
SIMD code often uses constants/magic numbers. In a regular function, every const __m128 c = _mm_setr_ps(1,2,3,4); becomes a memory reference.
With __forceinline, compiler can load it once and reuse the value, unless your code exhausts registers (usually 16).
CPU caches are great but registers are still faster.
P.S. Just got 12% performance improvement by __forceinline alone.
There are several situations where the compiler is not able to determine categorically whether it is appropriate or beneficial to inline a function. Inlining may involve trade-off's that the compiler is unwilling to make, but you are (e.g,, code bloat).
In general, modern compilers are actually pretty good at making this decision.
When you know that the function is going to be called in one place several times for a complicated calculation, then it is a good idea to use __forceinline. For instance, a matrix multiplication for animation may need to be called so many times that the calls to the function will start to be noticed by your profiler. As said by the others, the compiler can't really know about that, especially in a dynamic situation where the execution of the code is unknown at compile time.
Actually, boost is loaded with it.
For example
 BOOST_CONTAINER_FORCEINLINE flat_tree&  operator=(BOOST_RV_REF(flat_tree) x)
    BOOST_NOEXCEPT_IF( (allocator_traits_type::propagate_on_container_move_assignment::value ||
                        allocator_traits_type::is_always_equal::value) &&
                         boost::container::container_detail::is_nothrow_move_assignable<Compare>::value)
 {  m_data = boost::move(x.m_data); return *this;  }
 BOOST_CONTAINER_FORCEINLINE const value_compare &priv_value_comp() const
 { return static_cast<const value_compare &>(this->m_data); }
 BOOST_CONTAINER_FORCEINLINE value_compare &priv_value_comp()
 { return static_cast<value_compare &>(this->m_data); }
 BOOST_CONTAINER_FORCEINLINE const key_compare &priv_key_comp() const
 { return this->priv_value_comp().get_comp(); }
 BOOST_CONTAINER_FORCEINLINE key_compare &priv_key_comp()
 { return this->priv_value_comp().get_comp(); }
 public:
 // accessors:
 BOOST_CONTAINER_FORCEINLINE Compare key_comp() const
 { return this->m_data.get_comp(); }
 BOOST_CONTAINER_FORCEINLINE value_compare value_comp() const
 { return this->m_data; }
 BOOST_CONTAINER_FORCEINLINE allocator_type get_allocator() const
 { return this->m_data.m_vect.get_allocator(); }
 BOOST_CONTAINER_FORCEINLINE const stored_allocator_type &get_stored_allocator() const
 {  return this->m_data.m_vect.get_stored_allocator(); }
 BOOST_CONTAINER_FORCEINLINE stored_allocator_type &get_stored_allocator()
 {  return this->m_data.m_vect.get_stored_allocator(); }
 BOOST_CONTAINER_FORCEINLINE iterator begin()
 { return this->m_data.m_vect.begin(); }
 BOOST_CONTAINER_FORCEINLINE const_iterator begin() const
 { return this->cbegin(); }
 BOOST_CONTAINER_FORCEINLINE const_iterator cbegin() const
 { return this->m_data.m_vect.begin(); }
来源:https://stackoverflow.com/questions/151919/when-should-i-use-forceinline-instead-of-inline