How to eliminate time spent in JIT_TailCall for functions that are genuinely non-recursive

时间秒杀一切 提交于 2019-12-10 03:42:10

问题


I am writing a 64 bit F# solution and profiling has revealed a surprisingly & unexpectedly large amount of time spent in JIT_TailCall...it is in fact dominating the runtime (circa 80%). This appears together with its evil cousin JIT_TailCallHelperStub_ReturnAddress.

I have definitely traced the source to passing a struct type (custom value types) in a method or property call across an assembly boundary. I am certain of this because if I bypass the method call and assign my struct to the property directly (the one the offending method was using) the performance magically improves by a factor of 4-5 x less runtime!

The calling assembly is using F# 3.1 because it is being dynamically compiled with the latest stable release of FSharp.Compiler.Services.

The assembly being called is using F# 4.0 / .NET 4.6 (VS 2015).

UPDATE

A simplification of what I am trying to do is to assign a custom struct value to a position in an array from a dynamically generated assembly...

Runtime is fast and no extraneous tail calls are generated when calling:

  1. A property exposing the private array in the type

However, the runtime is slow due to extraneous tail calls being generated when calling:

  1. An indexer property exposing the array (Item)

  2. A member method acting as a setter for the array

The reason I need to call the member method is that I need to perform a few checks prior to insertion of the item in the array.

PRACTICAL

Over and above understanding the root of the issue, I would like to know whether F# 4.0 and by implication the coming release of FSharp.Compiler.Services would solve this issue. Given that the updated FSharp.Compiler.Services is relatively imminent, it may then just be best to wait.


回答1:


I posted this on your GitHub question, but cross-post it here so it is easier to find:

I have a case when mutually recursive functions generate 30% load for JIT_TailCall and 15% load for JIT_TailCallHelperStub_ReturnAddress. These functions are closed over method variables and class fields. When I turn off tail call generation, my performance increases exactly by 45%.

I haven't profiled this snippet, but this is how my real code is structured:

#time "on"
type MyRecType() = 

  let list = System.Collections.Generic.List()

  member this.DoWork() =
    let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1

    let rec outerLoop() =
      if local < 1000000 then
        innerLoop(1)
      else
        tcs.SetResult(local)
        ()

    and innerLoop(inc:int) =
      if local % 2 = 0 then
        local <- local + inc
        outerLoop()
      else
        list.Add(local) // just fake access to a field to illustrate the pattern
        local <- local + 1
        innerLoop(inc)

    outerLoop()

    returnTask


let instance = MyRecType()

instance.DoWork().Result

> Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

.NET 4.6 and F# 4.0 don't help at all.

I tried to rewrite this as methods, but got StackOverflowException. However, I do not understand why I am not getting SO when I run a very big number of iterations without tail call generation?

Update Rewriting the method as:

  member this.DoWork2() =
    let mutable tcs = (System.Runtime.CompilerServices.AsyncTaskMethodBuilder<int>.Create())
    let returnTask = tcs.Task // NB! must access this property first
    let mutable local = 1
    let rec loop(isOuter:bool, inc:int) =
      if isOuter then
        if local < 1000000 then
          loop(false,1)
        else
          tcs.SetResult(local)
          ()
      else
        if local % 2 = 0 then
          local <- local + inc
          loop(true,1)
        else
          list.Add(local) // just fake access to a field to illustrate the pattern
          local <- local + 1
          loop(false,1)

    loop(true,1)

    returnTask


> Real: 00:00:00.004, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
> val it : int = 1000001

reduces JIT_TailCall and JIT_TailCallHelperStub_ReturnAddress overhead to 18% an 2% of execution time that is 2x faster, so the actual overhead decreased from 45% to 10% of the initial time. Still high, but not as dismal as in the first scenario.



来源:https://stackoverflow.com/questions/31433605/how-to-eliminate-time-spent-in-jit-tailcall-for-functions-that-are-genuinely-non

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!