Why is Seq.iter 2x faster than a for loop when targeting x64?

一生所求 2021-02-20 17:00

Disclaimer: this is a micro-benchmark, so please do not comment with quotes such as "premature optimization is evil" if you feel unhappy about the topic.

Examples are release builds targeting x64.

2 Answers
  • 2021-02-20 17:49

    When I run the experiment on my machine (using F# 3.0 in VS 2012 in Release mode), I do not get the times you describe. Do you consistently get the same numbers when you run it repeatedly?

    I tried it about 4 times and I always get numbers that are very similar. The version with Seq.iter tends to be slightly faster, but this is probably not statistically significant. Something like (using Stopwatch):

    test(1) = 15321ms
    test(2) = 5149ms
    test(3) = 14290ms
    test(4) = 4999ms
    

    I'm running the test on a laptop with an Intel Core 2 Duo (2.26 GHz), using 64-bit Windows 7.
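
    Roughly, the repeated measurement looks like this. This is only a sketch: the measure helper and the run count of 4 are made up for illustration, not my exact test code.

    open System.Diagnostics
    
    // Sketch: run a test function several times with Stopwatch and print each
    // elapsed time, so run-to-run variance is visible.
    let measure name (f: unit -> unit) =
        for run in 1 .. 4 do
            let sw = Stopwatch.StartNew()
            f ()
            sw.Stop()
            printfn "%s, run %d = %dms" name run sw.ElapsedMilliseconds
    
    // e.g. measure "test(1)" test1, assuming test1 is one of the benchmark
    // functions from the question.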

  • 2021-02-20 17:56

    This isn't a complete answer, but I hope it helps you go further.

    I can reproduce the behaviour using the same configuration. Here is a simpler example for profiling:

    open System
    
    let test1() =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}    
        for x in pool do
            for _ in 1..50 do
                for y in 1..200 do
                    ret.[2] <- x + y
    
    let test2() =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}    
        Seq.iter (fun x -> 
            for _ in 1..50 do
                for y in 1..200 do
                    ret.[2] <- x + y) pool
    
    let time f =
        let sw = new Diagnostics.Stopwatch()
        sw.Start()
        let result = f() 
        sw.Stop()
        Console.WriteLine(sw.Elapsed)
        result
    
    [<EntryPoint>]
    let main argv =
        time test1
        time test2
        0
    

    In this example, Seq.iter and the for x in pool loop are each executed only once, but there is still a 2x time difference between test1 and test2:

    00:00:06.9264843
    00:00:03.6834886
    

    Their ILs are very similar, so compiler optimization isn't the issue. It seems that the x64 JIT fails to optimize test1, although it manages to do so for test2. Interestingly, if I refactor the nested for loops in test1 into a separate function, JIT optimization succeeds again:

    let body (ret: _ []) x =
        for _ in 1..50 do
            for y in 1..200 do
                ret.[2] <- x + y
    
    let test3() =
        let ret = Array.zeroCreate 100
        let pool = {1..1000000}    
        for x in pool do
            body ret x
    
    // 00:00:03.7012302
    

    When I disable JIT optimization using the technique described here, the execution times of these functions are comparable.
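
    One way to disable JIT optimization for a single method (not necessarily the technique from that link) is the standard .NET NoOptimization/NoInlining method-impl flags; as a rough sketch:

    open System.Runtime.CompilerServices
    
    // Sketch: ask the JIT to skip optimization (and inlining) for this method only,
    // so its timing can be compared against the optimized versions above.
    // The name test1NoOpt is made up; the body mirrors test1.
    [<MethodImpl(MethodImplOptions.NoOptimization ||| MethodImplOptions.NoInlining)>]
    let test1NoOpt () =
        let ret = Array.zeroCreate 100
        let pool = {1 .. 1000000}
        for x in pool do
            for _ in 1 .. 50 do
                for y in 1 .. 200 do
                    ret.[2] <- x + y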

    Why the x64 JIT fails in this particular example, I don't know. You can disassemble the optimized jitted code and compare the ASM instructions line by line; maybe someone with good ASM knowledge can spot the difference.
