Question:
I am trying to solve the LeetCode problem distribute-candies. It is easy: just find the minimum of the number of distinct candy kinds and half the total number of candies.
Here's my solution (runs in 48 ms):
use std::collections::HashSet;

pub fn distribute_candies(candies: Vec<i32>) -> i32 {
    let sister_candies = (candies.len() / 2) as i32;
    let mut kind = 0;
    let mut candies_kinds = HashSet::new();
    for candy in candies.into_iter() {
        if candies_kinds.insert(candy) {
            kind += 1;
            if kind > sister_candies {
                return sister_candies;
            }
        }
    }
    kind
}
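As a quick check of the intended behaviour (the inputs here are my own examples, not LeetCode's test data):

fn main() {
    // 6 candies of 3 kinds: the sister receives 3 candies, so she can get all 3 kinds.
    assert_eq!(distribute_candies(vec![1, 1, 2, 2, 3, 3]), 3);
    // 4 candies of 3 kinds: she only receives 2 candies, so at most 2 kinds.
    assert_eq!(distribute_candies(vec![1, 1, 2, 3]), 2);
}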
However, I found a solution using an iterator:
use std::collections::HashSet;
use std::cmp::min;

pub fn distribute_candies(candies: Vec<i32>) -> i32 {
    min(candies.iter().collect::<HashSet<_>>().len(), candies.len() / 2) as i32
}
and it runs in 36 ms.
I can't quite understand why the iterator solution is faster than my for-loop solution. Are there some magic optimizations that Rust is performing in the background?
Answer 1:
The main difference is that the iterator version internally uses Iterator::size_hint to determine how much space to reserve in the HashSet
before collecting into it. This prevents repeatedly having to reallocate as the set grows.
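As a rough illustration (not taken from the answer itself): a Vec or slice iterator reports its exact remaining length through size_hint, which is what lets collect size the set up front.

fn main() {
    // The slice iterator knows exactly how many elements remain,
    // so its size_hint is an exact (lower, Some(upper)) pair.
    let candies = vec![1, 2, 3, 4];
    let iter = candies.iter();
    assert_eq!(iter.size_hint(), (4, Some(4)));
}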
You can do the same using HashSet::with_capacity instead of HashSet::new:
let mut candies_kinds = HashSet::with_capacity(candies.len());
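Applied to your original function, that is the only line that changes; a sketch with your early-bailout logic kept intact:

use std::collections::HashSet;

pub fn distribute_candies(candies: Vec<i32>) -> i32 {
    let sister_candies = (candies.len() / 2) as i32;
    let mut kind = 0;
    // Reserve room for the worst case (every candy is a distinct kind)
    // so the set never has to reallocate while we insert.
    let mut candies_kinds = HashSet::with_capacity(candies.len());
    for candy in candies.into_iter() {
        if candies_kinds.insert(candy) {
            kind += 1;
            if kind > sister_candies {
                return sister_candies;
            }
        }
    }
    kind
}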
In my benchmark this single change makes your code significantly faster than the iterator. However, if I simplify your code to remove the early bailout optimisation, it runs in almost exactly the same time as the iterator version.
use std::collections::HashSet;

pub fn distribute_candies(candies: &[i32]) -> i32 {
    let sister_candies = (candies.len() / 2) as i32;
    let mut candies_kinds = HashSet::with_capacity(candies.len());
    for candy in candies.iter() {
        candies_kinds.insert(candy);
    }
    sister_candies.min(candies_kinds.len() as i32)
}
Timings:
test tests::bench_iter ... bench: 262,315 ns/iter (+/- 23,704)
test tests::bench_loop ... bench: 307,697 ns/iter (+/- 16,119)
test tests::bench_loop_with_capacity ... bench: 112,194 ns/iter (+/- 18,295)
test tests::bench_loop_with_capacity_no_bailout ... bench: 259,961 ns/iter (+/- 17,712)
This suggests to me that the HashSet preallocation is the dominant difference. Your additional optimisation also proves to be very effective - at least with the dataset that I happened to choose.
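For the curious: the numbers above look like output from Rust's nightly benchmark harness. A minimal sketch of one such benchmark, assuming the slice-based function above and a made-up dataset (the answer does not show the actual input):

#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_loop_with_capacity_no_bailout(b: &mut Bencher) {
        // Hypothetical input: 100_000 candies drawn from 1_000 kinds.
        let candies: Vec<i32> = (0..100_000).map(|i| i % 1_000).collect();
        b.iter(|| distribute_candies(&candies));
    }
}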
Source: https://stackoverflow.com/questions/54490896/why-is-my-for-loop-code-slower-than-an-iterator