Edit: For reference purposes (if anyone stumbles across this question), Igor Ostrovsky wrote a great post about cache misses. It discusses several differen
Well, yeah, that does look like it will mainly be L1 cache misses.
10 cycles for an L1 cache miss sounds about right, though probably a little on the low side.
A read from RAM is going to take on the order of hundreds, or maybe even thousands, of cycles (I'm too tired to attempt the maths right now ;)), so it's still a huge win over that.
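For anyone who does want to do the maths, here is a rough sketch of the conversion from latency to cycles. The 3 GHz clock and the 100 ns RAM latency are numbers I've assumed purely for illustration, not measurements from the question, and `latency_in_cycles` is just a helper name I made up. Plug in the figures for your own machine.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to CPU cycles at a given clock rate."""
    # 1 GHz is one cycle per nanosecond, so cycles = nanoseconds * GHz.
    return latency_ns * clock_ghz


# Assumed, illustrative figures (hypothetical, not measured):
CLOCK_GHZ = 3.0         # assumed core clock
RAM_LATENCY_NS = 100.0  # assumed latency of a read that goes all the way to RAM
L1_MISS_CYCLES = 10     # the L1 miss cost discussed above

ram_cycles = latency_in_cycles(RAM_LATENCY_NS, CLOCK_GHZ)
ratio = ram_cycles / L1_MISS_CYCLES

print(f"RAM read: ~{ram_cycles:.0f} cycles, about {ratio:.0f}x an L1 miss")
```

With those assumed numbers a RAM read comes out at a few hundred cycles, which is consistent with the "hundreds of cycles" ballpark. A slower memory system or a faster clock pushes it higher.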