Are you writing a compiler?
And on a second note, your benchmarking probably won't work, since you have a branch in there that probably takes all the time anyway. (unless your compiler unrolls the loop for you)
Another reason that you can't benchmark a single instruction in a loop is that all your code will be cached (unlike real code). So you have taken much of the size difference between mov eax,0 and xor eax,eax out of the picture by having it in L1-cached the whole time.
My guess is that any measurable performance difference in the real world would be due to the size difference eating up the cache, and not due to execution time of the two options.