How Do I Attain Peak CPU Performance With Dot Product?
Problem

I have been studying HPC, using matrix multiplication as my project (see my other posts in my profile). I get good performance there, but not good enough. I am taking a step back to see how well I can do with a dot product calculation.

Dot Product vs. Matrix Multiplication

The dot product is simpler and will let me test HPC concepts without dealing with packing and other related issues. Cache blocking is still an issue, which forms my second question.

Algorithm