So I have these ginormous matrices X and Y. X and Y both have 100 million rows, and X has 10 columns. I'm trying to implement linear regression with these matrices, and I nee
The size of X is 100e6 x 10 and the size of Y is 100e6 x 1,
so the final size of (X^T*X)^-1 * X^T * Y is 10 x 1.
You can calculate it in the following steps:
1. a = X^T*X -> 10 x 10
2. b = X^T*Y -> 10 x 1
3. a^-1 * b
The matrices in step 3 are very small, so you only need some intermediate steps to calculate 1 and 2.
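As a small in-memory sanity check of the three steps (not the out-of-core version), here is a sketch on synthetic data. The data sizes and the `normal_equation` name are illustrative assumptions. It uses np.linalg.solve(a, b) instead of forming a^-1 explicitly; that gives the same result as a^-1 * b but is numerically safer.

```python
import numpy as np

def normal_equation(X, Y):
    """Solve for beta = (X^T X)^-1 X^T Y using the three steps (small, in-memory demo)."""
    a = X.T @ X               # step 1: 10 x 10
    b = X.T @ Y               # step 2: 10 x 1
    # step 3: solve a @ beta = b rather than computing a^-1 explicitly
    return np.linalg.solve(a, b)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
true_beta = np.arange(1.0, 11.0).reshape(10, 1)
Y = X @ true_beta             # noise-free targets, shape (1000, 1)

beta = normal_equation(X, Y)
print(beta.shape)                      # (10, 1)
print(np.allclose(beta, true_beta))    # True
```

Only a and b ever need to be held in memory at the end, which is why the rest of the work is about computing steps 1 and 2 piece by piece.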
For example, you can read column 0 of X (call it X0) and Y,
and calculate their dot product with numpy.dot(X0, Y), which gives entry 0 of X^T*Y.
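The column-by-column idea rests on the fact that each entry of X^T*Y and X^T*X is a single dot product of columns. A quick check on small random data (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 10))
Y = rng.standard_normal((1000, 1))

# Entry i of X^T*Y is the dot product of column i of X with Y.
X0 = X[:, 0]
b0 = np.dot(X0, Y[:, 0])        # Y[:, 0] is 1-D, so this is a scalar
# Entry (i, j) of X^T*X is the dot product of columns i and j of X.
a01 = np.dot(X[:, 0], X[:, 1])

print(np.isclose(b0, (X.T @ Y)[0, 0]))    # True
print(np.isclose(a01, (X.T @ X)[0, 1]))   # True
```

If Y is kept as an (n, 1) array, numpy.dot(X0, Y) also works but returns a length-1 array instead of a scalar.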
With the float64 dtype, X0 and Y each take about 800 MB (100e6 x 8 bytes), about 1600 MB together. If that can't fit in memory, you can call numpy.dot separately on the first half and the second half of X0 and Y and add the two results.
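Splitting the rows generalizes to any number of blocks. Below is a minimal sketch; `chunked_dot` and `n_chunks` are hypothetical names. In the real setting u and v could be np.memmap arrays (an assumption about how the data is stored on disk), so each slice is only paged in from disk when np.dot touches it.

```python
import numpy as np

def chunked_dot(u, v, n_chunks=2):
    """Dot product of two long 1-D vectors, computed one row block at a time."""
    n = u.shape[0]
    # Block boundaries 0 = b0 < b1 < ... < b_k = n (integer row indices)
    bounds = np.linspace(0, n, n_chunks + 1, dtype=np.int64)
    total = 0.0
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Partial dot products over disjoint row ranges simply add up.
        total += np.dot(u[start:stop], v[start:stop])
    return total

rng = np.random.default_rng(2)
u = rng.standard_normal(1001)
v = rng.standard_normal(1001)
print(np.isclose(chunked_dot(u, v), np.dot(u, v)))              # True
print(np.isclose(chunked_dot(u, v, n_chunks=7), np.dot(u, v)))  # True
```

With n_chunks=2 this is exactly the "first half plus second half" trick; more chunks just lower the peak memory further.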
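So to calculate X^T*Y you need to call numpy.dot 20 times (10 columns x 2 halves),
and to calculate X^T*X you need to call numpy.dot 200 times (10 x 10 column pairs x 2 halves). Because X^T*X is symmetric, only the 55 entries on and above the diagonal are distinct, so 110 calls are actually enough.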
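Putting the pieces together, here is a sketch of the whole column-wise procedure. `read_column(j, start, stop)` and `read_y(start, stop)` are hypothetical loaders (for example, slices of per-column np.memmap files); in the demo they are just slices of small in-memory arrays. It uses the symmetry of X^T*X, so it makes 110 calls for a instead of 200, and it counts the calls so the numbers can be checked.

```python
import numpy as np

def normal_equation_by_columns(read_column, read_y, n_cols, n_rows, n_chunks=2):
    """Build a = X^T*X and b = X^T*Y one column pair and one row block at a time.

    read_column(j, start, stop) and read_y(start, stop) are hypothetical loaders
    returning rows start..stop-1 of column j of X (or of Y) as 1-D float64 arrays.
    """
    bounds = np.linspace(0, n_rows, n_chunks + 1, dtype=np.int64)
    blocks = list(zip(bounds[:-1], bounds[1:]))
    a = np.zeros((n_cols, n_cols))
    b = np.zeros((n_cols, 1))
    calls_b = calls_a = 0
    for i in range(n_cols):
        for start, stop in blocks:
            xi = read_column(i, start, stop)
            b[i, 0] += np.dot(xi, read_y(start, stop))
            calls_b += 1
            for j in range(i, n_cols):          # upper triangle only (j >= i)
                xj = xi if j == i else read_column(j, start, stop)
                a[i, j] += np.dot(xi, xj)
                calls_a += 1
    a = a + np.triu(a, 1).T                     # mirror upper triangle into lower
    return np.linalg.solve(a, b), calls_b, calls_a

# Demo: small in-memory arrays stand in for the on-disk data.
rng = np.random.default_rng(3)
X = rng.standard_normal((10_000, 10))
true_beta = np.linspace(-1.0, 1.0, 10).reshape(10, 1)
Y = X @ true_beta + 0.01 * rng.standard_normal((10_000, 1))

beta, calls_b, calls_a = normal_equation_by_columns(
    read_column=lambda j, s, e: X[s:e, j],
    read_y=lambda s, e: Y[s:e, 0],
    n_cols=10, n_rows=X.shape[0])
print(calls_b, calls_a)                                            # 20 110
print(np.allclose(beta, np.linalg.lstsq(X, Y, rcond=None)[0]))     # True
```

Only one or two column blocks plus the 10 x 10 and 10 x 1 accumulators are live at any time; re-reading column j for every i trades extra disk reads for low memory use.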