I'm working on a Java application that needs to work with very large matrices. For example, multiplying two 10 million * 10 million matrices! Of course the Java heap cannot hold even one matrix of that size.
Use whatever sparse matrix algorithm applies to your data (on the assumption that you don't have the 2.4 PB of disk space it would take to hold three non-sparse 10^7 * 10^7 matrices of doubles - each one is 10^14 entries at 8 bytes, i.e. 800 TB - let alone that much RAM for an in-memory database; Blue Gene/Q 'only' has 1.6 PB).
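As an illustration (not a library API), a minimal sparse representation in Java could store only the non-zero entries. The class below is a sketch; a production implementation would use a compressed format such as CSR rather than a boxed hash map:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sparse matrix: only non-zero entries are stored, so
// memory use is proportional to the number of non-zeros, not to n^2.
public class SparseMatrix {
    private final Map<Long, Double> entries = new HashMap<>();
    private final long n; // square dimension, e.g. 10_000_000

    public SparseMatrix(long n) { this.n = n; }

    // Flatten (row, col) into a single long key; fits in a long
    // because n^2 = 10^14 is well below Long.MAX_VALUE.
    private long key(long row, long col) { return row * n + col; }

    public void set(long row, long col, double value) {
        if (value == 0.0) entries.remove(key(row, col));
        else entries.put(key(row, col), value);
    }

    public double get(long row, long col) {
        return entries.getOrDefault(key(row, col), 0.0);
    }
}
```

For multiplication you would then iterate over the non-zero entries of one operand rather than over all n^2 index pairs.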
The complexity of matrix multiplication, if carried out naively, is O(n^3), but more efficient algorithms do exist (Strassen's, for example). In any case, for a 10 million * 10 million matrix this is going to take a very long time, and since the faster algorithms are recursive you may well face the same heap problem there, too.
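To put that figure in perspective: for n = 10^7 the naive algorithm performs on the order of n^3 = 10^21 multiply-adds. Even at a generous 10^12 operations per second, that is about 10^9 seconds, i.e. roughly 30 years of compute on a single machine.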
If you're into complex maths, you may find tools to help you in this article.
Have a look at CGL-MapReduce: http://www.cs.indiana.edu/~jekanaya/cglmr.html#Matrix_Multiplication
Since this is such a huge calculation, I think you're going to run into performance problems alongside your storage problems. So I would look at parallelising this problem and getting multiple machines/cores to process a subset of the data.
Luckily, matrix multiplication decomposes naturally into independent sub-problems. So I would be looking at some form of grid or distributed computing solution, as sketched below.
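To make the decomposition concrete, here is a minimal single-machine sketch using a thread pool. The tiled layout (a[ti][tk] holding one tileSize * tileSize block) is my assumption for illustration; a real grid solution would dispatch the same tile tasks to remote workers instead:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockMultiply {

    // Plain cubic multiply of one tile; tiles are small enough to fit in RAM.
    static double[][] multiplyTile(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)      // i-k-j order for cache locality
                for (int j = 0; j < n; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // a and b are grids of tiles: a[ti][tk] is one tileSize x tileSize block.
    // Each output tile C[ti][tj] = sum over tk of A[ti][tk] * B[tk][tj]
    // is an independent task, so the work parallelises (or distributes) trivially.
    static double[][][][] multiply(double[][][][] a, double[][][][] b, int tileSize)
            throws InterruptedException {
        int tiles = a.length;
        double[][][][] c = new double[tiles][tiles][tileSize][tileSize];
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int ti = 0; ti < tiles; ti++) {
            for (int tj = 0; tj < tiles; tj++) {
                final int i = ti, j = tj;
                pool.submit(() -> {
                    for (int k = 0; k < tiles; k++) {
                        double[][] p = multiplyTile(a[i][k], b[k][j]);
                        for (int r = 0; r < tileSize; r++)
                            for (int s = 0; s < tileSize; s++)
                                c[i][j][r][s] += p[r][s]; // only this task writes c[i][j]
                    }
                });
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
        return c;
    }
}
```

In a real deployment each pool.submit would become a job on a grid node, with tiles read from and written to shared storage rather than held in arrays.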
Well, if you are forced to use Java and can't write the code that deals with this as native methods (that is, by having Java call some C code instead), then the most efficient thing to do would probably be to use a simple binary file. I would stay away from databases in this case because they are slower than direct file access and you don't need the features they offer.
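As a sketch of that idea, assuming the matrix is stored row-major as raw 8-byte doubles (that file layout is my assumption, not a fixed format):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Reads one element of an n x n matrix stored row-major as raw doubles.
// Offset arithmetic is done in long to survive multi-terabyte files.
public class MatrixFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final long n;

    public MatrixFile(String path, long n) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.n = n;
    }

    public double get(long row, long col) throws IOException {
        file.seek((row * n + col) * Double.BYTES);
        return file.readDouble();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

In practice you would read whole rows or blocks at a time rather than paying a seek per element.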
Try using a memory-mapped file: store all your data in an external file and access it via a FileChannel object.
Check out this article for a brief introduction to memory-mapped files.
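For illustration, a minimal sketch of that approach. Note that one MappedByteBuffer covers at most Integer.MAX_VALUE bytes (about 2 GB), so a file of this size has to be mapped in windows; the file name and window size below are assumptions:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Maps a window of a large binary matrix file and reads doubles from it.
// One MappedByteBuffer can cover at most Integer.MAX_VALUE bytes, so a
// very large file needs several windows (only the first is shown here).
public class MappedMatrixWindow {
    private static final long WINDOW_BYTES = 1L << 30; // 1 GB window (assumed size)

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("matrix.bin", "r");
             FileChannel channel = raf.getChannel()) {
            // Map the first window; a full solution keeps a table of windows
            // and picks the right one from the element's byte offset.
            long size = Math.min(channel.size(), WINDOW_BYTES);
            MappedByteBuffer window =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            double first = window.getDouble(0); // element (0, 0) in row-major layout
            System.out.println("first element = " + first);
        }
    }
}
```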