I have to store two files A and B which are both very large (like 100GB). However B is likely to be similar in big parts to A so i could store A and diff(A, B). There are two in
Take a look at RSYNCs algorithm, as it's designed pretty much to do exactly this so it can efficiently copy deltas. And the algorithm is pretty well documented, as I recall.