Algorithm for efficient diffing of huge files

深忆病人 2021-01-31 05:21

I have to store two files A and B which are both very large (like 100GB). However, B is likely to be similar in large parts to A, so I could store A and diff(A, B). There are two in

5 Answers
  •  情深已故
    2021-01-31 06:05

    Depending on your performance requirements, you could get away with fingerprinting only a sample of the chunks, and growing the matched regions when a fingerprint hits. That way you don't have to run a checksum over your entire large file.

    If you need arbitrary byte alignments and you really care about performance, look at the simhash algorithm, and use it to find similar but unaligned blocks.
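The sample-then-grow idea can be sketched roughly as follows. This is a toy illustration, not from the answer itself: `CHUNK`, `SAMPLE_STRIDE`, and all function names are my own, and the values are deliberately tiny so the behavior is easy to follow; a real implementation would use kilobyte-scale chunks, a rolling hash, and streaming I/O rather than in-memory bytes.

```python
import hashlib

CHUNK = 8          # toy chunk size; real files would use e.g. 4 KiB or more
SAMPLE_STRIDE = 2  # fingerprint only every 2nd chunk of A (the sampling idea)

def fingerprint(data):
    """Short, stable fingerprint of a chunk."""
    return hashlib.blake2b(data, digest_size=8).digest()

def index_chunks(a):
    """Fingerprint a sample of A's aligned chunks -> {fingerprint: offset}."""
    table = {}
    for off in range(0, len(a) - CHUNK + 1, CHUNK * SAMPLE_STRIDE):
        table[fingerprint(a[off:off + CHUNK])] = off
    return table

def grow(a, b, ai, bi):
    """Extend a seed chunk match backward and forward, byte by byte."""
    start_a, start_b = ai, bi
    while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
        start_a -= 1
        start_b -= 1
    end_a, end_b = ai + CHUNK, bi + CHUNK
    while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
        end_a += 1
        end_b += 1
    return start_a, start_b, end_a - start_a

def find_matches(a, b):
    """Scan B chunk by chunk; on a fingerprint hit, grow the match.

    Returns a list of (offset_in_a, offset_in_b, length) triples.
    """
    table = index_chunks(a)
    matches, bi = [], 0
    while bi + CHUNK <= len(b):
        off = table.get(fingerprint(b[bi:bi + CHUNK]))
        if off is not None:
            sa, sb, length = grow(a, b, off, bi)
            matches.append((sa, sb, length))
            bi = sb + length  # resume scanning past the grown match
        else:
            bi += CHUNK
    return matches
```

Note that the seed matches here only fire when B's scan position lines up with one of A's sampled chunk boundaries; content shifted by an offset that is not a multiple of the chunk size is missed, which is exactly the gap that techniques for unaligned blocks (like the simhash suggestion above) are meant to cover.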
