Algorithm for efficient diffing of huge files

深忆病人 2021-01-31 05:21

I have to store two files, A and B, which are both very large (around 100 GB). However, B is likely to be similar to A in large parts, so I could store A and diff(A, B) instead. There are two in

5 Answers

情书的邮戳 (OP)

2021-01-31 05:59
You can use rdiff, which works very well with large files. Here is how to create a diff of two big files A and B:
1. Create a signature of one file, for example:
```
rdiff signature A sig.txt
```
2. Using the generated signature file sig.txt and the other big file, create the delta:
```
rdiff delta sig.txt B delta
```
3. Now delta contains all the information you need to recreate file B when you have both A and delta. To recreate B, run:
```
rdiff patch A delta B
```
On Ubuntu, just run sudo apt-get install rdiff to install it. It is quite fast: I get about 40 MB per second on my PC. I have just tried it on an 8 GB file, and the memory used by rdiff was about 1 MB.
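To check the three steps end to end before trusting them with 100 GB files, a small round trip can help. The following is only a sketch under assumptions: the 4 MiB random test file, the temporary directory, and the names sig.bin, delta.bin and B.restored are made up for illustration, and rdiff is assumed to be on the PATH.

```shell
#!/bin/sh
# Round-trip check of signature -> delta -> patch on small stand-in files.
set -eu

# Assumption: rdiff is installed; bail out quietly if it is not.
command -v rdiff >/dev/null 2>&1 || { echo "rdiff not found, skipping" >&2; exit 0; }

# Work in a fresh directory so no existing files get in the way.
work=$(mktemp -d)
cd "$work"

# Hypothetical test data: A is 4 MiB of random bytes, B is A plus 1 KiB appended.
head -c 4194304 /dev/urandom > A
cp A B
head -c 1024 /dev/urandom >> B

rdiff signature A sig.bin           # step 1: signature of A
rdiff delta sig.bin B delta.bin     # step 2: delta that turns A into B
rdiff patch A delta.bin B.restored  # step 3: rebuild B from A and the delta

# The rebuilt file should be byte-identical to B.
cmp B B.restored && echo "round trip OK"

# Compare sizes: the delta should be far smaller than B here.
ls -l A B delta.bin
```

If cmp reports no difference, the same three commands can be pointed at the real files; only A and the delta then need to be stored.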