How can I minimize the data in a SQL replication

问题

I want to replicate data from a boat offshore to an onshore site. The connection is sometimes via a satellite link and can be slow and have a high latency.

Latency in our application is important, the people on-shore should have the data as soon as possible.

There is one table being replicated, consisting of an id, datetime and some binary data that may vary in length, usually < 50 bytes.

An application off-shore pushes data (hardware measurements) into the table constantly and we want these data on-shore as fast as possible.

Are there any tricks in MS SQL Server 2008 that can help to decrease the bandwith usage and decrease the latency? Initial testing uses a bandwidth of 100 kB/s.

Our alternative is to roll our own data transfer and initial prototyping here uses a bandwidth of 10 kB/s (while transferring the same data in the same timespan). This is without any reliability and integrity checks so this number is artificially low.

回答1:

You can try out different replication profiles or create your own. Different profiles are optimized for different network/bandwidth scenarios.

MSDN talks about replication profiles here.

回答2:

Have you considered getting a WAN accelerator appliance? I'm too new here to post a link, but there are several available.

Essentially, the appliance on the sending end compresses the outgoing data, and the receiving end decompresses it, all on the fly and completely invisibly. This has the benefit of increasing the apparent speed of the traffic and not requiring you to change your server configurations. It should be entirely transparent.

回答3:

I'd suggest on the fly compression/decompression outside of SQL Server. That is, SQL replicates the data normally but something in the network stack compresses so it's much smaller and bandwidth efficient.

I don't know of anything but I'm sure these exist.

Don't mess around with the SQL files directly. That's madness if not impossible.

回答4:

Do you expect it to always be only one table that is replicated? Are there many updates, or just inserts? The replication is implemented by calling an insert/update sproc on the destination for each changed row. One cheap optimization is to force the sproc name to be small. By default it is composed from the table name, but IIRC you can force a different sproc name for the article. Given an insert of around 58 bytes for a row, saving 5 or 10 characters in the sproc name is significant.

I would guess that if you update the binary field it is typically a whole replacement? If that is incorrect and you might change a small portion, you could roll your own diff patching mechanism. Perhaps a second table that contains a time series of byte changes to the originals. Sounds like a pain, but could have huge savings of bandwidth changes depending on your workload.

Are the inserts generally done in logical batches? If so, you could store a batch of inserts as one customized blob in a replicated table, and have a secondary process that unpacks them into the final table you want to work with. This would reduce the overhead of these small rows flowing through replication.

来源：https://stackoverflow.com/questions/910008/how-can-i-minimize-the-data-in-a-sql-replication

标签

sql-server

performance

replication