How to achieve more than 10 inserts per second with Azure storage tables

Asked by 暖寄归人, 2020-12-08 09:01

I wrote a simple WorkerRole that adds test data into a table. The insert code looks like this:

var TableClient = this.StorageAccount.CreateCloudTableClient();
T
1 Answer
  • 2020-12-08 09:08

    To speed things up you should use batch transactions (Entity Group Transactions), allowing you to commit up to 100 items within a single request:

    foreach (var item in myItemsToAdd)
    {
        this.Context.AddObject(TableName, item);
    }
    this.Context.SaveChanges(SaveChangesOptions.Batch);
    

    You can combine this with Partitioner.Create (+ AsParallel) to send multiple batches of 100 items in parallel on different threads/cores, which makes things really fast.

    But before doing any of this, read through the limitations of batch transactions (at most 100 entities per transaction, all entities in a transaction must share the same PartitionKey, ...).
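    A minimal sketch of honoring those limits before committing, assuming the same `this.Context`, `TableName`, and `myItemsToAdd` names used above, and that each entity exposes a `PartitionKey` property (both helper names and the shape of the collection are assumptions, not from the original code):

    ```csharp
    using System.Linq;

    // Split the items into valid batches: group by PartitionKey first
    // (one partition per transaction), then chunk each group into
    // sub-groups of at most 100 entities.
    var batches = myItemsToAdd
        .GroupBy(e => e.PartitionKey)
        .SelectMany(g => g
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / 100, x => x.item));

    foreach (var batch in batches)
    {
        foreach (var item in batch)
        {
            this.Context.AddObject(TableName, item);
        }
        // One round trip per batch instead of one per entity.
        this.Context.SaveChanges(SaveChangesOptions.Batch);
    }
    ```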

    Update:

    Since you can't use batch transactions, here are some other tips. Take a look at this MSDN thread about improving performance when using table storage. I wrote some code to show you the difference:

        private static void SequentialInserts(CloudTableClient client)
        {
            var context = client.GetDataServiceContext();
            Trace.WriteLine("Starting sequential inserts.");
    
            var stopwatch = new Stopwatch();
            stopwatch.Start();
    
            for (int i = 0; i < 1000; i++)
            {
                Trace.WriteLine(String.Format("Adding item {0}. Thread ID: {1}", i, Thread.CurrentThread.ManagedThreadId));
                context.AddObject(TABLENAME, new MyEntity()
                {
                    Date = DateTime.UtcNow,
                    PartitionKey = "Test",
                    RowKey = Guid.NewGuid().ToString(),
                    Text = String.Format("Item {0} - {1}", i, Guid.NewGuid().ToString())
                });
                context.SaveChanges();
            }
    
            stopwatch.Stop();
            Trace.WriteLine("Done in: " + stopwatch.Elapsed.ToString());
        }
    

    So, the first time I run this I get the following output:

    Starting sequential inserts.
    Adding item 0. Thread ID: 10
    Adding item 1. Thread ID: 10
    ..
    Adding item 999. Thread ID: 10
    Done in: 00:03:39.9675521
    

    It takes more than 3 minutes to add 1000 items. Now, I changed the app.config based on the tips on the MSDN forum (maxconnection should be 12 * number of CPU cores):

      <system.net>
        <settings>
          <servicePointManager expect100Continue="false" useNagleAlgorithm="false"/>
        </settings>
        <connectionManagement>
          <add address = "*" maxconnection = "48" />
        </connectionManagement>
      </system.net>
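    If you prefer not to touch app.config, the same settings can be applied in code via `ServicePointManager`, as long as this runs before the first request goes out (the value 48 assumes a 4-core machine, matching the config above):

    ```csharp
    using System.Net;

    // Apply once at startup, before any HTTP request is made.
    ServicePointManager.Expect100Continue = false;   // skip the 100-Continue handshake
    ServicePointManager.UseNagleAlgorithm = false;   // don't buffer small writes
    ServicePointManager.DefaultConnectionLimit = 48; // 12 * number of CPU cores
    ```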
    

    And after running the application again I get this output:

    Starting sequential inserts.
    Adding item 0. Thread ID: 10
    Adding item 1. Thread ID: 10
    ..
    Adding item 999. Thread ID: 10
    Done in: 00:00:18.9342480
    

    From over 3 minutes down to 18 seconds. What a difference! But we can do even better. Here is some code that inserts all the items using a Partitioner (the inserts happen in parallel):

        private static void ParallelInserts(CloudTableClient client)
        {            
            Trace.WriteLine("Starting parallel inserts.");
    
            var stopwatch = new Stopwatch();
            stopwatch.Start();
    
            var partitioner = Partitioner.Create(0, 1000, 10);
            var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
    
            Parallel.ForEach(partitioner, options, range =>
            {
                var context = client.GetDataServiceContext();
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    Trace.WriteLine(String.Format("Adding item {0}. Thread ID: {1}", i, Thread.CurrentThread.ManagedThreadId));
                    context.AddObject(TABLENAME, new MyEntity()
                    {
                        Date = DateTime.UtcNow,
                        PartitionKey = "Test",
                        RowKey = Guid.NewGuid().ToString(),
                        Text = String.Format("Item {0} - {1}", i, Guid.NewGuid().ToString())
                    });
                    context.SaveChanges();
                }
            });
    
            stopwatch.Stop();
            Trace.WriteLine("Done in: " + stopwatch.Elapsed.ToString());
        }
    

    And the result:

    Starting parallel inserts.
    Adding item 0. Thread ID: 10
    Adding item 10. Thread ID: 18
    Adding item 999. Thread ID: 16
    ..
    Done in: 00:00:04.6041978
    

    Voilà: from 3m39s we first dropped to 18s, and now down to 4s.
