“Streaming” read of over 10 million rows from a table in SQL Server


Question


What is the best strategy to read millions of records from a table (in SQL Server 2012, BI instance), in a streaming fashion (like SQL Server Management Studio does)?

I need to cache these records locally (C# console application) for further processing.

Update - Sample code that works with SqlDataReader

using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

namespace ReadMillionsOfRows
{
    class Program
    {
        static void Main(string[] args)
        {
            // Block until the async read completes; GetResult() also
            // surfaces any exception instead of silently dropping it.
            Process().GetAwaiter().GetResult();
        }

        public static async Task Process()
        {
            // Fill in the server, database, and credentials before running.
            string connString = @"Server=;Database=;User Id=;Password=;Asynchronous Processing=True";
            string sql = "Select * from tab_abc";

            using (SqlConnection conn = new SqlConnection(connString))
            {
                await conn.OpenAsync();
                using (SqlCommand comm = new SqlCommand(sql, conn))
                {
                    comm.CommandType = CommandType.Text;

                    // The reader streams rows one at a time; only the current
                    // row is materialized on the client.
                    using (SqlDataReader reader = await comm.ExecuteReaderAsync())
                    {
                        while (await reader.ReadAsync())
                        {
                            // process the current row here
                        }
                    }
                }
            }
        }
    }
}

Answer 1:


Use a SqlDataReader; it is forward-only and fast, and it only holds a reference to a record while that record is being read.
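
For illustration, here is a minimal sketch of such a forward-only read. The column list (Id, Payload) is an assumption made up for this example; the table name tab_abc comes from the question above, and CommandBehavior.SequentialAccess is an optional extra that lets the provider stream large column values instead of buffering each whole row.

using System.Data;
using System.Data.SqlClient;

class StreamingRead
{
    static void Run(string connString)
    {
        using (var conn = new SqlConnection(connString))
        // Id and Payload are hypothetical columns for illustration.
        using (var cmd = new SqlCommand("SELECT Id, Payload FROM tab_abc", conn))
        {
            conn.Open();
            // With SequentialAccess, columns must be read strictly
            // left to right, but large values are streamed, not buffered.
            using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                while (reader.Read())
                {
                    int id = reader.GetInt32(0);
                    string payload = reader.GetString(1);
                    // process one row at a time; only the current row is held
                }
            }
        }
    }
}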




Answer 2:


That depends on what your cache looks like. If you're going to store everything in memory and a DataSet is an appropriate cache, just read everything into the DataSet.
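
As a sketch of that approach (reusing the connection string and the tab_abc table from the question), SqlDataAdapter.Fill buffers the entire result into a DataSet, so it is only suitable when the whole result fits in memory:

using System.Data;
using System.Data.SqlClient;

class DataSetCache
{
    static DataSet LoadAll(string connString)
    {
        var ds = new DataSet();
        using (var conn = new SqlConnection(connString))
        using (var adapter = new SqlDataAdapter("SELECT * FROM tab_abc", conn))
        {
            // Fill opens and closes the connection itself and
            // buffers every row into the DataSet.
            adapter.Fill(ds, "tab_abc");
        }
        return ds;
    }
}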

If not, use the SqlDataReader as suggested above and read the records one by one, storing them in your cache as you go.

Do note, however, that there's already a very popular caching mechanism for large database tables: your database. With the proper index configuration, the database can probably outperform your cache.




Answer 3:


You can use Entity Framework and paginate the query with Skip and Take to fetch the rows in buffered batches, as sketched below. If you need in-memory caching for such a large dataset, I would suggest checking GC.GetTotalMemory to see how much managed memory is already in use.
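
A minimal sketch of that buffered paging, assuming Entity Framework 6; the SampleContext and Record types and the Id key column are hypothetical names, not from the original post:

using System;
using System.Data.Entity;
using System.Linq;

public class Record
{
    public int Id { get; set; }
    // other columns mapped here
}

public class SampleContext : DbContext
{
    public DbSet<Record> Records { get; set; }
}

class PagedRead
{
    const int PageSize = 10000;

    static void ProcessInPages()
    {
        using (var db = new SampleContext())
        {
            for (int page = 0; ; page++)
            {
                // LINQ to Entities requires a stable ordering before Skip,
                // so page by the key column.
                var batch = db.Records
                              .AsNoTracking() // don't cache entities in the context
                              .OrderBy(r => r.Id)
                              .Skip(page * PageSize)
                              .Take(PageSize)
                              .ToList();
                if (batch.Count == 0)
                    break;

                foreach (var row in batch)
                {
                    // process the row here
                }

                // Rough check of managed memory before caching more rows.
                long bytes = GC.GetTotalMemory(forceFullCollection: false);
                Console.WriteLine("Managed memory in use: {0} MB", bytes / (1024 * 1024));
            }
        }
    }
}

Note that a large Skip offset becomes progressively slower, since the server still has to scan past the skipped rows; keyset paging (filtering on Id > lastId instead of Skip) scales better for tens of millions of rows.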



Source: https://stackoverflow.com/questions/13045481/streaming-read-of-over-10-million-rows-from-a-table-in-sql-server
