Which is faster: filtering a DataSet in-memory or returning a result set from SQL Server?

☆樱花仙子☆ 提交于 2019-12-24 01:07:16

问题


I'm working on a pretty large table, (800k records and climbing) and I'd like to filter said table. The thing is, the table is stored in SQL Server. So, I was wondering, would a

SELECT * FROM table WHERE condition1=true

query be faster than loading the table to a typed DataSet and using DataRow.Find() then sending all those to another DataTable?

I'm guessing yes, but I'll ask anyway.


回答1:


As long as your SQL server is not paging because of RAM starvation the SQL Server should always be faster than loading the whole table via network and then filtering locally...




回答2:


You should definitely do it in SQL Server because:

  1. That's one of the things a database server is designed to do and is good at. If the condition always involves the same columns then you might want to consider adding indexes to speed up the lookup even more.
  2. Performance wise it is always faster to load less data
  3. Your memory consumption is going to explode when you are trying to load the whole thing into memory. Especially when the table is growing unbound. It might work ok now but can kill you unexpectedly later.



回答3:


The only exception is if you are going to be running this query over and over again. In that case, it might make sense to cache the data locally and then filter it ... but even then you should only do so if performance tests indicated a significant savings.




回答4:


I don't think there is a catch all answer really, you should test it and measure the performance for your particular circumstances.

When doing multiple complicated queries, with joins across many tables etc, we have found that it can be significantly faster to execute one large query into a dataset and then filter it how we need in memory. A big part of the performance increase for us is because it avoid us doing multiple calls to the database, which can cause a bottle neck.

There are some SQL functions e.g. if you've got to do something like WHERE LocationId IN (1, 2, 3, 4, 5, 6) which performs like way worse in SQL compared to doing dataset.LocationIds.Where(id => myValues.Contains(id)) in C#, against large datasets, so in that situation if you measured it you should find in-memory performs best.

A SELECT with simple WHERE conditions would almost certainly be faster in SQL though. Which is why you should evaluate each scenario and test and measure if needed.




回答5:


If your filter can be written as a stored procedure (you can use parameters for values in the select) it will definitely be faster because the first time you call the stored procedure its execution plan is calculated and stored. Next call, no wait for the plan to be calculated. (in your example however the plan calculation time is probably minuscule).




回答6:


I get that this question is old but the answers are all missing an important point. A database can use the filter criteria, along with table indices, to load less data from the disk. For example, if you were to do SELECT * FROM X WHERE ID = 7 then (assuming ID is a primary key here) the database will only load a single row from the disk. If you filtered that data on the server side then the database would need to load the entire table off the disk.

Assuming you are I/O bound this is going to give you significantly better performance. Of course, not all filter criteria can work in this way. It depends on the database server, indices, etc.



来源:https://stackoverflow.com/questions/6755562/which-is-faster-filtering-a-dataset-in-memory-or-returning-a-result-set-from-sq

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!