SQLite Performance Benchmark — why is :memory: so slow…only 1.5X as fast as disk?


It has to do with the fact that SQLite has a page cache. According to the documentation, the default page cache is 2000 1 KB pages, or about 2 MB. Since that covers about 75% to 90% of your data, it isn't surprising that the two numbers are so similar. My guess is that, in addition to the SQLite page cache, the rest of the data is still in the OS disk cache. If you got SQLite to flush the page cache (and the disk cache), you would see some really significant differences.
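For reference, here's a minimal sketch (the bench.db file name is a made-up stand-in for your database) of how to inspect and enlarge the page cache from Python:

    import sqlite3

    conn = sqlite3.connect("bench.db")  # hypothetical file-backed benchmark DB
    cur = conn.cursor()

    # A positive cache_size is a page count; a negative value is a size
    # in KiB (so 2000 1 KB pages is roughly equivalent to -2000).
    print(cur.execute("PRAGMA page_size").fetchone())
    print(cur.execute("PRAGMA cache_size").fetchone())

    # Enlarge the cache so the whole working set stays in memory:
    cur.execute("PRAGMA cache_size = -65536")  # about 64 MB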

My question to you is: what are you trying to benchmark?

As already mentioned, SQLite's :memory: DB is just the same as the disk-based one, i.e. paged, and the only difference is that the pages are never written to disk. So the only difference between the two is the disk writes that :memory: doesn't need to do (nor any disk reads, when a page that was evicted from the cache is needed again).

But reads and writes from the cache may represent only a fraction of the query-processing time, depending on the query. Your query has a WHERE clause with two large sets of IDs the selected rows must be members of, and those membership tests are expensive.

As Cary Millsap demonstrates in his blog on optimizing Oracle (here's a representative post: http://carymillsap.blogspot.com/2009/06/profiling-with-my-boy.html), you need to understand which parts of the query processing take the time. Assuming the set-membership tests represented 90% of the query time and the disk-based I/O 10%, going to :memory: would save only that 10%. That's an extreme example, unlikely to be representative, but I hope it illustrates that your particular query is slanting the results. Use a simpler query, and the I/O part of the query processing will grow, and with it the benefit of :memory:.
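Here's a minimal sketch of that kind of profiling in Python (the table, column, and query are made-up stand-ins for yours): ask SQLite for its query plan to see whether the membership test scans or uses an index, then time the query as a whole:

    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val REAL)")
    cur.executemany("INSERT INTO t VALUES (?, ?)",
                    ((i, i * 0.5) for i in range(100_000)))

    query = "SELECT count(*) FROM t WHERE id IN (SELECT id FROM t WHERE val > 25000)"

    # EXPLAIN QUERY PLAN shows whether the membership test uses an
    # index or has to scan, i.e. where the time is likely to go.
    for row in cur.execute("EXPLAIN QUERY PLAN " + query):
        print(row)

    start = time.perf_counter()
    cur.execute(query).fetchall()
    print(f"query took {time.perf_counter() - start:.4f}s")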

As a final note, we've experimented with SQLite's virtual tables, where you are in charge of the actual storage, and by using C++ containers, which are typed unlike SQLite's way of storing cell values, we could see a significant improvement in processing time over :memory:, but that's getting off topic a bit ;) --DD

PS: I haven't enough karma to comment on the most popular post of this thread, so I'm commenting here :) to say that recent SQLite versions don't use 1 KB pages by default on Windows: http://www.sqlite.org/changes.html#version_3_6_12

You're only doing SELECTs, so you're effectively reading from the memory cache either way. Try interleaving SELECTs with UPDATEs.
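A minimal sketch of such a mixed workload, assuming a benchmark table t(id, val) like the one in the question already exists:

    import random

    def mixed_workload(conn, n=10_000):
        # Alternate reads and writes so the disk-backed DB actually has
        # to journal and flush pages instead of just serving the cache.
        cur = conn.cursor()
        for _ in range(n):
            key = random.randrange(1, 1_000)
            cur.execute("SELECT val FROM t WHERE id = ?", (key,))
            cur.fetchone()
            cur.execute("UPDATE t SET val = val + 1 WHERE id = ?", (key,))
        conn.commit()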

A memory database in SQLite is really just the page cache that never touches the disk, so you should forget about using a memory DB in SQLite as a performance tweak.

It's possible to turn off the journal, turn off sync mode, and set a large page cache; then you'll get almost the same performance on most operations, but durability will be lost.
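For example (a sketch assuming a disk-backed database; all three PRAGMAs are standard SQLite):

    import sqlite3

    conn = sqlite3.connect("bench.db")  # hypothetical disk-backed DB
    cur = conn.cursor()
    cur.execute("PRAGMA journal_mode = OFF")   # no rollback journal at all
    cur.execute("PRAGMA synchronous = OFF")    # never fsync after a write
    cur.execute("PRAGMA cache_size = -65536")  # big page cache, about 64 MB
    # A crash or power failure can now corrupt or lose data.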

From your code it's absolutely clear that you SHOULD REUSE the command and ONLY BIND parameters, because not doing so took more than 90% of your test performance away.
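A minimal sketch of the difference in Python's sqlite3 module (the table is a made-up stand-in):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (id INTEGER, val REAL)")
    rows = [(i, i * 0.5) for i in range(100_000)]

    # Slow: a fresh SQL string is parsed and compiled for every row.
    # for i, v in rows:
    #     cur.execute(f"INSERT INTO t VALUES ({i}, {v})")

    # Fast: one statement compiled once, parameters re-bound per row.
    cur.executemany("INSERT INTO t VALUES (?, ?)", rows)
    conn.commit()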

Thank you for the code. I have tested it on a 2 x XEON 2690 with 192 GB RAM and four 15k SCSI hard drives in RAID 5, and the results are:

  disk | memory | qsize
-----------------------
6.3590 | 2.3280 | 15713
6.6250 | 2.3690 | 8914
6.0040 | 2.3260 | 225168
6.0210 | 2.4080 | 132388
6.1400 | 2.4050 | 264038

The speed increase in memory is significant.

Could it be that sqlite3 isn't actually writing your data to disk from the cache? That might explain why the numbers are similar.

It could also be that your OS is paging due to low available memory.

I note you are focusing on queries that return relatively large data sets. I wonder what effect you would see with smaller sets of data? Returning a single row many times might require the disk to seek a lot, whereas the random-access time for memory might be much faster.
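A sketch of that experiment, assuming the benchmark table t(id, val) already exists in both the disk and :memory: databases:

    import random, time

    def time_single_row_lookups(conn, n=10_000):
        # One random row per query: random access is where a disk
        # should lose to memory by the widest margin.
        cur = conn.cursor()
        start = time.perf_counter()
        for _ in range(n):
            cur.execute("SELECT val FROM t WHERE id = ?",
                        (random.randrange(1, 1_000),))
            cur.fetchone()
        return time.perf_counter() - start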

numpy arrays are slower than dicts, tuples, and other object sequences until you are dealing with five million or more objects in a sequence. You can significantly improve the speed of processing massive amounts of data by iterating over it and using generators to avoid creating and re-creating large temporary objects.
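A minimal sketch of the generator idea (the transformation is a made-up placeholder):

    # Wasteful: materializes two million-element lists along the way.
    rows = [(i, i * 0.5) for i in range(1_000_000)]
    doubled = [v * 2 for _, v in rows]
    total = sum(doubled)

    # Generator version: each value is produced, consumed, and discarded
    # one at a time, so no large temporaries are ever built.
    total = sum(v * 2 for _, v in ((i, i * 0.5) for i in range(1_000_000)))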

numpy has become your limiting factor, as it is designed to deliver linear performance. It is not a star with small or even large amounts of data, but its performance does not turn into a curve as the data set grows. It remains a straight line.
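A rough way to see that overhead yourself (sizes are illustrative; where the break-even falls depends on the machine and the workload):

    import timeit
    import numpy as np

    n = 1_000
    lst = list(range(n))
    arr = np.arange(n)

    # Element-wise access boxes each numpy scalar, so iterating an
    # array in Python can lose to a plain list at small sizes.
    print(timeit.timeit(lambda: [x + 1 for x in lst], number=1_000))
    print(timeit.timeit(lambda: [x + 1 for x in arr], number=1_000))

    # The vectorized operation is the straight line: it stays linear
    # in n instead of paying Python overhead per element.
    print(timeit.timeit(lambda: arr + 1, number=1_000))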

Besides, SQLite is just a really fast database, faster even than most server databases. It raises the question of why anyone would use NoSQL databases when a lightweight, very fast, fault-tolerant database that speaks SQL has been around and put to the test in everything from browsers to mobile phones for years.
