HyperSQL (HSQLDB): massive insert performance

Posted by 大兔子 on 2019-12-03 09:02:40

With CACHED tables, disk IO takes most of the time. There is no need for multiple threads because you are inserting into the same table. One thing that noticeably improves performance is reusing a single parameterized PreparedStatement and setting its parameters for each row insert.
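A minimal sketch of the reuse pattern described above, using only standard JDBC calls. The class name, method name, and the row representation as Object[] are hypothetical choices for illustration, not something from the original answer.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: names and row format are assumptions for illustration.
final class ReusedStatementInsert {

    /** Inserts every row through ONE PreparedStatement and returns the row count. */
    static int insertAll(Connection conn, String sql, List<Object[]> rows)
            throws SQLException {
        int inserted = 0;
        // Prepare once, outside the loop, so the SQL is compiled a single time.
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                inserted += ps.executeUpdate();
            }
        }
        return inserted;
    }
}
```

A call might look like insertAll(conn, "INSERT INTO t (id, name) VALUES (?, ?)", rows), where the table t and its columns are placeholders. The anti-pattern this avoids is calling conn.prepareStatement(...) inside the loop.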

On your machine, you can improve IO significantly by using a large NIO limit for memory-mapped IO, for example SET FILES NIO SIZE 8192 (the value is in megabytes). A 64-bit JVM is required for larger sizes to have an effect.

http://hsqldb.org/doc/2.0/guide/management-chapt.html

To reduce IO for the duration of the bulk insert, use SET FILES LOG FALSE and do not perform a checkpoint until the end of the insert. The details are discussed here:

http://hsqldb.org/doc/2.0/guide/deployment-chapt.html#dec_bulk_operations
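As a rough sketch, the logging advice could be wrapped around the insert loop like this. SET FILES LOG FALSE, CHECKPOINT and SET FILES LOG TRUE are HSQLDB statements; the class, the BulkWork callback, and the choice to re-enable logging in a finally block after the checkpoint are assumptions made for illustration, so check the linked guide for the recommended order.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical wrapper: not part of HSQLDB, just one way to sequence the statements.
final class BulkLoadSession {

    /** Hypothetical callback holding the caller's insert loop. */
    interface BulkWork {
        void run(Connection conn) throws SQLException;
    }

    static void runWithoutLog(Connection conn, BulkWork work) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SET FILES LOG FALSE"); // stop writing the .log file
            try {
                work.run(conn);                // the bulk insert itself
                st.execute("CHECKPOINT");      // single checkpoint at the very end
            } finally {
                st.execute("SET FILES LOG TRUE"); // restore normal logging (assumed order)
            }
        }
    }
}
```

Keep in mind that with logging off, rows inserted since the last checkpoint are not recoverable if the process dies before the final CHECKPOINT runs.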

UPDATE: The insert test below, with about 16 million rows, produced a 1.9 GB .data file and took roughly 6.4 minutes (384,610 ms) on an average dual-core processor with a 7200 RPM disk. The key is a large NIO allocation.

connection time -- 47
complete setup time -- 78 ms
insert time for 16384000 rows -- 384610 ms -- 42598 tps
shutdown time  -- 38109 
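The tps figure in that output can be reproduced from the other two numbers. A small worked check, assuming the test harness used integer (truncating) division; the class and method names are made up for illustration:

```java
// Hypothetical helper that recomputes the throughput figure from the log above.
final class ThroughputCheck {

    /** Rows per second, truncated to a whole number. */
    static long rowsPerSecond(long rows, long elapsedMillis) {
        return rows * 1000L / elapsedMillis;
    }
}
```

ThroughputCheck.rowsPerSecond(16_384_000L, 384_610L) gives 42598, matching the reported value.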

Check what your application is doing. The first things to look at are resource utilization in Task Manager (or the comparable tool for your OS) and VisualVM.

Good candidates for causing bad performance:

  • disk IO
  • garbage collector

H2Database may give you slightly better performance than HSQLDB (while maintaining syntax compatibility).

In any case, you might want to try a higher delay for syncing to disk to reduce random-access disk I/O (e.g. SET WRITE_DELAY <num>).

Hopefully you're doing bulk INSERT statements, rather than a single insert per row. If not, do that if possible.
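One reading of "bulk INSERT" is a single statement carrying several rows in its VALUES clause. The sketch below only builds such a parameterized statement; the class name is hypothetical, identifiers are concatenated without quoting or validation, and it assumes the database accepts multi-row VALUES lists.

```java
import java.util.Collections;

// Hypothetical builder for a multi-row parameterized INSERT; no identifier escaping.
final class MultiRowInsertSql {

    static String build(String table, String[] columns, int rowCount) {
        if (columns.length == 0 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?, ?, ...)" group per row, one "?" per column.
        String group = "(" + String.join(", ", Collections.nCopies(columns.length, "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, group));
    }
}
```

For example, build("t", new String[] {"a", "b"}, 2) returns INSERT INTO t (a, b) VALUES (?, ?), (?, ?). The resulting string can be prepared once and reused for every full group of rows.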

Depending on your application requirements, you might be better off with a key-value store than an RDBMS. (Do you regularly need to insert 1.3*10^7 entries?)

Your main limiting factor is going to be random access operations to disk. I highly doubt that anything you're doing will be CPU-bound. (Take a look at top, then compare it to iotop!)

With so many records, maybe you could consider switching to a NoSQL DB. It depends on the nature/format of the data you need to store, of course.
