Database that can handle >500 million rows

Backend · Unresolved · 9 answers · 1558 views
半阙折子戏 2020-12-01 01:23

I am looking for a database that can handle more than 500 million rows: it should be able to create an index on a column in a reasonable time and return results for SELECT queries in less than 3 seconds.

9 Answers
  •  萌比男神i
    2020-12-01 01:54

    The following article discusses importing and using a 16 billion row table in Microsoft SQL Server: https://www.itprotoday.com/big-data/adventures-big-data-how-import-16-billion-rows-single-table

    From the article:

    Here are some distilled tips from my experience:

    - The more data you have in a table with a defined clustered index, the slower it becomes to import unsorted records into it. At some point, it becomes too slow to be practical.
    - If you want to export your table to the smallest possible file, make it native format. This works best with tables containing mostly numeric columns, because they're more compactly represented in binary fields than character data. If all your data is alphanumeric, you won't gain much by exporting it in native format.
    - Not allowing nulls in the numeric fields can further compact the data. If you allow a field to be nullable, the field's binary representation will contain a 1-byte prefix indicating how many bytes of data will follow.
    - You can't use BCP for more than 2,147,483,647 records because the BCP counter variable is a 4-byte integer. I wasn't able to find any reference to this on MSDN or the Internet. If your table consists of more than 2,147,483,647 records, you'll have to export it in chunks or write your own export routine (a sketch of chunked export follows this list).
    - Defining a clustered index on a prepopulated table takes a lot of disk space. In my test, my log exploded to 10 times the original table size before completion.
    - When importing a large number of records using the BULK INSERT statement, include the BATCHSIZE parameter and specify how many records to commit at a time. If you don't include this parameter, your entire file is imported as a single transaction, which requires a lot of log space.
    - The fastest way of getting data into a table with a clustered index is to presort the data first. You can then import it using the BULK INSERT statement with the ORDER parameter (see the sketch below).
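    As a rough illustration of the last two tips (this is not from the article), here is a minimal T-SQL sketch of loading a presorted native-format file with BULK INSERT. The table name, file path, and key column (dbo.BigTable, Id) are hypothetical:

        -- Hypothetical names; assumes the file was exported in native format
        -- and presorted on the clustered index key (Id).
        BULK INSERT dbo.BigTable
        FROM 'D:\export\bigtable_sorted.dat'
        WITH (
            DATAFILETYPE = 'native',  -- native-format file, per the tips above
            BATCHSIZE = 1000000,      -- commit every 1M rows to limit log growth
            ORDER (Id ASC),           -- incoming rows match the clustered index order
            TABLOCK                   -- table lock allows a minimally logged bulk load
        );

    Because the ORDER hint tells SQL Server the incoming rows already match the clustered index order, it can skip the sort that would otherwise dominate the load.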
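    For the 2,147,483,647-record BCP limit, one workaround is exporting the table in key-range chunks so each bcp run stays under the 4-byte counter. This is only a sketch under assumed names (MyDb.dbo.BigTable, a numeric Id key, a server called myserver), and it assumes xp_cmdshell is enabled, which many installations disable for security:

        -- Export in 2-billion-row chunks; all names here are hypothetical.
        DECLARE @lo BIGINT = 1;
        DECLARE @chunk BIGINT = 2000000000;
        DECLARE @max BIGINT = (SELECT MAX(Id) FROM dbo.BigTable);
        DECLARE @cmd VARCHAR(4000);
        WHILE @lo <= @max
        BEGIN
            SET @cmd = 'bcp "SELECT * FROM MyDb.dbo.BigTable WHERE Id BETWEEN '
                     + CONVERT(VARCHAR(20), @lo) + ' AND '
                     + CONVERT(VARCHAR(20), @lo + @chunk - 1)
                     + '" queryout D:\export\chunk_' + CONVERT(VARCHAR(20), @lo)
                     + '.dat -n -S myserver -T';   -- -n = native format
            EXEC master..xp_cmdshell @cmd;         -- runs one bcp export per chunk
            SET @lo = @lo + @chunk;
        END;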

    Even that is small compared to the Nasdaq OMX database, which houses tens of petabytes (thousands of terabytes) and trillions of rows on SQL Server.
