Does sqlite3 compress data?

一生所求 2020-12-16 10:32

I've got a 7.4 GB CSV file. After converting it to a SQLite database with a Python script, the output DB is 4.7 GB, around 60% of the original size.

The csv has aroun

3 answers
  • 2020-12-16 10:42

    By default, SQLite does not compress the data it writes to disk. However, SQLite does have a set of "Proprietary Extensions" for that and other purposes. Look for ZIPVFS at the following links.

    http://www.sqlite.org/support.html and http://www.hwaci.com/sw/sqlite/prosupport.html

    You can achieve a lot of "compression" in your data by encoding fields as integers. For example, an IPv4 address was designed to fit into a 32-bit word (4 bytes): each octet of the address can be stored in one byte of the word.

    # Python: pack the four octets of a dotted-quad address into one integer
    octets = "172.168.0.1".split(".")
    ip = int(octets[0]) << 24
    ip |= int(octets[1]) << 16
    ip |= int(octets[2]) << 8
    ip |= int(octets[3])
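
As a hedged alternative to the manual shifts above, Python's standard `ipaddress` module can do the same conversion in both directions. The table name `hits` and the helper names `ip_to_int` and `int_to_ip` are made up for this sketch and are not part of the question.

```python
import ipaddress
import sqlite3


def ip_to_int(text: str) -> int:
    """Pack a dotted-quad IPv4 string into a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


def int_to_ip(value: int) -> str:
    """Turn the stored integer back into dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (ip INTEGER)")  # hypothetical table
conn.execute("INSERT INTO hits VALUES (?)", (ip_to_int("172.168.0.1"),))

stored = conn.execute("SELECT ip FROM hits").fetchone()[0]
print(stored, int_to_ip(stored))
```

Storing the integer and converting back only for display keeps the column compact while the address stays readable in reports.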
    

    Additionally, your timestamp may be represented in Unix time, which is the number of seconds since the epoch.

    UPDATE mytable SET healthtime = CAST(strftime('%s',healthtime) AS INTEGER);
    

    See the Date and Time functions

    Note the CAST in the SQL above: SQLite does not enforce a type on a column, so without it you may end up with a group of digits stored as a string. That makes the field larger than necessary, and it will also make certain queries behave oddly.
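
A small sketch of why the CAST matters, using Python's `sqlite3` module. It assumes a `healthtime` column declared without a type (a column declared INTEGER would convert the digits on its own through type affinity); the `id` column and the sample timestamp are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: healthtime has no declared type, so SQLite stores values as given.
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, healthtime)")
conn.executemany(
    "INSERT INTO mytable (healthtime) VALUES (?)",
    [("2020-12-16 10:42:00",), ("2020-12-16 10:42:00",)],
)

# Row 1: no CAST, strftime() hands back a string of digits.
conn.execute(
    "UPDATE mytable SET healthtime = strftime('%s', healthtime) WHERE id = 1"
)
# Row 2: with CAST, the value is stored as a real integer.
conn.execute(
    "UPDATE mytable SET healthtime = CAST(strftime('%s', healthtime) AS INTEGER) "
    "WHERE id = 2"
)

rows = conn.execute(
    "SELECT id, healthtime, typeof(healthtime) FROM mytable ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The first row should report `text` and the second `integer`, even though both hold the same digits.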

    One more thing: field size is only part of the story. Remember that indexes take up space too, and indexes on integers are more efficient, both in disk size and in performance.
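
To get a rough feel for the combined effect of column encoding and index size, the following sketch writes the same synthetic addresses once as text and once as integers, each with an index, and compares the resulting file sizes. The table `t`, the index `t_ip`, the helper `db_size`, and the generated data are all assumptions made for the illustration; real data will give different numbers.

```python
import os
import sqlite3
import tempfile


def db_size(column_type, values):
    """Create a one-column indexed table on disk and return the file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.db")
        conn = sqlite3.connect(path)
        conn.execute(f"CREATE TABLE t (ip {column_type})")
        conn.execute("CREATE INDEX t_ip ON t (ip)")
        conn.executemany("INSERT INTO t VALUES (?)", ((v,) for v in values))
        conn.commit()
        conn.close()
        return os.path.getsize(path)


# Synthetic 10.x.y.z addresses, once as integers and once as dotted-quad text.
ints = [(10 << 24) | n for n in range(50_000)]
texts = [f"10.{(n >> 16) & 255}.{(n >> 8) & 255}.{n & 255}" for n in ints]

int_size = db_size("INTEGER", ints)
text_size = db_size("TEXT", texts)
print(f"integer column: {int_size} bytes, text column: {text_size} bytes")
```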

  • 2020-12-16 10:47

    SQLite does not run a compression algorithm, but it stores data in a binary file instead of a text file. This means the data can be stored more efficiently, for example by using a 32-bit (4-byte) number to represent 10,000,000 instead of storing it as 8 bytes of text (or more if the file is Unicode).
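
The byte counts in that example can be checked in Python. This only illustrates the general text-versus-binary idea; it uses `struct` to build a fixed 4-byte integer and is not a reproduction of SQLite's actual on-disk record format.

```python
import struct

value = 10_000_000

as_text = str(value).encode("ascii")       # one byte per digit
as_utf16 = str(value).encode("utf-16-le")  # two bytes per digit
as_binary = struct.pack(">i", value)       # fixed 32-bit signed integer

print(len(as_text), len(as_utf16), len(as_binary))  # 8 16 4
```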

    Here are more details on the SQLite Database File Format if you are interested.

    Does that make sense?

  • 2020-12-16 10:51

    I have a rather big SQLite file in my Git repository, and I was wondering why the total repository size was not growing much and was in fact even smaller than my SQLite .db file. It turns out that Git compresses the repository by default. A quick check on my .db file points the same way: zipping it produced an archive only ~20% the size of the .db file.

    So, at least with the default settings, it doesn't look like SQLite stores the data in a compressed way. Regardless of that, it might be fine to add a big SQLite file to a Git repository, since Git performs compression automatically.
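
The same kind of quick check can be scripted. The sketch below builds a small throwaway database and measures how far `zlib` shrinks the file; the table `log`, its contents, and the helper `compression_ratio` are invented for the example, and the ratio you get depends entirely on your own data.

```python
import os
import sqlite3
import tempfile
import zlib


def compression_ratio(path):
    """Return compressed size divided by original size for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()  # fine for a small sample; stream large files instead
    return len(zlib.compress(raw)) / len(raw)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE log (ip TEXT, healthtime TEXT)")  # hypothetical
    conn.executemany(
        "INSERT INTO log VALUES (?, ?)",
        ((f"172.168.0.{n % 256}", "2020-12-16 10:42:00") for n in range(20_000)),
    )
    conn.commit()
    conn.close()
    ratio = compression_ratio(path)

print(f"compressed size is {ratio:.0%} of the original")
```

A ratio well below 100% suggests the file still contains plenty of redundancy, which is consistent with SQLite not compressing what it writes.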
