I\'ve got an 7.4Gb csv file. After converting it to a sqlite database with a python script the output DB is 4.7Gb, around 60% of the original size.
The csv has aroun
SQLite, by default, does not compress data it writes to the disk; however, SQLite does have a set of "Proprietary Extensions" for that and other purposes. Look for ZIPVFS
in the links as follows.
http://www.sqlite.org/support.html and http://www.hwaci.com/sw/sqlite/prosupport.html
You can achieve a lot of "compression" in your data by encoding fields as integers. For example an IP address was engineered to fit into a word (4 bytes). Each octect of the address may be represented in one byte of a word.
string[] octets = '172.168.0.1'.split('.')
int ip = atoi(octets[0]) << 24
ip |= atoi(octets[1]) << 16
ip |= atoi(octets[2]) << 8
ip |= atoi(octets[3])
Additionally, your timestamp may be represented in Unix time, which is the number of seconds since the epoch.
UPDATE mytable SET healthtime = CAST(strftime('%s',healthtime) AS INTEGER);
See the Date and Time functions
Note the CAST
directive in the above SQL: SQLite does not enforce type on a column, so you may have a group of digits stored as a string; increasing your field size more than necessary (this will also make certain queries behave oddly).
One more thing: field size is not the only part of the story. Remember that indexes take up a space too, and indexes on integers are more efficient --in terms of disk size and performance.
SQLite is not running a compression algorithm, but it will store data in a binary file instead of a text file. Which means that the data can be stored more efficiently, for example using a 32-bit (4 byte) number to represent 10,000,000
instead of storing it as 8 bytes of text (or more if the file is unicode).
Here are more details on the SQL Database File Format if you are interested.
Does that make sense?
I have a rather big SQLite file in my GIT repository and I was wondering why my total repository size was not growing that much but instead, was even smaller than my SQLite-.db
-file. Turns out, GIT compresses the repository by default. A quick check on my .db
-file also indicates this, since zipping the .db
-file resulted in a zip-archive which was only ~20% the size of the .db
-file.
So, at least with the default settings, it doesn't look like SQLite is storing the data in a compressed way. However, regardless of that, it might be fine to add a big SQLite file to a GIT repository, since GIT performs compression automatically.