问题
I have a table with 7 columns and 5 of them will be null. I will have a null columns on int, text, date, boolean, and money data types. This table will contain millions of rows with many many nulls. I am afraid that the null values will occupy space.
Also, do you know if Postgres indexes null values? I would like to prevent it from indexing nulls.
回答1:
Basically, NULL values occupy 1 bit in the NULL bitmap. But it is not as simple as that.
The null bitmap (per row) is only there if at least one column in that row holds a NULL value. This can lead to a paradox effect in tables with 9 or more columns: assigning the first NULL value to a column can take up more space on disk than writing a value to it. Conversely, with the last column becoming non-null, the null bitmap is dropped for the row.
Physically, the initial null bitmap occupies 1 byte between the HeapTupleHeader (23 bytes) and actual column data or the row OID (if you should still be using that) - which always start at a multiple of MAXALIGN (typically 8 bytes). This leaves 1 byte of padding that is utilized by the initial null bitmap.
In effect NULL storage is absolutely free for tables of 8 columns or less.
After that, another MAXALIGN bytes (typically 8) are allocated for the next MAXALIGN * 8 columns (typically 64). Etc.
More details in the manual and under these related questions:
- How much disk-space is needed to store a NULL value using postgresql DB?
- Does not using NULL in PostgreSQL still use a NULL bitmap in the header?
- How many records can I store in 5 MB of PostgreSQL on Heroku?
Once you understand alignment padding of data types, you can further optimize storage:
- Calculating and saving space in PostgreSQL
But the cases are rare where you can save substantial amounts of space. Normally it's not worth the effort.
@Daniel already covers effects on index size.
回答2:
Whether NULL values get to the index or not depend at least on the type of the index.
Basically, this would be YES for btree and gist index types, NO for hash, and it seems YES or NO for gin index types depending on PostgreSQL version.
There used to be a boolean column amindexnulls in the pg_catalog.pg_am table that carried that information, but it's gone in 9.1. Probably because indexes have got even more sophisticated among PG improvements.
In the specific case of your data, the best way to know would be to measure the size difference of indexes, using the pg_relation_size('index_name') function, between contents entirely NULL and entirely NOT NULL, with your exact PG version, exact datatype, exact index type and definition. And know that possibly, a future change in any of these parameters may change the outcome.
But in any case, if you "just" want to avoid indexing NULLs, it's always possible to create a partial index:
CREATE INDEX partial_idx(col) ON table WHERE (col is not null)
This is going to take less space, but whether this is going to help or not with the performance of queries depend on these queries.
回答3:
I believe each would use a single bit in the bitmap for the row. See here: http://www.postgresql.org/docs/9.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE
来源:https://stackoverflow.com/questions/12145772/do-nullable-columns-occupy-additional-space-in-postgresql