Should I use hstore in a renewed data model?

匆匆过客 submitted on 2019-12-07 23:40:04

Question


In my legacy database (Postgres 9.1) I have several tables containing different kinds of documents (let's call them 'parent' tables). Additionally, there is a table with various parameters for these documents:

create table params (
    kind integer,
    docid integer,
    parname text,
    parvalue text,
    constraint params_pk primary key (kind, docid, parname));

There may be many (parname, parvalue) pairs for one document. Because kind points to different tables, docid cannot be declared as a foreign key.

It has worked well for years, as params were used only for printing documents. Now this table contains 5 million rows and the data is needed for other purposes as well, so it is high time to renew this model.

Basically, params are inserted once per document and very seldom updated. They will be read as a whole for a document. There is no need to search for a specific parname.

I have three ideas:

Variant A. Split the params table into several tables, one per parent table, and use docid as a foreign key.

Variant B. Split params as in variant A, but store the (parname, parvalue) pairs as an hstore.

Variant C. Add an hstore column to every parent table and do without any additional tables.
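The three variants could be sketched in DDL roughly as follows. This is only an illustration under assumptions: `invoices`, `invoice_params` and `invoice_params_h` are hypothetical names standing in for one parent table and its parameter tables, and the hstore extension must be installable in the database.

```sql
-- hstore ships as a contrib extension; CREATE EXTENSION exists since 9.1
create extension if not exists hstore;

-- Hypothetical parent table standing in for one of the document tables
create table invoices (
    id integer primary key,
    title text
);

-- Variant A: one params table per parent table, with a real foreign key
create table invoice_params (
    docid integer not null references invoices (id),
    parname text not null,
    parvalue text,
    primary key (docid, parname)
);

-- Variant B: still a separate table, but one row per document
-- with all (parname, parvalue) pairs packed into an hstore
create table invoice_params_h (
    docid integer primary key references invoices (id),
    params hstore
);

-- Variant C: no extra table, the hstore lives in the parent row itself
alter table invoices add column params hstore;
```

In variants A and B the foreign key becomes possible because each parameter table belongs to exactly one parent table; variant C needs no key at all.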

I have no experience with hstore. What are the pros and cons of each variant? Which one would you choose? Can hstore surprise me with anything strange?


Answer 1:


I vote for the third option. The fewer tables, the better you sleep.

Hstore was invented for one-level parameter lists. It is stable, fast and simple, and fits your needs perfectly. I had a similar task some time ago and wrote an aggregate for easier conversion:

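-- hstore is a contrib extension; CREATE EXTENSION is available from 9.1
create extension if not exists hstore;

-- State transition function: adds one (key, value) pair to the accumulated hstore
create or replace function hstore_add(hstore, text, text)
returns hstore language plpgsql
as $$
begin
    return case
        when $1 isnull then hstore($2, $3)   -- first row of the group
        else $1 || hstore($2, $3) end;        -- later rows: concatenate
end $$;

-- Aggregate that collects (parname, parvalue) rows into a single hstore
create aggregate hstore_agg (text, text) (
    sfunc = hstore_add,
    stype = hstore
);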

I think it may save you some time:

select kind, docid, hstore_agg(parname, parvalue)
from params
group by 1, 2
order by 1, 2;
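For Variant C, the legacy rows could then be folded into the new column with one UPDATE per parent table. So that the sketch runs on its own, it swaps the hstore_agg aggregate above for the built-in hstore(text[], text[]) constructor; the `invoices` table, the sample rows and the assumption that kind = 1 identifies invoices are all hypothetical.

```sql
-- Assumes the hstore extension (CREATE EXTENSION exists from 9.1)
create extension if not exists hstore;

-- Legacy table from the question
create table if not exists params (
    kind integer,
    docid integer,
    parname text,
    parvalue text,
    constraint params_pk primary key (kind, docid, parname));

-- Hypothetical parent table for kind = 1, extended with an hstore column (Variant C)
create table if not exists invoices (
    id integer primary key,
    params hstore);

-- Sample rows, for illustration only
insert into invoices (id) values (7);
insert into params values (1, 7, 'color', 'red'), (1, 7, 'size', 'L');

-- Collapse each document's legacy rows into one hstore and store it in the parent row.
-- hstore(text[], text[]) pairs keys and values by position, so both arrays
-- are ordered by parname to keep them aligned.
update invoices d
set params = p.h
from (
    select docid,
           hstore(array_agg(parname order by parname),
                  array_agg(parvalue order by parname)) as h
    from params
    where kind = 1            -- assumption: kind 1 identifies invoices
    group by docid
) p
where d.id = p.docid;

-- Reading a document's parameters back as (key, value) rows
select * from each((select params from invoices where id = 7));
```

With the answer's aggregate installed, hstore_agg(parname, parvalue) could replace the two array_agg calls and produce the same value.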



Answer 2:


If, as you say, you need to fetch the fields together with the document, then the denormalized hstore variant is better: the server can read the entire document from a single location on disk instead of visiting several locations through an index to join the document with its fields. The only problem I see with hstore is its somewhat unconventional syntax, so it might be easier to work with JSON. PostgreSQL 9.4 will have excellent support for (indexed) binary JSON. The hstore authors themselves recommend using binary JSON, by the way.
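As a rough illustration of that syntax, and of a couple of things that can surprise a newcomer, here are a few standalone queries on hstore literals. No tables are needed, only the hstore extension; the keys `color`, `size`, `note` and `qty` are made up.

```sql
create extension if not exists hstore;

-- hstore text literal: comma-separated key=>value pairs; keys and values are text
select 'color=>red, size=>L'::hstore;

-- -> fetches one value as text, ? tests whether a key is present
select ('color=>red, size=>L'::hstore) -> 'color';    -- red
select ('color=>red, size=>L'::hstore) ? 'weight';     -- false

-- each() expands the hstore back into (key, value) rows
select * from each('color=>red, size=>L'::hstore);

-- An unquoted NULL in a literal becomes an SQL NULL value, not the string 'NULL'
select ('note=>NULL'::hstore) -> 'note';              -- NULL

-- Numbers are stored as text and must be cast explicitly
select (('qty=>3'::hstore) -> 'qty')::integer + 1;    -- 4
```

There is no nesting and no typing beyond text, which is exactly the one-level parameter list the first answer describes.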

So a plan might be to use a json column in 9.3 and then convert it to jsonb in 9.4.
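That plan might look like the following sketch, assuming a hypothetical parent table `invoice_docs`: the column starts as json on 9.3 and is converted in place with ALTER TABLE after upgrading to 9.4. hstore_to_json() is shown only as a way to carry over values that already exist as hstore.

```sql
create extension if not exists hstore;

-- Hypothetical parent table with a json column
create table if not exists invoice_docs (
    id integer primary key,
    params json);

-- 9.3: hstore_to_json() (part of the hstore extension) converts existing hstore values
insert into invoice_docs (id, params)
values (1, hstore_to_json('color=>red, size=>L'::hstore));

-- 9.3 also added ->> on json, returning a value as text
select params ->> 'color' from invoice_docs where id = 1;

-- After upgrading to 9.4: rewrite the column as binary JSON in place
alter table invoice_docs alter column params type jsonb using params::jsonb;

-- jsonb supports GIN indexes, e.g. for @> containment searches;
-- only worth creating if searching by parameter ever becomes a need
create index invoice_docs_params_idx on invoice_docs using gin (params);
```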



Source: https://stackoverflow.com/questions/23614873/should-i-use-hstore-in-renewed-data-model
