Column-family concept and data model

迷失自我 2020-12-30 23:27

I'm investigating the different types of NoSQL databases and I'm trying to wrap my head around the data model of column-family stores, such as Bigtable, HBase and Cas

3 answers
  •  忘掉有多难
    2020-12-30 23:40

    Both models you've described are the same.

    Column family is:

    Key -> Key -> (Set of key/value pairs)
    

    Conceptually it becomes:

    Table -> Row -> (Column1/Value1, Column2/Value2, ...)
    

    Think of it as a Map of Map of Key/Value pairs.

    UserProfile = {
        Cassandra = [emailAddress:"cassandra@apache.org", age:20],
        TerryCho = [emailAddress:"terry.cho@apache.org", gender:"male"],
        Cath = [emailAddress:"cath@apache.org", age:20, gender:"female", address:"Seoul"],
    }
    

    The above is an example of a column family. If you were to tabulate it, you'd get a Table called UserProfile which looks like:

    UserName | Email | Age | Gender | Address
    Cassandra | cassandra@apache.org | 20 | null | null
    TerryCho | terry.cho@apache.org | null | male | null
    Cath | cath@apache.org | 20 | female | Seoul
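
    As a rough illustration, the map-of-maps view and its tabulation can be sketched in Python. The names `user_profile` and `tabulate` are hypothetical and chosen only for this example; no real column-family store exposes this API. Columns that a row never stored come out as `None`, matching the nulls in the table.

```python
# A column family modelled as a map of maps: row key -> {column name: value}.
user_profile = {
    "Cassandra": {"emailAddress": "cassandra@apache.org", "age": 20},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {"emailAddress": "cath@apache.org", "age": 20,
             "gender": "female", "address": "Seoul"},
}


def tabulate(column_family):
    """Return (columns, rows); a column a row never stored becomes None."""
    columns = []
    # Collect every column name that appears in any row, in first-seen order.
    for row in column_family.values():
        for name in row:
            if name not in columns:
                columns.append(name)
    # Each output row starts with its row key, followed by one cell per column.
    rows = [[key] + [row.get(name) for name in columns]
            for key, row in column_family.items()]
    return columns, rows


columns, rows = tabulate(user_profile)
for line in rows:
    print(" | ".join(str(cell) for cell in line))
```

    Note that the rows do not share a fixed schema: the column list only exists once you scan all rows, which is the main difference from a relational table.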
    

    The confusing part is that there isn't really a column or a row as we're used to thinking of them. There are a number of "column families", each looked up by name (the key). Each family contains a number of sets of key/value pairs, which are also looked up by name (the row key), and finally, each value in a set can be looked up by name as well (the column key).

    If you need a tabular reference point, "column families" would be your "tables". Each "set of key/value pairs" inside them would be a "row". Each "pair in the set" would be a "column name and its value".
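
    To make the three levels of lookup concrete, here is a minimal sketch using nested Python dictionaries. `database` and `get_column` are hypothetical names for illustration only, not the API of Bigtable, HBase or Cassandra.

```python
# Three levels of naming: column family -> row key -> column key -> value.
database = {
    "UserProfile": {
        "Cath": {"emailAddress": "cath@apache.org", "age": 20},
        "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    }
}


def get_column(db, family, row_key, column, default=None):
    """Look up one value by family name, row key and column key."""
    # A missing family or row behaves like an empty map; a missing column
    # yields the default instead of raising.
    row = db.get(family, {}).get(row_key, {})
    return row.get(column, default)


email = get_column(database, "UserProfile", "Cath", "emailAddress")
missing = get_column(database, "UserProfile", "TerryCho", "age")
print(email, missing)
```

    The second lookup returns `None` because the TerryCho row simply has no `age` pair; nothing was ever stored there.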

    Internally, the data inside each column family is stored together, laid out so that the rows come one after the other and, within each row, the columns come one after the other. So you get row1 -> col1/val1, col2/val2, ... , row2 -> col1/val1 ... , ... -> .... In that sense, the data is stored much more like a row-store and less like a column-store.
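
    The layout described above can be sketched as a flat sequence of cells. This is only an illustration of the ordering, not how any real store encodes its files; `serialize` is a hypothetical helper, and sorting by row key and column name is an assumption made here to get a deterministic order.

```python
def serialize(column_family):
    """Lay cells out row after row, and column after column within a row."""
    cells = []
    for row_key in sorted(column_family):      # assumed ordering: by row key
        row = column_family[row_key]
        for column in sorted(row):             # assumed ordering: by column name
            cells.append((row_key, column, row[column]))
    return cells


user_profile = {
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {"emailAddress": "cath@apache.org", "age": 20},
}

layout = serialize(user_profile)
for cell in layout:
    print(cell)
```

    All cells of one row sit next to each other in `layout`, which is why reading a whole row is cheap and why the layout resembles a row-store.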

    To finish, the choice of words here is unfortunate and misleading. Columns in column families should have been called Attributes. Rows should have been called Attribute Sets. Column families should have been called Attribute Families. The relation to classic tabular vocabulary is weak, since the model is actually quite different.
