I'm investigating the different types of NoSQL databases and I'm trying to wrap my head around the data model of column-family stores, such as Bigtable, HBase and Cas
Both models you've described are the same.
Column family is:
Key -> Key -> (Set of key/value pairs)
Conceptually it becomes:
Table -> Row -> (Column1/Value1, Column2/Value2, ...)
Think of it as a Map of Map of Key/Value pairs.
UserProfile = {
Cassandra = [emailAddress:"cassandra@apache.org", age:20],
TerryCho = [emailAddress:"terry.cho@apache.org", gender:"male"],
Cath = [emailAddress:"cath@apache.org", age:20, gender:"female", address:"Seoul"],
}
The above is an example of a column family. If you were to tabulate it, you'd get a Table called UserProfile which looks like:
UserName | Email | Age | Gender | Address
Cassandra | cassandra@apache.org | 20 | null | null
TerryCho | terry.cho@apache.org | null | male | null
Cath | cath@apache.org | 20 | female | Seoul
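To make the map-of-maps idea concrete, here is a minimal Python sketch of the same data. It is only an in-memory illustration using plain dicts, not how any of these stores is actually implemented; `user_profile` and `tabulate` are names made up for this example.

```python
# Hypothetical in-memory stand-in for the UserProfile column family:
# row key -> {column name -> value}. Rows need not share the same columns.
user_profile = {
    "Cassandra": {"emailAddress": "cassandra@apache.org", "age": 20},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
    "Cath": {"emailAddress": "cath@apache.org", "age": 20,
             "gender": "female", "address": "Seoul"},
}


def tabulate(column_family):
    """Return (column_names, rows), filling missing columns with None."""
    # Collect every column name seen in any row, in first-seen order.
    columns = []
    for row in column_family.values():
        for name in row:
            if name not in columns:
                columns.append(name)
    # Build one list per row: the row key followed by one cell per column.
    rows = [
        [row_key] + [row.get(name) for name in columns]
        for row_key, row in column_family.items()
    ]
    return columns, rows


columns, rows = tabulate(user_profile)
for row in rows:
    print(row)
```

The `None` cells only appear when you force the data into a table. In the nested-dict form, a missing column is simply absent from that row.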
The confusing part is that there's not really a column or a row as we're used to thinking of them. There are a bunch of "column families", each of which is looked up by name (the key). Each family contains a bunch of sets of key/value pairs, which are also looked up by name (the row key), and finally, each value in a set can be looked up by name as well (the column key).
If you need a tabular reference point, "column families" would be your "tables". Each "set of k/v pairs" inside them would be one of your "rows". Each "pair in the set" would be a "column name and its value".
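The three levels of lookup (column family name, then row key, then column key) can be sketched with nested dicts. This is an assumption-laden toy, not any store's real API; `store` and `get_value` are hypothetical names used only here.

```python
# Hypothetical toy store: column family name -> row key -> column key -> value.
store = {
    "UserProfile": {
        "Cassandra": {"emailAddress": "cassandra@apache.org", "age": 20},
        "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
        "Cath": {"emailAddress": "cath@apache.org", "age": 20,
                 "gender": "female", "address": "Seoul"},
    },
}


def get_value(store, family, row_key, column_key, default=None):
    """Look up one value by family name, then row key, then column key."""
    row = store.get(family, {}).get(row_key, {})
    return row.get(column_key, default)


print(get_value(store, "UserProfile", "Cath", "address"))
print(get_value(store, "UserProfile", "TerryCho", "age"))
```

The second lookup returns the default because that row has no `age` pair at all, which is different from a table cell that holds a null.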
Internally, the data inside each column family is stored together, laid out so that the rows come one after the other and, within each row, the columns come one after the other. So you get row1 -> col1/val1, col2/val2, ... , row2 -> col1/val1 ... , ... -> .... In that sense, the data is stored much more like a row-store than like a column-store.
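As a rough illustration of that ordering, the sketch below flattens a small column family in two ways: row by row, as described above, and column by column, as a true column-store would group values. It only shows the order of the cells and assumes nothing about real on-disk formats; `row_major` and `column_major` are hypothetical helper names.

```python
# Hypothetical column family: row key -> {column name -> value}.
family = {
    "Cassandra": {"emailAddress": "cassandra@apache.org", "age": 20},
    "TerryCho": {"emailAddress": "terry.cho@apache.org", "gender": "male"},
}


def row_major(column_family):
    """Cells in row-by-row order: all of row1's columns, then row2's, ..."""
    return [
        (row_key, column, value)
        for row_key, columns in column_family.items()
        for column, value in columns.items()
    ]


def column_major(column_family):
    """Cells grouped by column name, the way a column-store would lay them out."""
    # Collect column names in first-seen order.
    names = []
    for columns in column_family.values():
        for column in columns:
            if column not in names:
                names.append(column)
    return [
        (column, row_key, columns[column])
        for column in names
        for row_key, columns in column_family.items()
        if column in columns
    ]


for cell in row_major(family):
    print(cell)
```

In the row-by-row order, everything about `Cassandra` sits together. In the column-by-column order, all the `emailAddress` values sit together instead.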
To finish, the choice of words here is just unfortunate and misleading. Columns in Column Families should have been called Attributes. Rows should have been called Attribute Sets. Column Families should have been called Attribute Families. The link to classic tabular vocabulary is weak, because the underlying model is actually pretty different.