The task

I have a locations table that stores the name of each location, and a tags table that stores information about those locations. The locations form a hierarchy, which I want to use to get all tags.

Example

Locations:

USA <- California <- San Francisco <- Mission St

Tags:

USA: English
California: Sunny
California: West coast
San Francisco: Sea side
Mission St: Cable car station

If somebody requests information about Mission St, I want to deliver all of its tags and those of its ancestors (["English", "Sunny", "West coast", "Sea side", "Cable car station"]). If I request all tags of California, the answer would be ["English", "Sunny", "West coast"].

I'm looking for the best read performance! I don't care about write performance, because this data does not change very often. I don't care about table sizes either. If I need more or larger tables to solve this quicker, so be it.

The tables

So currently I'm thinking about setting up these tables:

locations

id | name
---|--------------
1  | USA
2  | California
3  | San Francisco
4  | Mission St

tags

id | location_id | name
---|-------------|------------------
1  | 1           | English
2  | 2           | Sunny
3  | 2           | West coast
4  | 3           | Sea side
5  | 4           | Cable car station

ancestors

I added a position field to store the hierarchy: it is the distance from the location up to that ancestor (1 = direct parent).

| id | location_id | ancestor_id | position |
|----|-------------|-------------|----------|
| 1  | 2           | 1           | 1        |
| 2  | 3           | 2           | 1        |
| 3  | 3           | 1           | 2        |
| 4  | 4           | 3           | 1        |
| 5  | 4           | 2           | 2        |
| 6  | 4           | 1           | 3        |

Question

Is this a good solution to the problem, or is there a better one? I want to select, as fast as possible, all tags of any given location, including all the tags of its ancestors. I'm using a PostgreSQL database, but I think this is a pure SQL architecture problem.

Answer 1:

Your problem seems to consist of two challenges. The most interesting is "how do I store hierarchies in a relational database". There are lots of answers to that - the one you've proposed is the most common.

There's an alternative called "nested set", which is faster for reading (in your example, finding all locations within a particular hierarchy would be a "between x and y" comparison).
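As a rough sketch of the nested-set idea: each location gets a left and a right bound, and a location's ancestors are exactly the rows whose interval encloses its own. The `lft`/`rgt` column names and their values are assumptions made for this illustration, and SQLite (via Python's `sqlite3`) stands in for PostgreSQL so the snippet runs as-is; the SQL itself is plain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- lft/rgt are hypothetical nested-set bounds, not part of the question's schema
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, location_id INTEGER, name TEXT);
    INSERT INTO locations VALUES
        (1, 'USA', 1, 8), (2, 'California', 2, 7),
        (3, 'San Francisco', 3, 6), (4, 'Mission St', 4, 5);
    INSERT INTO tags VALUES
        (1, 1, 'English'), (2, 2, 'Sunny'), (3, 2, 'West coast'),
        (4, 3, 'Sea side'), (5, 4, 'Cable car station');
""")

def tags_nested_set(location_id):
    """Tags of a location and its ancestors, outermost ancestor first."""
    rows = conn.execute("""
        SELECT t.name
        FROM locations AS node
        JOIN locations AS anc
          ON anc.lft <= node.lft AND anc.rgt >= node.rgt  -- interval encloses the node
        JOIN tags AS t ON t.location_id = anc.id
        WHERE node.id = ?
        ORDER BY anc.lft, t.id
    """, (location_id,)).fetchall()
    return [name for (name,) in rows]
```

The price is paid on writes: inserting or moving a location means renumbering the bounds of other rows, which fits the "reads matter, writes don't" requirement in the question.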

Postgres has dedicated support for hierarchies; I'd assume this would also provide great performance.
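The answer doesn't name the feature it has in mind. PostgreSQL offers at least two: the `ltree` extension and recursive common table expressions (`WITH RECURSIVE`). The sketch below illustrates only the second, and assumes a `parent_id` column that the question's schema does not have. It runs on SQLite through Python's `sqlite3` as a stand-in, since SQLite accepts the same `WITH RECURSIVE` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- parent_id is a hypothetical column added for this sketch
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, location_id INTEGER, name TEXT);
    INSERT INTO locations VALUES
        (1, 'USA', NULL), (2, 'California', 1),
        (3, 'San Francisco', 2), (4, 'Mission St', 3);
    INSERT INTO tags VALUES
        (1, 1, 'English'), (2, 2, 'Sunny'), (3, 2, 'West coast'),
        (4, 3, 'Sea side'), (5, 4, 'Cable car station');
""")

def tags_recursive(location_id):
    """Walk parent_id upwards from the location, then collect the tags on the way."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, parent_id, depth) AS (
            SELECT id, parent_id, 0 FROM locations WHERE id = ?
            UNION ALL
            SELECT l.id, l.parent_id, c.depth + 1
            FROM locations AS l
            JOIN chain AS c ON l.id = c.parent_id
        )
        SELECT t.name
        FROM chain
        JOIN tags AS t ON t.location_id = chain.id
        ORDER BY chain.depth DESC, t.id   -- root first, requested location last
    """, (location_id,)).fetchall()
    return [name for (name,) in rows]
```

This keeps storage minimal but does one step of work per level of the hierarchy on every read, so it is the opposite trade-off from precomputing the ancestors.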

The second part of your question is "given a path in my hierarchy, retrieve all matching tags". The easiest option is to join to the tags table as you suggest.
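For the tables from the question, that join might look roughly like this. The table and column names are the question's; the `LEFT JOIN` form is one possible way to also pick up the location's own tags, since the `ancestors` table has no row pointing a location at itself. SQLite (via Python's `sqlite3`) is used only so the example is runnable; the query is ordinary SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, location_id INTEGER, name TEXT);
    CREATE TABLE ancestors (id INTEGER PRIMARY KEY, location_id INTEGER,
                            ancestor_id INTEGER, position INTEGER);
    -- indexes for the two lookups the query performs (names are illustrative)
    CREATE INDEX ancestors_location_idx ON ancestors (location_id);
    CREATE INDEX tags_location_idx ON tags (location_id);
    INSERT INTO locations VALUES
        (1, 'USA'), (2, 'California'), (3, 'San Francisco'), (4, 'Mission St');
    INSERT INTO tags VALUES
        (1, 1, 'English'), (2, 2, 'Sunny'), (3, 2, 'West coast'),
        (4, 3, 'Sea side'), (5, 4, 'Cable car station');
    INSERT INTO ancestors VALUES
        (1, 2, 1, 1), (2, 3, 2, 1), (3, 3, 1, 2),
        (4, 4, 3, 1), (5, 4, 2, 2), (6, 4, 1, 3);
""")

def tags_for(location_id):
    """Own tags plus ancestor tags, farthest ancestor first."""
    rows = conn.execute("""
        SELECT t.name
        FROM tags AS t
        LEFT JOIN ancestors AS a
               ON a.ancestor_id = t.location_id AND a.location_id = :id
        WHERE t.location_id = :id      -- the location's own tags
           OR a.id IS NOT NULL         -- tags of any ancestor
        ORDER BY COALESCE(a.position, 0) DESC, t.id
    """, {"id": location_id}).fetchall()
    return [name for (name,) in rows]
```

If the `ancestors` table also stored a self row for every location (position 0), the `LEFT JOIN` and the `OR` could collapse into a single inner join.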

The final aspect is "should you denormalize/precalculate". I usually recommend building and optimizing the "normalized" solution and only denormalize when you need to.

Answer 2:

If you want to deliver all tags for a particular location, then I would recommend replicating the data and storing the tags in a tags array on a row for each location.

You say that the locations don't change very much. So, I would simply rebuild the entire table in one batch whenever any underlying data changes.

Modifying the data in situ is rather problematic. A single update could end up affecting a zillion different rows -- consider a tag change on USA. Recalculating the entire table is going to be more efficient.
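A minimal sketch of such a full rebuild, under a few assumptions: the read table `location_tag_cache` and both function names are made up for this example, and the tag list is stored as JSON text because the snippet runs on SQLite through Python's `sqlite3`. In PostgreSQL the column could instead be a `text[]` filled with `array_agg`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, location_id INTEGER, name TEXT);
    CREATE TABLE ancestors (id INTEGER PRIMARY KEY, location_id INTEGER,
                            ancestor_id INTEGER, position INTEGER);
    -- hypothetical read table: one row per location, all tags precomputed
    CREATE TABLE location_tag_cache (location_id INTEGER PRIMARY KEY, tags TEXT NOT NULL);
    INSERT INTO locations VALUES
        (1, 'USA'), (2, 'California'), (3, 'San Francisco'), (4, 'Mission St');
    INSERT INTO tags VALUES
        (1, 1, 'English'), (2, 2, 'Sunny'), (3, 2, 'West coast'),
        (4, 3, 'Sea side'), (5, 4, 'Cable car station');
    INSERT INTO ancestors VALUES
        (1, 2, 1, 1), (2, 3, 2, 1), (3, 3, 1, 2),
        (4, 4, 3, 1), (5, 4, 2, 2), (6, 4, 1, 3);
""")

def rebuild_cache():
    """Throw the cache away and recompute every row inside one transaction."""
    with conn:
        conn.execute("DELETE FROM location_tag_cache")
        for (loc_id,) in conn.execute("SELECT id FROM locations").fetchall():
            rows = conn.execute("""
                SELECT t.name
                FROM tags AS t
                LEFT JOIN ancestors AS a
                       ON a.ancestor_id = t.location_id AND a.location_id = :id
                WHERE t.location_id = :id OR a.id IS NOT NULL
                ORDER BY COALESCE(a.position, 0) DESC, t.id
            """, {"id": loc_id}).fetchall()
            conn.execute("INSERT INTO location_tag_cache VALUES (?, ?)",
                         (loc_id, json.dumps([name for (name,) in rows])))

def cached_tags(location_id):
    """Single primary-key lookup; returns [] for unknown locations."""
    row = conn.execute("SELECT tags FROM location_tag_cache WHERE location_id = ?",
                       (location_id,)).fetchone()
    return json.loads(row[0]) if row else []

rebuild_cache()
```

A read is then one primary-key lookup, and after any change to `locations`, `tags`, or `ancestors` it is enough to call `rebuild_cache()` again.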

If you need to search on the tags as well as return them, then I would go for a more traditional structure of a table with two important columns, location and tag. Then you can have indexes on both (location) and (tag) to facilitate searching in either direction.
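One way to read this, sketched below: a flattened table with one row per (location, tag) pair, inherited tags included, and an index on each column. Whether the answer intends inherited tags to be stored is an assumption here, as are the table, index, and function names. SQLite through Python's `sqlite3` is used as a runnable stand-in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical flattened table: every location paired with every tag it carries
    CREATE TABLE location_tags (location TEXT NOT NULL, tag TEXT NOT NULL);
    CREATE INDEX location_tags_location_idx ON location_tags (location);
    CREATE INDEX location_tags_tag_idx ON location_tags (tag);
""")

# Example data from the question; the hierarchy is a single chain, root first.
chain = ["USA", "California", "San Francisco", "Mission St"]
own_tags = {
    "USA": ["English"],
    "California": ["Sunny", "West coast"],
    "San Francisco": ["Sea side"],
    "Mission St": ["Cable car station"],
}

inherited = []
for location in chain:
    inherited += own_tags[location]  # everything from the ancestors plus its own
    conn.executemany("INSERT INTO location_tags VALUES (?, ?)",
                     [(location, tag) for tag in inherited])

def tags_of(location):
    """Lookup by location; served by the index on (location)."""
    rows = conn.execute("SELECT tag FROM location_tags WHERE location = ? ORDER BY rowid",
                        (location,)).fetchall()
    return [tag for (tag,) in rows]

def locations_with(tag):
    """Reverse lookup by tag; served by the index on (tag)."""
    rows = conn.execute("SELECT location FROM location_tags WHERE tag = ? ORDER BY rowid",
                        (tag,)).fetchall()
    return [location for (location,) in rows]
```

For example, `tags_of("California")` lists the three tags California carries, and `locations_with("Sunny")` lists California and everything below it.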

Answer 3:

If write performance is not crucial, I would go for denormalization of the database. That means you use the above structure for your write operations and fill a separate table for your read operations, either with a trigger or with some async job if you are afraid of triggers. Then the read performance is optimal, but you have to invest a bit more in the write logic.
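A rough sketch of the trigger variant, with several assumptions: the read table `location_tags_read` and the trigger name are invented for this example, only inserts into `tags` are handled (updates, deletes, and hierarchy changes would need their own triggers), and the trigger is written in SQLite syntax so it runs through Python's `sqlite3`. In PostgreSQL the body would live in a function declared `RETURNS trigger` and be attached with `CREATE TRIGGER ... EXECUTE FUNCTION`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, location_id INTEGER, name TEXT);
    CREATE TABLE ancestors (id INTEGER PRIMARY KEY, location_id INTEGER,
                            ancestor_id INTEGER, position INTEGER);
    -- hypothetical read table, maintained only by the trigger below
    CREATE TABLE location_tags_read (location_id INTEGER NOT NULL, tag TEXT NOT NULL);
    CREATE INDEX location_tags_read_idx ON location_tags_read (location_id);

    CREATE TRIGGER tags_after_insert AFTER INSERT ON tags
    BEGIN
        -- the tagged location itself
        INSERT INTO location_tags_read (location_id, tag)
        VALUES (NEW.location_id, NEW.name);
        -- every descendant of the tagged location
        INSERT INTO location_tags_read (location_id, tag)
        SELECT a.location_id, NEW.name
        FROM ancestors AS a
        WHERE a.ancestor_id = NEW.location_id;
    END;

    INSERT INTO locations VALUES
        (1, 'USA'), (2, 'California'), (3, 'San Francisco'), (4, 'Mission St');
    -- the hierarchy must exist before tags arrive, so the trigger can fan out
    INSERT INTO ancestors VALUES
        (1, 2, 1, 1), (2, 3, 2, 1), (3, 3, 1, 2),
        (4, 4, 3, 1), (5, 4, 2, 2), (6, 4, 1, 3);
    INSERT INTO tags VALUES
        (1, 1, 'English'), (2, 2, 'Sunny'), (3, 2, 'West coast'),
        (4, 3, 'Sea side'), (5, 4, 'Cable car station');
""")

def read_tags(location_id):
    """Reads touch only the denormalized table; no joins, no recursion."""
    rows = conn.execute(
        "SELECT tag FROM location_tags_read WHERE location_id = ? ORDER BY rowid",
        (location_id,)).fetchall()
    return [tag for (tag,) in rows]
```

A single insert of a tag on USA fans out to one row per location in the tree, which is the extra write cost this answer accepts in exchange for trivial reads.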

Using the above structure for read operations is indeed not a smart solution, because you don't know how deep the tree can get.

Source: https://stackoverflow.com/questions/59891553/best-data-structure-for-finding-tags-of-nested-locations

Tags

sql

postgresql

database-design

Best data structure for finding tags of nested locations