问题
Somebody pointed out that my data structure architecture sucks.
The task
I have a locations
table which stores the name
of a location. Then I have a tags
table which stores information about those locations
. The locations
have a hierarchie which I want to use to get all tags
.
Example
Locations:
USA <- California <- San Francisco <- Mission St
Tags:
USA: English
California: Sunny
California: West coast
San Francisco: Sea side
Mission St: Cable car station
If somebody requests information about the Mission St
I want to deliver all tags
of it and it's ancestors (["English", "Sunny", "West coast", "Sea side", "Cable car station"]
. If I request all tags
of California
the answer would be ["English", "Sunny", "West coast"]
.
I'm looking for the best read performance! I don't care about write performance. This data is not changed very often. And I don't care about table sizes either. If I need more or larger tables to solve this quicker so be it.
The tables
So currently I'm thinking about setting up these tables:
locations
id | name
---|--------------
1 | USA
2 | California
3 | San Francisco
4 | Mission St
tags
id | location_id | name
---|-------------|------------------
1 | 1 | English
2 | 2 | Sunny
3 | 2 | West coast
4 | 3 | Sea side
5 | 4 | Cable car station
ancestors
I added a position
field to store the hierarchy.
| id | location_id | ancestor_id | position |
|----|-------------|-------------|----------|
| 1 | 2 | 1 | 1 |
| 2 | 3 | 2 | 1 |
| 3 | 3 | 1 | 2 |
| 4 | 4 | 3 | 1 |
| 5 | 4 | 2 | 2 |
| 6 | 4 | 1 | 3 |
Question
Is this a good solution to solve the problem or is there a better one? I want to select as fast as possible all tags of any given location including all the tags of it's ancestors. I'm using a PostgreSQL database but I think this is a pure SQL architecture problem.
回答1:
Your problem seems to consist of two challenges. The most interesting is "how do I store hierarchies in a relational database". There are lots of answers to that - the one you've proposed is the most common.
There's an alternative called "nested set" which is faster for reading (in your example, finding all locations within a particular hierarchy would be "between x and y".
Postgres has dedicated support for hierachies; I'd assume this would also provide great performance.
The second part of your question is "given a path in my hierarchy, retrieve all matching tags". The easiest option is to join to the tags table as you suggest.
The final aspect is "should you denormalize/precalculate". I usually recommend building and optimizing the "normalized" solution and only denormalize when you need to.
回答2:
If you want to deliver all tags for a particular location, then I would recommend replicating the data and storing the tags in a tags array on a row for each location.
You say that the locations don't change very much. So, I would simply batch create the entire table, when any underlying data changes.
Modifying the data in situ is rather problematic. A single update could end up affecting a zillion different rows -- consider a tag change on USA. Recalculating the entire table is going to be more efficient.
If you need to search on the tags as well as return them, then I would go for a more traditional structure of a table with two important columns, location
and tag
. Then you can have indexes on both (location)
and (tag)
to facilitate searching in either direction.
回答3:
If write performance is not crucial, I would go for denormalization of the database. That means you use the above structure for your write operations and fill a table for your read operations by a trigger or a some async job, if you are afraid of triggers. Then the read performance is optimal, but you have to invest a bit more into the write logic.
Using the above structure for read operations is indeed not a smart solution, cause you don't know how deep the tree can get.
来源:https://stackoverflow.com/questions/59891553/best-data-structure-for-finding-tags-of-nested-locations