Suppose we have a `songs` table, and each song can have any number of labels (0, 1, 2, 3 or more).
Two approaches to storing this info in a database are:
1. Two tables: a `songs` table and a `categories` table, where each row in the `categories` table would have `song_id` and `category` (where `category` is "Rock", "Country", "Metal", etc.). If a song belongs to multiple categories, there would be multiple rows with that `song_id` in the `categories` table.
2. Three tables: a `songs` table, a `songscategories` table, and a `categories` table. The `songscategories` table would have just two columns, `song_id` and `category_id`, and the `categories` table would also have just two columns, `category_id` and `category_name`.
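To make the two options concrete, here is a minimal sketch of both schemas using Python's built-in `sqlite3` module. The `title` column, the column types, the composite primary keys, and the helper names `build` and `table_names` are my own assumptions for illustration; only the table and column names listed above come from the description.

```python
import sqlite3

# Shared songs table; the title column is an assumption for illustration.
SONGS = """
CREATE TABLE songs (
    song_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
"""

# Approach 1: the category name is stored directly on every (song, label) row.
APPROACH_1 = SONGS + """
CREATE TABLE categories (
    song_id  INTEGER NOT NULL REFERENCES songs(song_id),
    category TEXT NOT NULL,
    PRIMARY KEY (song_id, category)  -- assumed: a label appears once per song
);
"""

# Approach 2: category names live in one place; a link table connects them to songs.
APPROACH_2 = SONGS + """
CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL UNIQUE
);
CREATE TABLE songscategories (
    song_id     INTEGER NOT NULL REFERENCES songs(song_id),
    category_id INTEGER NOT NULL REFERENCES categories(category_id),
    PRIMARY KEY (song_id, category_id)
);
"""


def build(schema: str) -> sqlite3.Connection:
    """Create a fresh in-memory database from the given schema script."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Calling `table_names(build(APPROACH_1))` and `table_names(build(APPROACH_2))` lists the two and three tables respectively.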
The goal is to avoid future problems that could arise from failing to think carefully about the best schema now.
What I know so far:
- The first approach uses fewer tables and is therefore simpler.
- The first approach could require more storage, since each category name is stored many times (rather than just once, as with the second approach). If more info is stored for each category, it will need extra columns in the `categories` table, meaning even more duplicated info.
- The second approach requires more joins to retrieve a song and its categories (2 joins rather than 1), so it could be slower.
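The difference in join count can be sketched as follows, again with Python's `sqlite3`. The sample rows, the `title` column, and the function names are hypothetical; the sketch only shows the shape of the two queries and says nothing about their relative speed.

```python
import sqlite3


def labels_two_tables():
    """Approach 1: one join from songs to categories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (song_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE categories (song_id INTEGER, category TEXT);
        INSERT INTO songs VALUES (1, 'Example Song');
        INSERT INTO categories VALUES (1, 'Rock'), (1, 'Metal');
    """)
    return conn.execute("""
        SELECT s.title, c.category
        FROM songs s
        JOIN categories c ON c.song_id = s.song_id
        WHERE s.song_id = 1
        ORDER BY c.category
    """).fetchall()


def labels_three_tables():
    """Approach 2: two joins, going through the songscategories link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE songs (song_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
        CREATE TABLE songscategories (song_id INTEGER, category_id INTEGER);
        INSERT INTO songs VALUES (1, 'Example Song');
        INSERT INTO categories VALUES (1, 'Rock'), (2, 'Metal');
        INSERT INTO songscategories VALUES (1, 1), (1, 2);
    """)
    return conn.execute("""
        SELECT s.title, c.category_name
        FROM songs s
        JOIN songscategories sc ON sc.song_id = s.song_id
        JOIN categories c ON c.category_id = sc.category_id
        WHERE s.song_id = 1
        ORDER BY c.category_name
    """).fetchall()
```

Both functions return the same rows for the sample song; the only difference is the extra hop through `songscategories` in the second query.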
So the question is: should we optimise for fewer tables and joins, or for less storage space? What do other applications do in this situation, and are there considerations I haven't noted above?
Question from: https://stackoverflow.com/questions/65641491/best-practice-database-schema-for-multi-label-on-a-resource