How can I select the count of distinct values from data that is stored as comma separated values in MySql? I\'ll be using PHP to output the data from MySql in the end.
I don't really know how to transform an horizontal list of comma-separated values to a list of rows without creating a table containing numbers, as many numbers as you may have comma-separated values. If you can create this table, here is my answer:
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) AS one_tag,
COUNT(*) AS cnt
FROM (
SELECT
GROUP_CONCAT(tags separator ',') AS all_tags,
LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test
) t
JOIN numbers n
ON n.num <= t.count_tags
GROUP BY one_tag
ORDER BY cnt DESC;
Returns:
+---------------------+-----+
| one_tag | cnt |
+---------------------+-----+
| chicken | 5 |
| pork | 4 |
| spaghetti | 3 |
| fried-rice | 2 |
| manchurain | 2 |
| pho | 1 |
| chicken-calzone | 1 |
| fettuccine | 1 |
| chorizo | 1 |
| meat-balls | 1 |
| miso-soup | 1 |
| chanko-nabe | 1 |
| chicken-manchurian | 1 |
| pork-manchurian | 1 |
| sweet-and-sour-pork | 1 |
| peking-duck | 1 |
| duck | 1 |
+---------------------+-----+
17 rows in set (0.01 sec)
Let's build your schema:
CREATE TABLE test (
id INT PRIMARY KEY,
tags VARCHAR(255)
);
INSERT INTO test VALUES
("1", "pho,pork"),
("2", "fried-rice,chicken"),
("3", "fried-rice,pork"),
("4", "chicken-calzone,chicken"),
("5", "fettuccine,chicken"),
("6", "spaghetti,chicken"),
("7", "spaghetti,chorizo"),
("8", "spaghetti,meat-balls"),
("9", "miso-soup"),
("10", "chanko-nabe"),
("11", "chicken-manchurian,chicken,manchurain"),
("12", "pork-manchurian,pork,manchurain"),
("13", "sweet-and-sour-pork,pork"),
("14", "peking-duck,duck");
We will work with all tags in a single line, so we use GROUP_CONCAT to do the job:
SELECT GROUP_CONCAT(tags SEPARATOR ',') FROM test;
Returns all tags separated by a comma:
pho,pork,fried-rice,chicken,fried-rice,pork,chicken-calzone,chicken,fettuccine,chicken,spaghetti,chicken,spaghetti,chorizo,spaghetti,meat-balls,miso-soup,chanko-nabe,chicken-manchurian,chicken,manchurain,pork-manchurian,pork,manchurain,sweet-and-sour-pork,pork,peking-duck,duck
To count all tags, we get the length of the full list of tags, and we remove the length of the full list of tags after replacing the , by nothing. We add 1, as the separator is between two values.
SELECT LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test;
Returns:
+------------+
| count_tags |
+------------+
| 28 |
+------------+
1 row in set (0.00 sec)
We use the SUBSTRING_INDEX function to get
-- returns the string until the 2nd delimiter\'s occurrence from left to right: a,b
SELECT SUBSTRING_INDEX('a,b,c', ',', 2);
-- return the string until the 1st delimiter, from right to left: c
SELECT SUBSTRING_INDEX('a,b,c', ',', -1);
-- we need both to get: b (with 2 being the tag number)
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c', ',', 2), ',', -1);
With such logic, to get the 3rd tag in our list, we use:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1)
FROM test;
Returns:
+-------------------------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1) |
+-------------------------------------------------------------------------------------+
| fried-rice |
+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
My idea is a little tricky:
So we will create a table containing all numbers from 1 to the maximum number of tags you may have in your list. If you can have 1M values, create 1M entries from 1 to 1,000,000. For 100 tags, this will be:
CREATE TABLE numbers (
num INT PRIMARY KEY
);
INSERT INTO numbers VALUES
( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 ),
( 11 ), ( 12 ), ( 13 ), ( 14 ), ( 15 ), ( 16 ), ( 17 ), ( 18 ), ( 19 ), ( 20 ),
( 21 ), ( 22 ), ( 23 ), ( 24 ), ( 25 ), ( 26 ), ( 27 ), ( 28 ), ( 29 ), ( 30 ),
( 31 ), ( 32 ), ( 33 ), ( 34 ), ( 35 ), ( 36 ), ( 37 ), ( 38 ), ( 39 ), ( 40 ),
( 41 ), ( 42 ), ( 43 ), ( 44 ), ( 45 ), ( 46 ), ( 47 ), ( 48 ), ( 49 ), ( 50 ),
( 51 ), ( 52 ), ( 53 ), ( 54 ), ( 55 ), ( 56 ), ( 57 ), ( 58 ), ( 59 ), ( 60 ),
( 61 ), ( 62 ), ( 63 ), ( 64 ), ( 65 ), ( 66 ), ( 67 ), ( 68 ), ( 69 ), ( 70 ),
( 71 ), ( 72 ), ( 73 ), ( 74 ), ( 75 ), ( 76 ), ( 77 ), ( 78 ), ( 79 ), ( 80 ),
( 81 ), ( 82 ), ( 83 ), ( 84 ), ( 85 ), ( 86 ), ( 87 ), ( 88 ), ( 89 ), ( 90 ),
( 91 ), ( 92 ), ( 93 ), ( 94 ), ( 95 ), ( 96 ), ( 97 ), ( 98 ), ( 99 ), ( 100 );
Now, we get the numth (num being a row in number) using the following query:
SELECT n.num, SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) as one_tag
FROM (
SELECT
GROUP_CONCAT(tags SEPARATOR ',') AS all_tags,
LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test
) t
JOIN numbers n
ON n.num <= t.count_tags
Returns:
+-----+---------------------+
| num | one_tag |
+-----+---------------------+
| 1 | pho |
| 2 | pork |
| 3 | fried-rice |
| 4 | chicken |
| 5 | fried-rice |
| 6 | pork |
| 7 | chicken-calzone |
| 8 | chicken |
| 9 | fettuccine |
| 10 | chicken |
| 11 | spaghetti |
| 12 | chicken |
| 13 | spaghetti |
| 14 | chorizo |
| 15 | spaghetti |
| 16 | meat-balls |
| 17 | miso-soup |
| 18 | chanko-nabe |
| 19 | chicken-manchurian |
| 20 | chicken |
| 21 | manchurain |
| 22 | pork-manchurian |
| 23 | pork |
| 24 | manchurain |
| 25 | sweet-and-sour-pork |
| 26 | pork |
| 27 | peking-duck |
| 28 | duck |
+-----+---------------------+
28 rows in set (0.01 sec)
As soon as we now have classic rows, we can easily count occurrences of each tags.
See the top of this answer to see the request.