Counting word occurrences in a table column

柔情痞子 提交于 2019-11-27 07:52:08

问题


I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.

If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

    word | count()
    ---------------
    like  3
    eat   2
    to    2
    i     2
    a     1

Any idea? I am using MySQL 5.2.


回答1:


@Elad Meidar, I like your question and I found a solution:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE




回答2:


I would recommend not to do this in SQL at all. You're loading DB with something that it isn't best at. Selecting a group of rows and doing frequency calculation on the application side will be easier to implement, will work faster and will be maintained with less issues/headaches.




回答3:


I think you should do it like indexing, with additional table. Whenever u create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have the columns: word, and the number of occurrences.




回答4:


You can try this perverted-a-little way:

SELECT 
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
ORDER BY `count` DESC

This query can be very slow. Also, it looks pretty ugly.




回答5:


I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.



来源:https://stackoverflow.com/questions/1094888/counting-word-occurrences-in-a-table-column

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!