Sql server function for displaying word frequency in a column

爷,独闯天下 提交于 2019-11-28 00:24:19

Okay this works a treat. Firstly a function to separate the values...

Alter Function dbo.SeparateValues    

(    
 @data VARCHAR(MAX),    
 @delimiter VARCHAR(10)     
)     
RETURNS     
@tbldata TABLE(col VARCHAR(MAX))    
As    
--Declare @data VARCHAR(MAX) ,@delimiter VARCHAR(10)     
--Declare @tbldata TABLE(col VARCHAR(10))    
--Set @data = 'hello,how,are,you?,234234'    
--Set @delimiter = ','    
--DECLARE @tbl TABLE(col VARCHAR(10))    
Begin    
DECLARE @pos INT    
DECLARE @prevpos INT    
SET @pos = 1     
SET @prevpos = 0    

WHILE @pos > 0     
BEGIN    
SET @pos = CHARINDEX(@delimiter, @data, @prevpos+1)    
if @pos > 0     
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, @pos-@prevpos-1))))    
else    
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, len(@data)-@prevpos))))    
SET @prevpos = @pos     
End    

RETURN       
END    

then I just apply it to my table...

Select Count(*), sep.Col FROM (
        Select * FROM (
            Select value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(response, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' ')))) FROM Responses
        ) easyValues
        Where value <> '' 
    ) actualValues 
    Cross Apply dbo.SeparateValues(value, ' ') sep
    Group By sep.Col
    Order By Count(*) Desc

Okay, so I went OTT with my nested tables, but I've stripped out all the crap characters, separated the values and kept a running total of the most frequently used words.

You're main problem is that you're missing a split function in SQL Server.

Theres a sample one here that looks pretty good..

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

Using that, you write a stored proc along the lines of...

CREATE TABLE #Temp (Response nvarchar(50), Frequency int) 

DECLARE @response nvarchar(100)
DECLARE db_cursor CURSOR FOR 
SELECT response FROM YourTable

OPEN db_cursor  
FETCH NEXT FROM db_cursor INTO @response 

WHILE @@FETCH_STATUS = 0  
BEGIN  
       /* Pseudo Code */ 
       --Split @Response 
       --Iterate through each word in returned list
       --IF(EXISTS in #TEMP)
       --    UPDATE THAT ROW & INCREMENT THE FREQUENCY
       --ELSE
       --    NEW WORD, INSERT TO #Temp WITH A FREQUENCY OF 1

       FETCH NEXT FROM db_cursor INTO @response 
END   

SELECT * FROM #Temp

Theres probably a less fugly way to do this without cursors, but if it's just something you need to run once, and you're table or Responses isn't phenomenally large, then this should work

DECLARE @phrases TABLE (id int, phrase varchar(max))
INSERT @phrases values
(1,'Red and White'  ),
(2,'green'          ),
(3,'White and blue' ),
(4,'Blue'           ),
(5,'Dark blue'      );

SELECT word, COUNT(*) c
FROM @phrases
CROSS APPLY (SELECT CAST('<a>'+REPLACE(phrase,' ','</a><a>')+'</a>' AS xml) xml1 ) t1
CROSS APPLY (SELECT n.value('.','varchar(max)') AS word FROM xml1.nodes('a') x(n) ) t2
GROUP BY word
word         freq
----------- -----------
and         2
blue        3
Dark        1
green       1
Red         1
White       2
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!