How to query text to find the longest prefix strings in SQL?

不羁岁月 submitted on 2020-01-24 19:56:29

Question


I am using Spark SQL. Let's say this is a snapshot of my big table:

ups store
ups store austin
ups store chicago
ups store bern
walmart
target

How can I find the longest prefixes for the above data in SQL? That is:

 ups store
 walmart
 target

I already have a Java program that does this, but my file is large. Could this reasonably be done in SQL?

What about the following more complicated scenario? (I can live without it, but it would be nice to have if possible.)

ups store austin
ups store chicago
ups store bern
walmart
target

and that would return [ups store, walmart, target].


Answer 1:


Assuming you're free to create another table (called lengths below, with a single column number) that simply holds a list of ascending integers from zero up to the length of the longest possible string, the following should do the job using only ANSI SQL:

SELECT
  id,
  SUBSTRING(name, 1, CASE WHEN number = 0 THEN LENGTH(name) ELSE number END) AS prefix
FROM
 -- Join all places to all possible substring lengths.
 (SELECT *
  FROM places p
  CROSS JOIN lengths l) subq
-- If number is zero then no prefix match was found elsewhere
-- (from the question it looked like you wanted to include these)
WHERE (subq.number = 0 OR
       -- Look for prefix match elsewhere
       EXISTS (SELECT * FROM places p
               WHERE SUBSTRING(p.name FROM 1 FOR subq.number)
                     = SUBSTRING(subq.name FROM 1 FOR subq.number)
                 AND p.id <> subq.id))
  -- Include as a prefix match if the whole string is being used
  AND (subq.number = LENGTH(name)
       -- Don't include trailing spaces in a prefix
       OR (SUBSTRING(subq.name, subq.number, 1) <> ' '
           -- Only include the longest prefix match 
           AND NOT EXISTS (SELECT * FROM places p 
                           WHERE SUBSTRING(p.name FROM 1 FOR subq.number + 1)
                                 = SUBSTRING(subq.name FROM 1 FOR subq.number + 1)
                             AND p.id <> subq.id)))
ORDER BY id;

Live demo: http://rextester.com/XPNRP24390
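
To try the query locally without the live demo, here is a minimal sketch that ports it to SQLite through Python's sqlite3 module. It is an adaptation, not the answer's exact statement: substr(x, start, len) replaces the SUBSTRING(x FROM start FOR len) form, and the helper name longest_prefixes and the way the places and lengths tables are populated are assumptions made for illustration.

```python
import sqlite3

# SQLite port of the query above. Assumption: substr(x, start, len) stands in
# for SUBSTRING(x FROM start FOR len); table and column names follow the answer.
QUERY = """
SELECT subq.id,
       substr(subq.name, 1,
              CASE WHEN subq.number = 0 THEN length(subq.name)
                   ELSE subq.number END) AS prefix
FROM (SELECT p.id, p.name, l.number
      FROM places p CROSS JOIN lengths l) subq
WHERE (subq.number = 0
       OR EXISTS (SELECT 1 FROM places p
                  WHERE substr(p.name, 1, subq.number)
                        = substr(subq.name, 1, subq.number)
                    AND p.id <> subq.id))
  AND (subq.number = length(subq.name)
       OR (substr(subq.name, subq.number, 1) <> ' '
           AND NOT EXISTS (SELECT 1 FROM places p
                           WHERE substr(p.name, 1, subq.number + 1)
                                 = substr(subq.name, 1, subq.number + 1)
                             AND p.id <> subq.id)))
ORDER BY subq.id
"""


def longest_prefixes(names):
    """Load names into in-memory places/lengths tables and run QUERY.

    Hypothetical helper: ids are assigned 1, 2, 3, ... in input order.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE lengths (number INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO places (id, name) VALUES (?, ?)",
                     list(enumerate(names, start=1)))
    # lengths holds 0 .. length of the longest name, as the answer assumes.
    longest = max((len(n) for n in names), default=0)
    conn.executemany("INSERT INTO lengths (number) VALUES (?)",
                     [(n,) for n in range(longest + 1)])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = ["ups store", "ups store austin", "ups store chicago",
              "ups store bern", "walmart", "target"]
    for row in longest_prefixes(sample):
        print(row)
```

On the six sample rows this port returns one row each for ids 1, 5 and 6: ups store, walmart and target. In this port the city rows return nothing of their own, because the longest prefix they share with each other ends in a space. For the same reason, the second scenario (no plain ups store row) yields only walmart and target here, so it is worth checking that case against your own engine.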

The second aspect is: what if we have (ups store austin, ups store chicago)? Can we use SQL to extract the 'ups store' part from it?

This should simply be a case of using SUBSTRING in a similar way to the above, e.g.:

SELECT SUBSTRING(name,
                 LENGTH('ups store ') + 1,
                 LENGTH(name) - LENGTH('ups store '))
FROM places
WHERE SUBSTRING(name,
                1,
                LENGTH('ups store ')) = 'ups store ';
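
As a quick check of what this second query returns, the sketch below runs an equivalent statement in SQLite via Python's sqlite3 module. The function name strip_prefix, the bound parameter :prefix, and the added ORDER BY id are assumptions for illustration; SQLite's substr and length stand in for SUBSTRING and LENGTH.

```python
import sqlite3


def strip_prefix(names, prefix="ups store "):
    """Return the part of each matching name that follows `prefix`.

    Hypothetical helper: loads names into an in-memory places table and
    runs a SQLite equivalent of the query above, ordered by id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO places (id, name) VALUES (?, ?)",
                     list(enumerate(names, start=1)))
    rows = conn.execute(
        """
        SELECT substr(name, length(:prefix) + 1, length(name) - length(:prefix))
        FROM places
        WHERE substr(name, 1, length(:prefix)) = :prefix
        ORDER BY id
        """,
        {"prefix": prefix},
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    sample = ["ups store", "ups store austin", "ups store chicago",
              "ups store bern", "walmart", "target"]
    print(strip_prefix(sample))
```

With the sample rows this returns austin, chicago and bern, that is, the remainder after the known prefix. The plain ups store row is skipped because it does not start with the prefix plus its trailing space.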



Answer 2:


Supposing your column is named "mycolumn", your big table is "mytable", and a single space is your field separator:

In PostgreSQL, you could do something as simple as this:

select
   mycolumn
from
   mytable
order by
   length(split_part(mycolumn, ' ', 1)) desc
limit
   1

If you run this query often, I'd probably try an ordered functional index on the table like this:

create index prefix_index on mytable (length(split_part(mycolumn, ' ', 1)) desc)


Source: https://stackoverflow.com/questions/42079042/how-to-query-text-to-find-the-longest-prefix-strings-in-sql
