How do I write an SQL query for words with punctuation in PostgreSQL?

Submitted by 情到浓时终转凉″ on 2021-02-18 22:45:23

Question


If I have strings/phrases like this stored in the database:

  • What are Q-type Operations?
  • Programmer's Guide
  • A.B.C's of Coding

Is there a way to pass a query parameter in like "Programmers" or "abc" or "q-type" and have it find "Programmer's", "A.B.C" and "Q-type"?


Answer 1:


tsvector

Use the tsvector type, which is part of the PostgreSQL text-search feature.

postgres> select 'What are Q-type Operations?'::tsvector;
              tsvector               
-------------------------------------
 'Operations?' 'Q-type' 'What' 'are'
(1 row)

You can use familiar operators on tsvectors as well:

postgres> select 'What are Q-type Operations?'::tsvector
postgres>        || 'A.B.C''s of Coding'::tsvector;
                           ?column?                           
--------------------------------------------------------------
 'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'

From the tsvector documentation:

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input.

If you also want language-specific normalization, such as removing common words ('the', 'a', etc.) and reducing plurals and other variants to a common stem, use the to_tsvector function. It also records the position of each word (the number after the colon), which text search can use for ranking:

postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
                      to_tsvector                       
--------------------------------------------------------
 'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)

Full-blown text search

Obviously, doing this for every row in a query will be expensive, so you should store the tsvector in a separate column and search it with a tsquery (built with to_tsquery()). This also allows you to create a GIN or GiST index on the tsvector column.

postgres> insert into text (phrase, tsvec)
postgres>   values('What are Q-type Operations?',
postgres>   to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1

Searching is done using tsquery and the @@ operator:

postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
           phrase            
-----------------------------
 What are Q-type Operations?
(1 row)
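
To make the stored-column approach concrete, here is a minimal sketch. The table name phrases, the index name, and the use of a generated column (which needs PostgreSQL 12 or later) are assumptions for illustration and are not part of the answer above; on older versions you would fill the column yourself, as in the insert shown earlier.

```sql
-- Hypothetical table; all names here are illustrative only.
CREATE TABLE phrases (
    id     serial PRIMARY KEY,
    phrase text NOT NULL,
    -- Kept in sync with phrase automatically (generated columns need PostgreSQL 12+).
    tsvec  tsvector GENERATED ALWAYS AS (to_tsvector('english', phrase)) STORED
);

-- Index the stored tsvector so that @@ searches do not scan every row.
CREATE INDEX phrases_tsvec_idx ON phrases USING gin (tsvec);

INSERT INTO phrases (phrase) VALUES
    ('What are Q-type Operations?'),
    ('Programmer''s Guide'),
    ('A.B.C''s of Coding');

-- Search with an explicit configuration so the query is parsed the same way as the column.
SELECT phrase
FROM phrases
WHERE tsvec @@ to_tsquery('english', 'q-type');
```

Naming the configuration in both to_tsvector and to_tsquery avoids surprises if default_text_search_config differs between sessions.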



Answer 2:


You could try ILIKE together with the translate function, which deletes every character listed in its second argument when the third argument is empty.

For example: translate(field, '.-''', '')
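
As a rough sketch of how that might look in a full query: the VALUES list below stands in for a real table, and 'Programmers' stands in for the user's parameter; both are assumptions for illustration.

```sql
-- Strip '.', '-' and the apostrophe from both sides, then compare case-insensitively.
SELECT input
FROM (VALUES ('What are Q-type Operations?'),
             ('Programmer''s Guide'),
             ('A.B.C''s of Coding')) AS x(input)
WHERE translate(input, '.-''', '')
      ILIKE '%' || translate('Programmers', '.-''', '') || '%';
```

Keep in mind that % and _ in the user's input act as wildcards in ILIKE, and that a pattern starting with % generally cannot use an ordinary b-tree index.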




Answer 3:


Here's another link that can be relevant. Strip the value of the field from all punctuation before comparing it to the search string.

SQL Server: How do you remove punctuation from a field?
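
The linked question is about SQL Server. A possible PostgreSQL equivalent, sketched here as an assumption rather than taken from that link, is regexp_replace with the POSIX [[:punct:]] class; the VALUES list and the 'q-type' parameter are placeholders.

```sql
-- Remove all punctuation from both the stored text and the search string,
-- lower-case both, then do a substring match.
SELECT input
FROM (VALUES ('What are Q-type Operations?'),
             ('Programmer''s Guide'),
             ('A.B.C''s of Coding')) AS x(input)
WHERE lower(regexp_replace(input, '[[:punct:]]', '', 'g'))
      LIKE '%' || lower(regexp_replace('q-type', '[[:punct:]]', '', 'g')) || '%';
```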




Answer 4:


Postgres supports pattern matching, so you can build a regular expression in your where clause: http://www.postgresql.org/docs/8.3/static/functions-matching.html
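
For instance, a hand-written pattern for the search term "abc" might allow one optional punctuation character between letters. This is only a sketch: the pattern is written by hand here, and generating it safely from arbitrary user input (escaping regex metacharacters) is not shown.

```sql
-- ~* is the case-insensitive regular-expression match operator.
SELECT input
FROM (VALUES ('What are Q-type Operations?'),
             ('Programmer''s Guide'),
             ('A.B.C''s of Coding')) AS x(input)
WHERE input ~* 'a[[:punct:]]?b[[:punct:]]?c';
```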




Answer 5:


PostgreSQL supports full-text searching by converting text input to tsvector types:

steve@steve@[local] =# select input, to_tsvector('english', input)
   from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input);
            input            |            to_tsvector             
-----------------------------+------------------------------------
 What are Q-type Operations? | 'oper':6 'q':4 'q-type':3 'type':5
 Programmer's Guide          | 'guid':3 'programm':1
 A.B.C's of Coding           | 'a.b.c':1 'code':4
(3 rows)

As you can see, the stemming used by the english configuration reduces "Programmer's" to the lexeme 'programm', so variants such as "programmer" and "programmers" that stem to the same lexeme will all match it.

You would typically use this by having an indexed tsvector column or expression, and then using the @@ operator to match that with a tsquery, e.g.:

steve@steve@[local] =# select input, to_tsvector('english', input)
   from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input)
   where to_tsvector('english', input) @@ plainto_tsquery('english', 'programmers');
       input        |      to_tsvector      
--------------------+-----------------------
 Programmer's Guide | 'guid':3 'programm':1
(1 row)

Here plainto_tsquery analyses a user input string, and produces a query where every non-stop word in the query has to be matched by a tsvector.
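
The "indexed expression" variant mentioned above could look roughly like this. The table name docs and the index name are hypothetical; the point is that the expression in the where clause has to match the indexed expression for the index to be usable.

```sql
-- Hypothetical table holding the phrases.
CREATE TABLE docs (input text);

-- Index the expression instead of storing a separate tsvector column.
CREATE INDEX docs_input_fts_idx
    ON docs USING gin (to_tsvector('english', input));

-- The where clause repeats the indexed expression exactly, including 'english'.
SELECT input
FROM docs
WHERE to_tsvector('english', input) @@ plainto_tsquery('english', 'programmers');
```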




Answer 6:


This sounds like you want something along these lines:

http://www.postgresql.org/docs/9.0/static/fuzzystrmatch.html

I'm not 100% sure if that will cover what you want though.

EDIT: I ran this locally to check (using PostgreSQL 9.0 on Windows).

Here's what I found:

template1=> select soundex('Programmers'), soundex('Programmer''s');
 soundex | soundex
---------+---------
 P626    | P626
(1 row)


template1=> select soundex('abc'), soundex('A.B.C.');
 soundex | soundex
---------+---------
 A120    | A120
(1 row)


template1=> select soundex('Q-type'), soundex('q-type');
 soundex | soundex
---------+---------
 Q310    | Q310
(1 row)

So a condition such as soundex(colname) = soundex(<user param>) in the where clause should get you what you need.

You will need to install the fuzzystrmatch module:

psql -U <dbowner> -d <database> -f SHAREDIR/contrib/fuzzystrmatch.sql

Refer to the documentation on how to locate SHAREDIR
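
On PostgreSQL 9.1 and later, contrib modules are normally installed with CREATE EXTENSION instead of running the SQL script. A short sketch of that, followed by the soundex comparison in a where clause; the VALUES list and the 'Programmers' parameter are placeholders, not part of the test run above.

```sql
-- Requires the contrib package to be installed on the server.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Compare the Soundex code of each stored phrase with that of the search term.
SELECT input
FROM (VALUES ('What are Q-type Operations?'),
             ('Programmer''s Guide'),
             ('A.B.C''s of Coding')) AS x(input)
WHERE soundex(input) = soundex('Programmers');
```

Soundex produces a short code derived from the beginning of the string, so applied to a whole phrase it effectively compares only the leading word; matching a term in the middle of a phrase would need the phrase split into words first.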

EDIT: I just noticed what I overlooked. I think this, combined with the tsvector functionality, may get you where you are aiming for.



Source: https://stackoverflow.com/questions/5354342/how-do-i-sql-query-for-words-with-punctuation-in-postgresql
